Cloudflare

Cloudflare already classifies AI traffic at the edge - crawlers vs assistants vs search, verified vs suspected. OA normalises those signals into a standard content_retrieved event. A publisher using Cloudflare, licensing through Microsoft's Content Marketplace, and getting crawled directly by ChatGPT sees all three in the same format. One standard across CDNs, content marketplaces, and agent-side reporting.

Edge layer · Zero latency impact · source_role: edge

What Cloudflare provides

Any site proxied through Cloudflare already has AI bot visibility. The depth of signal depends on the plan.

Plan           | What you get
Free           | Bot Fight Mode (JS detection), "Block AI Scrapers and Crawlers" toggle, Security Events dashboard showing flagged bot traffic
Pro / Business | Super Bot Fight Mode with bot traffic groupings (automated, likely automated, likely human, verified bot) in Security Analytics
Enterprise     | Full Bot Management: BotScore (1-99), VerifiedBotCategory, JA4 (TLS fingerprint), JA4Signals, BotTags. Available in Workers via request.cf.botManagement

Cloudflare also exposes this data programmatically via the GraphQL Analytics API at api.cloudflare.com/client/v4/graphql. Datasets like httpRequestsAdaptiveGroups include bot classification dimensions - useful for understanding volume before enabling telemetry, or for building custom reporting alongside it.
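
As a rough sketch of what a volume query against that endpoint could look like: the helper below only builds the request body. The function names, the `userAgent` default dimension, and the filter fields are assumptions for illustration. Dimension availability varies by plan, so check them against the schema exposed to your account.

```javascript
// Hypothetical helper: builds a GraphQL request body for the
// httpRequestsAdaptiveGroups dataset. The dimension names are assumptions;
// confirm them against the schema exposed to your account.
function buildBotTrafficQuery({ zoneTag, since, until, dimensions = ['userAgent'], limit = 100 }) {
  // Values are inlined as JSON string literals to keep the sketch short.
  const query = `
    query {
      viewer {
        zones(filter: { zoneTag: ${JSON.stringify(zoneTag)} }) {
          httpRequestsAdaptiveGroups(
            limit: ${Number(limit)}
            filter: { datetime_geq: ${JSON.stringify(since)}, datetime_leq: ${JSON.stringify(until)} }
            orderBy: [count_DESC]
          ) {
            count
            dimensions { ${dimensions.join(' ')} }
          }
        }
      }
    }`;
  return { query };
}

// Sends the query. Assumes an API token with analytics read access.
async function fetchBotTraffic(apiToken, params) {
  const res = await fetch('https://api.cloudflare.com/client/v4/graphql', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiToken}`,
    },
    body: JSON.stringify(buildBotTrafficQuery(params)),
  });
  return res.json();
}
```

Swapping in bot classification dimensions is a matter of passing a different `dimensions` array once you have confirmed the names in your schema.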


What OA adds

Cloudflare's signals are rich but Cloudflare-specific. A publisher typically has multiple sources of AI interaction data - edge traffic through their CDN, licensing deals through content marketplaces like Microsoft or Amazon, and direct crawls from agents like ChatGPT or Perplexity. Without a standard format, each source reports differently and none of them correlate.

OA normalises all of these into a single event structure:

  • A Cloudflare edge event, a Microsoft Content Marketplace citation event, and an agent-reported retrieval all produce the same content_retrieved payload
  • Events correlate across sources via OA-Telemetry-ID - the edge saw the fetch, the marketplace reported the citation, and they link up
  • The content owner's .well-known/openattribution file tells agents, CDNs, and marketplaces where to send telemetry
  • Measurement partners and dashboards consume one format regardless of source
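
The correlation step can be sketched as a simple grouping on `oa_telemetry_id`. This is a minimal illustration, not the OA implementation: the function name is hypothetical, and the `marketplace` role and `content_cited` event type in the usage example are assumed labels that do not come from the spec.

```javascript
// Groups events from different sources by their shared OA-Telemetry-ID.
// Events without an ID cannot be correlated and are returned separately.
function correlateByTelemetryId(events) {
  const linked = new Map();
  const unlinked = [];
  for (const event of events) {
    const id = event.oa_telemetry_id;
    if (!id) {
      unlinked.push(event);
      continue;
    }
    if (!linked.has(id)) linked.set(id, []);
    linked.get(id).push(event);
  }
  return { linked, unlinked };
}

// Usage: an edge fetch and a marketplace report that share one ID link up.
const sample = [
  { type: 'content_retrieved', source_role: 'edge', oa_telemetry_id: 'abc' },
  { type: 'content_cited', source_role: 'marketplace', oa_telemetry_id: 'abc' }, // assumed labels
  { type: 'content_retrieved', source_role: 'edge' }, // no ID: stays unlinked
];
const { linked, unlinked } = correlateByTelemetryId(sample);
```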

How it works

flow
AI Bot ──GET──> Cloudflare Edge (Worker)
                    │
                    ├── Classifies request via Bot Management
                    │   (BotScore, VerifiedBotCategory, JA4, ASN)
                    │
                    ├── Reads OA-Telemetry-ID header (if present)
                    ├── Passes request to origin (no delay)
                    └── ctx.waitUntil(fetch(...))
                        └── Async POST to OA telemetry endpoint

ctx.waitUntil() keeps the Worker alive until the telemetry POST settles, so the response is returned to the client without waiting for it. The telemetry never blocks the page load.
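
The ordering can be demonstrated outside the Workers runtime with a stand-in context. Everything here (`createFakeContext`, `handle`, the injected `deps`) is a hypothetical test harness, not part of the Workers API. Only the `waitUntil(promise)` shape mirrors the real context object.

```javascript
// Minimal stand-in for the Workers execution context: it only records
// the promises handed to waitUntil() so they can be awaited later.
function createFakeContext() {
  const pending = [];
  return {
    pending,
    waitUntil(promise) {
      pending.push(promise);
    },
  };
}

// Mirrors the shape of a Worker fetch handler; `deps` is injected for the demo.
async function handle(request, ctx, deps) {
  const response = await deps.fetchOrigin(request);
  // Hand the telemetry promise to the runtime instead of awaiting it.
  ctx.waitUntil(deps.sendTelemetry(request).catch(() => {}));
  return response;
}

async function demo() {
  const order = [];
  const ctx = createFakeContext();
  const deps = {
    fetchOrigin: async () => {
      order.push('origin');
      return { status: 200 };
    },
    // Simulates a slow telemetry endpoint.
    sendTelemetry: () =>
      new Promise((resolve) =>
        setTimeout(() => {
          order.push('telemetry');
          resolve();
        }, 10)
      ),
  };
  const response = await handle({ url: 'https://example.com/' }, ctx, deps);
  order.push('response returned');
  await Promise.all(ctx.pending); // the runtime does this on the Worker's behalf
  return { status: response.status, order };
}
```

Running `demo()` records the response being returned before the telemetry call completes, which is the property the Worker relies on.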


Event payload

The Worker maps Cloudflare's signals into a standard OA event. This is the same structure produced by every OA integration - other CDNs, content marketplaces, origin middleware, and agent-side reporting.

target payload
POST /api/v1/telemetry/edge/events
X-API-Key: oa_yourkey...
Content-Type: application/json

{
  "events": [{
    "type": "content_retrieved",
    "timestamp": "2026-03-28T14:30:00Z",
    "content_url": "https://example.com/article/ai-transparency",
    "source_role": "edge",
    "oa_telemetry_id": "from OA-Telemetry-ID request header, if present",
    "data": {
      "user_agent": "ChatGPT-User/1.0",
      "bot_category": "inference",
      "verified": true,
      "asn": 14061,
      "asn_org": "DigitalOcean, LLC",
      "country": "US",
      "ja4": "t13d..."
    }
  }]
}
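
Before sending, a Worker or test suite may want to sanity-check events against this shape. The checks below are inferred from the example payload only. Which fields the API actually requires is an assumption, and `validateEdgeEvent` is a hypothetical helper.

```javascript
// bot_category values taken from the classification table in this document.
const BOT_CATEGORIES = ['training', 'inference', 'search'];

// Returns a list of field names that look wrong; an empty list means the
// event matches the shape of the example payload.
function validateEdgeEvent(event) {
  const errors = [];
  if (event.type !== 'content_retrieved') errors.push('type');
  if (Number.isNaN(Date.parse(event.timestamp))) errors.push('timestamp');
  try {
    new URL(event.content_url); // throws on missing or relative URLs
  } catch {
    errors.push('content_url');
  }
  if (event.source_role !== 'edge') errors.push('source_role');
  if (!BOT_CATEGORIES.includes(event.data?.bot_category)) errors.push('data.bot_category');
  return errors;
}
```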

Bot classification

Cloudflare classifies AI traffic into three categories via VerifiedBotCategory. The Worker maps these to OA's standard bot_category values:

Cloudflare category | OA bot_category | Meaning
AI Crawler          | training        | Crawling for model training data (GPTBot, ClaudeBot, etc.)
AI Assistant        | inference       | User-triggered fetching at query time (ChatGPT-User, Perplexity-User)
AI Search           | search          | AI-powered search indexing (OAI-SearchBot)

The inference category is where content attribution is most relevant - there is a user, a query, and a session behind the retrieval. training crawls have no session context but are valuable to track for volume and compliance visibility.


Integration paths

Deploy via Wrangler (open source)

Clone the open source Worker and deploy to your account:

bash
git clone https://github.com/openattribution-org/cloudflare-worker
cd cloudflare-worker
cp wrangler.example.toml wrangler.toml
npm install

# Set your API key as a secret (not in config)
npx wrangler secret put OA_API_KEY

npx wrangler deploy

API token deployment

Grant OA a scoped Cloudflare API token and we deploy and maintain the Worker for you. The token only needs Workers Scripts:Edit permission, plus Workers Routes:Edit on your zone if the Worker is attached via a route.

Native integration

The ideal path: Cloudflare's bot classification triggers an async event to the content owner's OA telemetry endpoint without a separate Worker. The publisher enables it, the .well-known/openattribution file tells the integration where to send events, and retrieval data flows in the standard format. No Worker deployment, no Wrangler, no API token.

Zaraz alternative
For sites already using Cloudflare Zaraz for analytics, OpenAttribution can be added as a custom managed component. Lighter weight than a full Worker.

The Worker

The Worker uses Cloudflare's bot classification rather than maintaining a hardcoded bot list. On Enterprise plans, request.cf.botManagement provides the full classification. On Free and Pro plans, Cloudflare's managed "AI Scrapers and Crawlers" rules handle detection upstream - the Worker checks whether the request was flagged.

worker.js (Enterprise - Bot Management)
export default {
  async fetch(request, env, ctx) {
    // Pass the request through to the origin; telemetry is dispatched afterwards
    const response = await fetch(request);

    const cf = request.cf;
    const bm = cf?.botManagement || {};
    // Broad heuristic: a low bot score (likely automated) or any verified bot.
    // This also matches bots outside the three AI categories; those fall back
    // to 'training' in the mapping below.
    const isAiBot =
      (typeof bm.score === 'number' && bm.score < 30) || Boolean(bm.verifiedBot);

    if (isAiBot) {
      // Map Cloudflare's category to OA bot_category
      const categoryMap = {
        'AI Crawler': 'training',
        'AI Assistant': 'inference',
        'AI Search': 'search',
      };
      const botCategory = categoryMap[cf?.verifiedBotCategory] || 'training';

      const event = {
        type: 'content_retrieved',
        timestamp: new Date().toISOString(),
        content_url: request.url,
        source_role: 'edge',
        // undefined values are dropped by JSON.stringify, so the field is
        // omitted when the header is absent
        oa_telemetry_id: request.headers.get('OA-Telemetry-ID') || undefined,
        data: {
          user_agent: request.headers.get('user-agent'),
          bot_category: botCategory,
          verified: bm.verifiedBot || false,
          bot_score: bm.score,
          asn: cf?.asn,
          asn_org: cf?.asOrganization,
          country: cf?.country,
          ja4: bm.ja4,
        },
      };

      // Fire and forget - does not block response.
      // Errors are swallowed so a telemetry outage never surfaces to the client.
      ctx.waitUntil(
        fetch(env.OA_TELEMETRY_ENDPOINT, {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'X-API-Key': env.OA_API_KEY,
          },
          body: JSON.stringify({ events: [event] }),
        }).catch(() => {})
      );
    }

    return response;
  },
};

Enrichment signals

These Cloudflare edge signals are available to the Worker and map into OA event fields:

Cloudflare signal                      | OA field        | Plan
request.headers.get('user-agent')      | user_agent      | All
request.headers.get('OA-Telemetry-ID') | oa_telemetry_id | All
cf.asn                                 | asn             | All
cf.asOrganization                      | asn_org         | All
cf.country                             | country         | All
botManagement.verifiedBot              | verified        | Enterprise
botManagement.score                    | bot_score       | Enterprise
botManagement.ja4                      | ja4             | Enterprise

Access gating

Separate from telemetry, the Worker (or a WAF rule on Enterprise) can conditionally block AI agents that don't include the OA-Telemetry-ID header:

WAF expression (Enterprise)
(cf.bot_management.verified_bot) and (cf.verified_bot_category in {"AI Crawler" "AI Assistant" "AI Search"}) and not any(http.request.headers.names[*] == "oa-telemetry-id")

Agents that participate in the protocol get access. Agents that do not participate receive a 403 with a response explaining how to participate.
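
On plans without custom WAF expressions, the same check could live in the Worker. The sketch below keeps the decision as a pure function so it is easy to test. `gateDecision`, the returned object shape, and the response text are all hypothetical, and it relies on `request.cf.verifiedBotCategory`, which is an Enterprise Bot Management signal.

```javascript
// Cloudflare verified bot categories that this document maps to OA values.
const AI_CATEGORIES = new Set(['AI Crawler', 'AI Assistant', 'AI Search']);

// Decides whether a request should be served. `headers` is anything with a
// get(name) method (such as request.headers); `cf` is request.cf.
function gateDecision(headers, cf) {
  const isAiAgent = AI_CATEGORIES.has(cf?.verifiedBotCategory);
  const hasTelemetryId = Boolean(headers.get('OA-Telemetry-ID'));
  if (isAiAgent && !hasTelemetryId) {
    return {
      allow: false,
      status: 403,
      // Placeholder wording; point this at your own participation docs.
      body: 'AI agents must send an OA-Telemetry-ID request header to access this content.',
    };
  }
  return { allow: true };
}
```

In a Worker's fetch handler, a denied decision would be turned into `new Response(decision.body, { status: decision.status })` before the request ever reaches the origin.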

Complementary to existing products
This pattern sits alongside Cloudflare's AI crawl control and monetisation tools, and alongside content marketplace licensing deals. Cloudflare handles access decisions at the edge. Marketplaces handle licensing and payment. OA handles attribution - what happened after access was granted, across all of those channels. A publisher uses all three.

Configuration

Variable              | Description                                                  | Default
OA_API_KEY            | Your OpenAttribution API key (set via wrangler secret put)  | Required (secret)
OA_TELEMETRY_ENDPOINT | Edge events endpoint URL                                     | https://api.openattribution.org/api/v1/telemetry/edge/events
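
One way to apply these defaults inside a Worker is to resolve them once per request. This is a sketch under the assumption that the default endpoint is not already injected by wrangler.toml. `resolveConfig` is a hypothetical helper, not part of the published Worker.

```javascript
const DEFAULT_ENDPOINT = 'https://api.openattribution.org/api/v1/telemetry/edge/events';

// Reads the two documented variables from the Worker's env bindings,
// applying the documented default and failing loudly if the secret is missing.
function resolveConfig(env) {
  if (!env.OA_API_KEY) {
    throw new Error('OA_API_KEY is not set; run `npx wrangler secret put OA_API_KEY`');
  }
  return {
    apiKey: env.OA_API_KEY,
    endpoint: env.OA_TELEMETRY_ENDPOINT || DEFAULT_ENDPOINT,
  };
}
```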

Publishers not currently on Cloudflare can proxy traffic through a free Cloudflare account. The proxy adds sub-50ms of latency and immediately unlocks bot classification and Security Analytics. The OA Worker can then be deployed on top.

Existing precedents
This pattern is proven in production by Castle (bot detection), DataDome (bot protection), and Honeycomb (observability) - all deploy Workers via API token with async outbound telemetry.