
Cloudflare

Cloudflare already classifies AI traffic at the edge - crawlers vs assistants vs search, verified vs suspected. OA normalises those signals into a standard content_retrieved event. A content owner using Cloudflare, licensing through Microsoft's Content Marketplace, and getting crawled directly by ChatGPT sees all three in the same format. One standard across CDNs, content marketplaces, and agent-side reporting.

Edge layer | Zero latency impact | source_role: edge

Already on Cloudflare? Deploy in a few commands

You'll need an OpenAttribution API key (oat_pub_...) - create one when you register your domain. New here? Start with the quickstart.

bash
git clone https://github.com/openattribution-org/cloudflare-worker
cd cloudflare-worker && npm install
cp wrangler.example.toml wrangler.toml

# Edit wrangler.toml - set routes + zone id - then:
npx wrangler secret put OA_API_KEY            # paste your oat_pub_ key
npx wrangler deploy

Full walkthrough below - detection tiers, the API-token deployment, access gating, and the Zaraz alternative.


What Cloudflare provides

Any site proxied through Cloudflare already has AI bot visibility. The depth of signal depends on the plan.

Plan | What you get
--- | ---
Free | Bot Fight Mode (JS detection), "Block AI Scrapers and Crawlers" toggle, Security Events dashboard showing flagged bot traffic
Pro / Business | Super Bot Fight Mode with bot traffic groupings (automated, likely automated, likely human, verified bot) in Security Analytics
Enterprise | Full Bot Management: BotScore (1-99), VerifiedBotCategory, JA4 (TLS fingerprint), JA4Signals, BotTags. Available in Workers via request.cf.botManagement

Cloudflare also exposes this data programmatically via the GraphQL Analytics API at api.cloudflare.com/client/v4/graphql. Datasets like httpRequestsAdaptiveGroups include bot classification dimensions - useful for understanding volume before enabling telemetry, or for building custom reporting alongside it.
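
As a rough illustration of sizing AI traffic before enabling telemetry, the sketch below builds a request body for the GraphQL Analytics API that counts requests per user agent in one zone. The helper name buildUserAgentVolumeQuery, the choice of the userAgent dimension, the limit, and the filter field names are assumptions for illustration; check the dataset schema for the bot classification dimensions available on your plan.

```javascript
// Sketch only: builds a GraphQL request body for per-user-agent request counts.
// Assumptions: the `userAgent` dimension and the `datetime_geq` / `datetime_lt`
// filters are available on httpRequestsAdaptiveGroups for your zone and plan.
function buildUserAgentVolumeQuery(zoneTag, since, until) {
  // JSON.stringify produces quoted, escaped string literals for the query text.
  const query = `
    query {
      viewer {
        zones(filter: { zoneTag: ${JSON.stringify(zoneTag)} }) {
          httpRequestsAdaptiveGroups(
            limit: 100
            filter: {
              datetime_geq: ${JSON.stringify(since)}
              datetime_lt: ${JSON.stringify(until)}
            }
            orderBy: [count_DESC]
          ) {
            count
            dimensions { userAgent }
          }
        }
      }
    }`;
  return { query };
}

// Usage (CF_API_TOKEN and zoneId are hypothetical placeholders):
// fetch('https://api.cloudflare.com/client/v4/graphql', {
//   method: 'POST',
//   headers: {
//     Authorization: `Bearer ${CF_API_TOKEN}`,
//     'Content-Type': 'application/json',
//   },
//   body: JSON.stringify(
//     buildUserAgentVolumeQuery(zoneId, '2026-03-01T00:00:00Z', '2026-03-02T00:00:00Z')
//   ),
// });
```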


What OA adds

Cloudflare's signals are rich but Cloudflare-specific. A content owner typically has multiple sources of AI interaction data - edge traffic through their CDN, licensing deals through content marketplaces like Microsoft or Amazon, and direct crawls from agents like ChatGPT or Perplexity. Without a standard format, each source reports differently and none of them correlate.

OA normalises all of these into a single event structure:

  • A Cloudflare edge event, a Microsoft Content Marketplace citation event, and an agent-reported retrieval all produce the same content_retrieved payload
  • Events correlate across sources via Content-Telemetry-ID - the edge saw the fetch, the marketplace reported the citation, and they link up
  • The content owner's .well-known/openattribution.json manifest tells agents, CDNs, and marketplaces where to send telemetry
  • Measurement partners and dashboards consume one format regardless of source
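
To make the correlation point concrete, here is a minimal sketch of how a consumer might group events from different sources by their shared Content-Telemetry-ID. The function name correlateByTelemetryId and the 'marketplace' source_role value are hypothetical; only source_role: 'edge' appears in this guide.

```javascript
// Sketch: group content_retrieved events by content_telemetry_id.
// Events without an ID cannot be linked and are returned separately.
function correlateByTelemetryId(events) {
  const groups = new Map();
  const uncorrelated = [];
  for (const event of events) {
    const id = event.content_telemetry_id;
    if (!id) {
      uncorrelated.push(event);
      continue;
    }
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push(event);
  }
  return { groups, uncorrelated };
}

// Example input: one edge fetch and one (hypothetical) marketplace citation
// sharing an ID, plus an edge fetch that carried no Content-Telemetry-ID.
const { groups, uncorrelated } = correlateByTelemetryId([
  { type: 'content_retrieved', source_role: 'edge', content_telemetry_id: 'abc-123' },
  { type: 'content_retrieved', source_role: 'marketplace', content_telemetry_id: 'abc-123' },
  { type: 'content_retrieved', source_role: 'edge' },
]);
```

Because every source emits the same event shape, the grouping needs no per-source branching.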

How it works

flow
AI Bot ──GET──> Cloudflare Edge (Worker)
                    │
                    ├── Classifies request via Bot Management
                    │   (BotScore, VerifiedBotCategory, JA4, ASN)
                    │
                    ├── Reads Content-Telemetry-ID header (if present)
                    ├── Passes request to origin (no delay)
                    └── ctx.waitUntil(fetch(...))
                        └── Async POST to OA telemetry endpoint

ctx.waitUntil() keeps the Worker alive until the telemetry POST settles, so the POST can complete after the response has been returned to the client. The telemetry never blocks the page load.
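
The ordering can be illustrated locally with a stand-in for the execution context. This is a sketch, not the Workers runtime: createMockCtx and handle are hypothetical names, and the timer simply simulates a slow telemetry POST.

```javascript
// Minimal stand-in for the Workers execution context (illustration only).
function createMockCtx() {
  const pending = [];
  return {
    waitUntil(promise) {
      pending.push(promise); // the runtime would keep the Worker alive for these
    },
    settled() {
      return Promise.allSettled(pending);
    },
  };
}

const order = [];

async function handle(ctx) {
  // Simulated telemetry POST that takes 10 ms to complete.
  ctx.waitUntil(
    new Promise((resolve) => setTimeout(resolve, 10)).then(() => {
      order.push('telemetry sent');
    })
  );
  // The handler returns without awaiting the telemetry promise.
  order.push('response returned');
  return 'response';
}

// Usage:
// const ctx = createMockCtx();
// handle(ctx).then(() => ctx.settled()).then(() => console.log(order));
```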


Event payload

The Worker maps Cloudflare's signals into a standard OA event. This is the same structure produced by every OA integration - other CDNs, content marketplaces, origin middleware, and agent-side reporting.

target payload
POST /events
X-API-Key: oat_pub_yourkey...
Content-Type: application/json

{
  "events": [{
    "id": "8f3c1d2e-4b5a-6c7d-8e9f-0a1b2c3d4e5f",
    "type": "content_retrieved",
    "timestamp": "2026-03-28T14:30:00Z",
    "content_url": "https://example.com/article/ai-transparency",
    "source_role": "edge",
    "content_telemetry_id": "from Content-Telemetry-ID request header, if present",
    "data": {
      "user_agent": "ChatGPT-User/1.0",
      "bot_category": "inference",
      "verified": true,
      "asn": 14061,
      "asn_org": "DigitalOcean, LLC",
      "country": "US",
      "ja4": "t13d..."
    }
  }]
}

Bot classification

Cloudflare classifies AI traffic into three categories via VerifiedBotCategory. The Worker maps these to OA's standard bot_category values:

Cloudflare category | OA bot_category | Meaning
--- | --- | ---
AI Crawler | training | Crawling for model training data (GPTBot, ClaudeBot, etc.)
AI Assistant | inference | User-triggered fetching at query time (ChatGPT-User, Perplexity-User)
AI Search | search | AI-powered search indexing (OAI-SearchBot)

The inference category is where content attribution is most relevant - there is a user, a query, and a session behind the retrieval. training crawls have no session context but are valuable to track for volume and compliance visibility.
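
The mapping can be sketched as a small lookup. The name toOaBotCategory is hypothetical, and returning null for anything outside the three AI categories is a design choice of this sketch, so that verified non-AI bots (for example, conventional search crawlers) are not reported under an AI category.

```javascript
// Cloudflare VerifiedBotCategory -> OA bot_category, per the mapping above.
const OA_BOT_CATEGORY = new Map([
  ['AI Crawler', 'training'],
  ['AI Assistant', 'inference'],
  ['AI Search', 'search'],
]);

// Returns the OA bot_category, or null when Cloudflare reports a
// non-AI category or no category at all.
function toOaBotCategory(verifiedBotCategory) {
  return OA_BOT_CATEGORY.get(verifiedBotCategory) ?? null;
}

// Usage inside a Worker:
// const botCategory = toOaBotCategory(request.cf?.verifiedBotCategory);
```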


Verified agent identity (Web Bot Auth)

A user-agent string is a claim, not a credential - anything can put ChatGPT-User in a header. Web Bot Auth replaces the claim with a signature. It is an open, IETF-track scheme built on HTTP Message Signatures (RFC 9421, Ed25519 keys) that lets an agent cryptographically sign each request, so the edge can verify which operator actually sent it rather than trusting the header.

Three request headers carry it:

  • Signature - the signature value
  • Signature-Input - which request components were signed, plus keyid, created, expires, tag
  • Signature-Agent - the URL of the operator's public-key directory, served as a JWKS at /.well-known/http-message-signatures-directory
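
For inspection or logging, the parameters in Signature-Input can be pulled out with a simplified parser like the one below. This is a sketch under stated assumptions: it handles a single signature label with string and integer parameters, it is not a full structured-field parser, and it does not verify the signature itself. The function names and the sample header value are hypothetical.

```javascript
// Sketch: extract label, covered components, and parameters from a
// Signature-Input value shaped like:
//   sig1=("@authority" "signature-agent");created=...;keyid="...";expires=...;tag="..."
function parseSignatureInput(headerValue) {
  const match = /^\s*([a-z*][a-z0-9_.*-]*)=\(([^)]*)\)(.*)$/.exec(headerValue ?? '');
  if (!match) return null;
  const [, label, componentList, paramString] = match;

  // Covered components are the quoted names inside the parentheses.
  const components = [...componentList.matchAll(/"([^"]*)"/g)].map((m) => m[1]);

  // Parameters are ;name="string" or ;name=integer pairs.
  const params = {};
  const paramPattern = /;\s*([a-z*][a-z0-9_.*-]*)=(?:"([^"]*)"|(-?\d+))/g;
  for (const m of paramString.matchAll(paramPattern)) {
    params[m[1]] = m[2] !== undefined ? m[2] : Number(m[3]);
  }
  return { label, components, params };
}

// True when nowSeconds falls inside the [created, expires) window.
function isWithinValidity(params, nowSeconds) {
  return (
    typeof params.created === 'number' &&
    typeof params.expires === 'number' &&
    params.created <= nowSeconds &&
    nowSeconds < params.expires
  );
}

// Usage inside a Worker:
// const parsed = parseSignatureInput(request.headers.get('Signature-Input'));
```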

Google is testing this for Google-Agent, signing under the identity agent.bot.goog. Cloudflare, Akamai and AWS are implementing verification on the CDN side. The two governing drafts are the architecture draft and the key-directory draft.

Why this matters for telemetry

It changes what verified: true on a content_retrieved event means. Today that field reflects Cloudflare's reputation-based bot verification. With Web Bot Auth, a verified edge event is backed by a cryptographic attestation that the request came from a named operator - attribution keyed to an identity that cannot be spoofed by copying a user-agent string.

Cloudflare surfaces a passing signature as a verified bot (request.cf.botManagement.verifiedBot), and the raw Signature-Agent header is readable in the Worker on any plan. Capture it into the event so attribution records the signing operator alongside the user agent:

worker.js (capturing signed identity)
// cf, bm, and botCategory are derived as in the full Worker (see "The Worker").
const sigAgent = request.headers.get('Signature-Agent'); // e.g. "https://agent.bot.goog"

const event = {
  // ...id, type, timestamp, content_url, source_role: 'edge'...
  data: {
    user_agent: request.headers.get('user-agent'),
    bot_category: botCategory,
    verified: bm.verifiedBot || false,
    signature_agent: sigAgent || undefined,   // signing operator; attested only when the signature verified
    asn: cf?.asn,
    asn_org: cf?.asOrganization,
    country: cf?.country,
    ja4: bm.ja4,
  },
};
Emerging, not universal
Web Bot Auth is early - Google's signing is experimental and most agents do not sign yet. Treat signature_agent as a high-confidence enrichment when present, and keep user-agent and bot-classification detection as the baseline. As signing spreads, the signed identity becomes the primary key for verified retrieval.
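
One way a consumer might apply that guidance is sketched below. The function name attributionIdentity is hypothetical, and requiring verified: true before trusting signature_agent is an assumption of this sketch: the raw Signature-Agent header is itself only a claim until the signature has passed verification at the edge.

```javascript
// Sketch: choose the identity to key attribution on for one event's data.
// Prefers the signed operator only when the edge marked the event verified;
// otherwise falls back to the user agent baseline.
function attributionIdentity(data) {
  if (data.verified === true && data.signature_agent) {
    return { source: 'signature_agent', value: data.signature_agent };
  }
  return { source: 'user_agent', value: data.user_agent ?? null };
}

// Usage:
// const identity = attributionIdentity(event.data);
```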

Integration paths

Deploy via Wrangler (open source)

Clone the open source Worker and deploy to your account:

bash
git clone https://github.com/openattribution-org/cloudflare-worker
cd cloudflare-worker
npm install
cp wrangler.example.toml wrangler.toml

# Edit wrangler.toml - set routes + zone id - then:
npx wrangler secret put OA_API_KEY     # paste your oat_pub_ key (not stored in config)

npx wrangler deploy

API token deployment

Grant OA a scoped Cloudflare API token and we deploy and maintain the Worker for you. The token needs Workers Scripts:Edit on your account, plus Workers Routes:Edit on your zone so the Worker can be attached to your routes.

Native integration

The ideal path: Cloudflare's bot classification triggers an async event to the content owner's OA telemetry endpoint without a separate Worker. The content owner enables it, the .well-known/openattribution.json manifest tells the integration where to send events, and retrieval data flows in the standard format. No Worker deployment, no Wrangler, no API token.

Zaraz alternative
For sites already using Cloudflare Zaraz for analytics, OpenAttribution can be added as a custom managed component. Lighter weight than a full Worker.

The Worker

The Worker uses Cloudflare's bot classification rather than maintaining a hardcoded bot list. On Enterprise plans, request.cf.botManagement provides the full classification. On Free and Pro plans, Cloudflare's managed "AI Scrapers and Crawlers" rules handle detection upstream - the Worker checks whether the request was flagged.

worker.js (Enterprise - Bot Management)
export default {
  async fetch(request, env, ctx) {
    // Pass request to origin immediately
    const response = await fetch(request);

    const cf = request.cf;
    const bm = cf?.botManagement || {};
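    // Low bot score (likely automated) or a Cloudflare-verified bot.
    // This also matches verified non-AI bots; categories missing from
    // categoryMap fall back to 'training' below.
    const isAiBot = bm.score < 30 || bm.verifiedBot;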

    if (isAiBot) {
      // Map Cloudflare's category to OA bot_category
      const categoryMap = {
        'AI Crawler': 'training',
        'AI Assistant': 'inference',
        'AI Search': 'search',
      };
      const botCategory = categoryMap[cf?.verifiedBotCategory] || 'training';

      const event = {
        id: crypto.randomUUID(),
        type: 'content_retrieved',
        timestamp: new Date().toISOString(),
        content_url: request.url,
        source_role: 'edge',
        content_telemetry_id: request.headers.get('Content-Telemetry-ID') || undefined,
        data: {
          user_agent: request.headers.get('user-agent'),
          bot_category: botCategory,
          verified: bm.verifiedBot || false,
          asn: cf?.asn,
          asn_org: cf?.asOrganization,
          country: cf?.country,
          ja4: bm.ja4,
        },
      };

      // Fire and forget - does not block response
      ctx.waitUntil(
        fetch(env.OA_TELEMETRY_ENDPOINT || 'https://telemetry.openattribution.org/events', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'X-API-Key': env.OA_API_KEY,
          },
          body: JSON.stringify({ events: [event] }),
        }).catch(() => {}) // ignore telemetry failures
      );
    }

    return response;
  },
};

Enrichment signals

These Cloudflare edge signals are available to the Worker and map into OA event fields:

Cloudflare signal | OA field | Plan
--- | --- | ---
request.headers['user-agent'] | user_agent | All
request.headers['Content-Telemetry-ID'] | content_telemetry_id | All
cf.asn | asn | All
cf.asOrganization | asn_org | All
cf.country | country | All
botManagement.verifiedBot | verified | Enterprise
botManagement.score | bot_score | Enterprise
botManagement.ja4 | ja4 | Enterprise

Access gating

Separate from telemetry, the Worker (or a WAF rule on Enterprise) can conditionally block AI agents that don't include the Content-Telemetry-ID header:

WAF expression (Enterprise)
(cf.bot_management.verified_bot) and (not any(http.request.headers.names[*] == "content-telemetry-id"))

Agents that participate in the protocol get access. Agents that do not participate receive a 403 whose body explains how to participate.
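
The Worker-side variant of this gate could look roughly like the sketch below. The function name gateDecision and the wording of the 403 body are hypothetical, and limiting the gate to the three AI categories (rather than every verified bot) is an assumption of this sketch.

```javascript
// Cloudflare verified bot categories treated as AI agents for gating.
const AI_CATEGORIES = new Set(['AI Crawler', 'AI Assistant', 'AI Search']);

// Returns null to let the request through, or a { status, body }
// descriptor when the request should be blocked.
function gateDecision(request) {
  const isAiAgent = AI_CATEGORIES.has(request.cf?.verifiedBotCategory);
  const hasTelemetryId = Boolean(request.headers.get('Content-Telemetry-ID'));
  if (!isAiAgent || hasTelemetryId) return null;
  return {
    status: 403,
    // Hypothetical wording; point agents at your own participation docs.
    body: 'AI agents must send a Content-Telemetry-ID header to access this content.',
  };
}

// In a Worker's fetch handler:
//   const blocked = gateDecision(request);
//   if (blocked) return new Response(blocked.body, { status: blocked.status });
//   return fetch(request);
```

Keeping the decision separate from the Response construction makes the rule easy to unit test outside the Workers runtime.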

Complementary to existing products
This pattern sits alongside Cloudflare's AI crawl control and monetisation tools, and alongside content marketplace licensing deals. Cloudflare handles access decisions at the edge. Marketplaces handle licensing and payment. OA handles attribution - what happened after access was granted, across all of those channels. A content owner uses all three.

Configuration

Variable | Description | Default
--- | --- | ---
OA_API_KEY | Your OpenAttribution API key (set via wrangler secret put) | Required (secret)
OA_TELEMETRY_ENDPOINT | Edge events endpoint URL | https://telemetry.openattribution.org/events
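
A minimal sketch of resolving these two variables inside a Worker follows. The helper name resolveConfig and the prefix check on the key are assumptions for illustration; the default endpoint is the one listed above.

```javascript
const DEFAULT_TELEMETRY_ENDPOINT = 'https://telemetry.openattribution.org/events';

// Sketch: read configuration from the Worker's env bindings, applying the
// documented default endpoint and failing fast on a missing or malformed key.
function resolveConfig(env) {
  const apiKey = env.OA_API_KEY;
  if (typeof apiKey !== 'string' || !apiKey.startsWith('oat_pub_')) {
    throw new Error('OA_API_KEY is missing or does not look like an oat_pub_ key');
  }
  return {
    apiKey,
    endpoint: env.OA_TELEMETRY_ENDPOINT || DEFAULT_TELEMETRY_ENDPOINT,
  };
}

// Usage inside fetch(request, env, ctx):
// const { apiKey, endpoint } = resolveConfig(env);
```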

Content owners not currently on Cloudflare can proxy traffic through a free Cloudflare account. The proxy adds under 50 ms of latency and immediately unlocks bot classification and Security Analytics. The OA Worker can then be deployed on top.

Existing precedents
This pattern is proven in production by Castle (bot detection), DataDome (bot protection), and Honeycomb (observability) - all deploy Workers via API token with async outbound telemetry.