Cloudflare
Cloudflare already classifies AI traffic at the edge - crawlers vs assistants vs search,
verified vs suspected. OA normalises those signals into a standard content_retrieved event. A publisher using Cloudflare, licensing through Microsoft's Content Marketplace, and
getting crawled directly by ChatGPT sees all three in the same format. One standard across
CDNs, content marketplaces, and agent-side reporting.
What Cloudflare provides
Any site proxied through Cloudflare already has AI bot visibility. The depth of signal depends on the plan.
| Plan | What you get |
|---|---|
| Free | Bot Fight Mode (JS detection), "Block AI Scrapers and Crawlers" toggle, Security Events dashboard showing flagged bot traffic |
| Pro / Business | Super Bot Fight Mode with bot traffic groupings (automated, likely automated, likely human, verified bot) in Security Analytics |
| Enterprise | Full Bot Management: BotScore (1-99), VerifiedBotCategory, JA4 (TLS fingerprint), JA4Signals, BotTags. Available in Workers via request.cf.botManagement |
Cloudflare also exposes this data programmatically via the GraphQL Analytics API at api.cloudflare.com/client/v4/graphql.
Datasets like httpRequestsAdaptiveGroups include bot classification dimensions - useful for understanding volume before enabling telemetry,
or for building custom reporting alongside it.
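As a rough illustration of the GraphQL route, the sketch below builds a request body for httpRequestsAdaptiveGroups. It is not taken from the OA or Cloudflare codebases. The function names, the variable type names (`string`, `Time`), and any dimension names you pass in are assumptions to check against the schema available on your plan.

```javascript
// Sketch only: builds a GraphQL Analytics API request body.
// Dimension names are supplied by the caller and are NOT verified here
// against Cloudflare's schema; confirm them before relying on the result.
function buildBotTrafficQuery({ zoneTag, since, until, dimensions, limit = 100 }) {
  const identifier = /^[A-Za-z_][A-Za-z0-9_]*$/;
  if (!Array.isArray(dimensions) || dimensions.length === 0) {
    throw new Error('At least one dimension is required');
  }
  for (const name of dimensions) {
    // Names are interpolated into the query text, so restrict them to identifiers.
    if (!identifier.test(name)) throw new Error(`Invalid dimension name: ${name}`);
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error('limit must be a positive integer');
  }
  // Variable types ($zoneTag: string, $since/$until: Time) are assumptions.
  const query = `query BotTraffic($zoneTag: string, $since: Time, $until: Time) {
  viewer {
    zones(filter: { zoneTag: $zoneTag }) {
      httpRequestsAdaptiveGroups(
        limit: ${limit}
        filter: { datetime_geq: $since, datetime_leq: $until }
        orderBy: [count_DESC]
      ) {
        count
        dimensions { ${dimensions.join(' ')} }
      }
    }
  }
}`;
  return { query, variables: { zoneTag, since, until } };
}

// Hypothetical helper: POSTs the body with a Cloudflare API token.
async function queryCloudflare(apiToken, body) {
  const res = await fetch('https://api.cloudflare.com/client/v4/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiToken}` },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`GraphQL request failed with status ${res.status}`);
  return res.json();
}
```

A call such as `buildBotTrafficQuery({ zoneTag, since, until, dimensions: ['userAgent'] })` groups request counts by user agent. Swap in whichever bot classification dimensions the schema exposes for your zone.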
What OA adds
Cloudflare's signals are rich but Cloudflare-specific. A publisher typically has multiple sources of AI interaction data - edge traffic through their CDN, licensing deals through content marketplaces like Microsoft or Amazon, and direct crawls from agents like ChatGPT or Perplexity. Without a standard format, each source reports differently and none of them correlate.
OA normalises all of these into a single event structure:
- A Cloudflare edge event, a Microsoft Content Marketplace citation event, and an agent-reported retrieval all produce the same content_retrieved payload
- Events correlate across sources via OA-Telemetry-ID - the edge saw the fetch, the marketplace reported the citation, and they link up
- The content owner's .well-known/openattribution file tells agents, CDNs, and marketplaces where to send telemetry
- Measurement partners and dashboards consume one format regardless of source
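To make the correlation step concrete, here is a minimal sketch of grouping normalised events by their telemetry ID. The `correlateEvents` helper and the `marketplace` value for source_role are hypothetical; only `content_retrieved` and the `edge` role come from this page.

```javascript
// Group OA events by oa_telemetry_id so reports from different sources
// about the same retrieval end up together. Events without an ID cannot
// be linked and are returned separately.
function correlateEvents(events) {
  const byId = new Map();
  const uncorrelated = [];
  for (const event of events) {
    const id = event.oa_telemetry_id;
    if (!id) {
      uncorrelated.push(event);
      continue;
    }
    if (!byId.has(id)) byId.set(id, []);
    byId.get(id).push(event);
  }
  return { byId, uncorrelated };
}

// Sample input: "marketplace" is an assumed source_role used for illustration.
const sampleEvents = [
  { type: 'content_retrieved', source_role: 'edge', oa_telemetry_id: 'abc-123', content_url: 'https://example.com/a' },
  { type: 'content_retrieved', source_role: 'marketplace', oa_telemetry_id: 'abc-123', content_url: 'https://example.com/a' },
  { type: 'content_retrieved', source_role: 'edge', content_url: 'https://example.com/b' },
];

const grouped = correlateEvents(sampleEvents);
console.log(grouped.byId.get('abc-123').map((e) => e.source_role)); // [ 'edge', 'marketplace' ]
console.log(grouped.uncorrelated.length); // 1
```

Because every source emits the same structure, the grouping needs no per-source parsing. That is the practical benefit of a single event format.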
How it works
AI Bot ──GET──> Cloudflare Edge (Worker)
                    │
                    ├── Classifies request via Bot Management
                    │   (BotScore, VerifiedBotCategory, JA4, ASN)
                    │
                    ├── Reads OA-Telemetry-ID header (if present)
                    ├── Passes request to origin (no delay)
                    └── ctx.waitUntil(fetch(...))
                            └── Async POST to OA telemetry endpoint
ctx.waitUntil() keeps the Worker alive until the telemetry POST completes, without holding up the response to the client. The telemetry never blocks the page load.
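The ordering described above can be checked outside Cloudflare with a small stand-in for the execution context. This is a sketch under assumptions: `createMockCtx`, `handle`, and the injected `fetchOrigin` and `sendTelemetry` functions are illustrative names, not part of the Workers runtime or the OA Worker.

```javascript
// Minimal stand-in for the Workers execution context. It only records
// promises passed to waitUntil so a caller can await them afterwards.
function createMockCtx() {
  const pending = [];
  return {
    waitUntil(promise) {
      pending.push(promise);
    },
    settled() {
      return Promise.allSettled(pending);
    },
  };
}

// Handler shaped like the flow above. The origin fetch and the telemetry
// sender are injected so the sketch runs in any JavaScript runtime.
async function handle(request, ctx, { fetchOrigin, sendTelemetry }) {
  const response = await fetchOrigin(request);
  // Not awaited: the response is returned while telemetry is still in flight.
  // Errors are swallowed so a telemetry outage cannot surface to the client.
  ctx.waitUntil(sendTelemetry(request).catch(() => {}));
  return response;
}
```

In a test, a slow `sendTelemetry` should resolve only after `handle` has already returned its response. That is the property the real `ctx.waitUntil()` is relied on for here.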
Event payload
The Worker maps Cloudflare's signals into a standard OA event. This is the same structure produced by every OA integration - other CDNs, content marketplaces, origin middleware, and agent-side reporting.
POST /api/v1/telemetry/edge/events
X-API-Key: oa_yourkey...
Content-Type: application/json

{
  "events": [{
    "type": "content_retrieved",
    "timestamp": "2026-03-28T14:30:00Z",
    "content_url": "https://example.com/article/ai-transparency",
    "source_role": "edge",
    "oa_telemetry_id": "from OA-Telemetry-ID request header, if present",
    "data": {
      "user_agent": "ChatGPT-User/1.0",
      "bot_category": "inference",
      "verified": true,
      "asn": 14061,
      "asn_org": "DigitalOcean, LLC",
      "country": "US",
      "ja4": "t13d..."
    }
  }]
}
Bot classification
Cloudflare classifies AI traffic into three categories via VerifiedBotCategory.
The Worker maps these to OA's standard bot_category values:
| Cloudflare category | OA bot_category | Meaning |
|---|---|---|
| AI Crawler | training | Crawling for model training data (GPTBot, ClaudeBot, etc.) |
| AI Assistant | inference | User-triggered fetching at query time (ChatGPT-User, Perplexity-User) |
| AI Search | search | AI-powered search indexing (OAI-SearchBot) |
The inference category is where content attribution is most relevant - there is a user, a query, and a session
behind the retrieval. training crawls have no session context but are valuable to track for volume and compliance visibility.
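The mapping and the event shape above can be expressed as a pair of small helpers. This is a sketch, not the OA reference implementation. Which fields are required is an assumption drawn from the example payload, and `toOaBotCategory` and `validateEvent` are hypothetical names.

```javascript
// Cloudflare category -> OA bot_category, as listed in the table above.
const CLOUDFLARE_TO_OA = new Map([
  ['AI Crawler', 'training'],
  ['AI Assistant', 'inference'],
  ['AI Search', 'search'],
]);
const OA_BOT_CATEGORIES = new Set(CLOUDFLARE_TO_OA.values());

// Returns null for categories outside the table so the caller decides
// how to treat them.
function toOaBotCategory(cloudflareCategory) {
  return CLOUDFLARE_TO_OA.get(cloudflareCategory) ?? null;
}

// Returns a list of problems; an empty list means the event looks well formed.
// The set of required fields is an assumption based on the example payload.
function validateEvent(event) {
  const problems = [];
  if (event.type !== 'content_retrieved') {
    problems.push('type must be "content_retrieved"');
  }
  if (Number.isNaN(Date.parse(event.timestamp))) {
    problems.push('timestamp must be a parseable date');
  }
  try {
    new URL(event.content_url);
  } catch {
    problems.push('content_url must be an absolute URL');
  }
  if (!OA_BOT_CATEGORIES.has(event.data?.bot_category)) {
    problems.push('data.bot_category must be training, inference, or search');
  }
  return problems;
}
```

Returning null for an unmapped category is a deliberate choice in this sketch. A caller can still choose a default, but the unknown case stays visible instead of being silently counted as one of the three categories.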
Integration paths
Deploy via Wrangler (open source)
Clone the open source Worker and deploy to your account:
git clone https://github.com/openattribution-org/cloudflare-worker
cd cloudflare-worker
cp wrangler.example.toml wrangler.toml
npm install
# Set your API key as a secret (not in config)
npx wrangler secret put OA_API_KEY
npx wrangler deploy
API token deployment
Grant OA a scoped Cloudflare API token and we deploy and maintain the Worker for you. The token
only needs Workers Scripts:Edit permission on your zone.
Native integration
The ideal path: Cloudflare's bot classification triggers an async event to the content owner's
OA telemetry endpoint without a separate Worker. The publisher enables it, the .well-known/openattribution file tells the integration where to send events, and retrieval data flows in the standard
format. No Worker deployment, no Wrangler, no API token.
The Worker
The Worker uses Cloudflare's bot classification rather than maintaining a hardcoded bot list.
On Enterprise plans, request.cf.botManagement provides the full classification. On Free and Pro plans, Cloudflare's managed "AI Scrapers and
Crawlers" rules handle detection upstream - the Worker checks whether the request was flagged.
export default {
  async fetch(request, env, ctx) {
    // Pass the request to origin first; telemetry is handled afterwards
    const response = await fetch(request);
    const cf = request.cf;
    // botManagement is fully populated only when Bot Management is enabled
    const bm = cf?.botManagement || {};
    // Low bot scores and verified bots are both treated as bot traffic
    const isAiBot = bm.score < 30 || bm.verifiedBot;
    if (isAiBot) {
      // Map Cloudflare's category to OA bot_category
      const categoryMap = {
        'AI Crawler': 'training',
        'AI Assistant': 'inference',
        'AI Search': 'search',
      };
      // Missing or unrecognised categories fall back to 'training'
      const botCategory = categoryMap[cf?.verifiedBotCategory] || 'training';
      const event = {
        type: 'content_retrieved',
        timestamp: new Date().toISOString(),
        content_url: request.url,
        source_role: 'edge',
        // undefined values are dropped by JSON.stringify
        oa_telemetry_id: request.headers.get('OA-Telemetry-ID') || undefined,
        data: {
          user_agent: request.headers.get('user-agent'),
          bot_category: botCategory,
          verified: bm.verifiedBot || false,
          asn: cf?.asn,
          asn_org: cf?.asOrganization,
          country: cf?.country,
          ja4: bm.ja4,
        },
      };
      // Fire and forget - does not block response
      ctx.waitUntil(
        fetch(env.OA_TELEMETRY_ENDPOINT, {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'X-API-Key': env.OA_API_KEY,
          },
          body: JSON.stringify({ events: [event] }),
        }).catch(() => {
          // A telemetry failure must never affect the page response
        })
      );
    }
    return response;
  },
};
Enrichment signals
These Cloudflare edge signals are available to the Worker and map into OA event fields:
| Cloudflare signal | OA field | Plan |
|---|---|---|
| request.headers['user-agent'] | user_agent | All |
| request.headers['OA-Telemetry-ID'] | oa_telemetry_id | All |
| cf.asn | asn | All |
| cf.asOrganization | asn_org | All |
| cf.country | country | All |
| botManagement.verifiedBot | verified | Enterprise |
| botManagement.score | bot_score | Enterprise |
| botManagement.ja4 | ja4 | Enterprise |
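The table above can be read as a pure mapping function, which is easy to unit test away from the edge. A minimal sketch follows. `mapEdgeSignals` is a hypothetical helper, and leaving out the Enterprise-only fields when botManagement is absent is an assumption about the desired behaviour.

```javascript
// Map Cloudflare edge signals to OA event fields, following the table above.
// `headers` is anything with a get(name) method (a Headers object in a Worker);
// `cf` is the request.cf object.
function mapEdgeSignals(headers, cf = {}) {
  const bm = cf.botManagement; // undefined when Bot Management is not available
  const mapped = {
    // Headers.get returns null for a missing header; normalise to undefined
    oa_telemetry_id: headers.get('OA-Telemetry-ID') ?? undefined,
    data: {
      user_agent: headers.get('user-agent'),
      asn: cf.asn,
      asn_org: cf.asOrganization,
      country: cf.country,
    },
  };
  if (bm) {
    // Enterprise-only signals: included only when Cloudflare provides them
    mapped.data.verified = Boolean(bm.verifiedBot);
    mapped.data.bot_score = bm.score;
    mapped.data.ja4 = bm.ja4;
  }
  return mapped;
}
```

Leaving the Enterprise fields out entirely, instead of sending placeholder values, lets a consumer tell "not verified" apart from "verification signal unavailable on this plan".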
Access gating
Separate from telemetry, the Worker (or a WAF rule on Enterprise) can conditionally block AI agents that don't include the OA-Telemetry-ID header:
(cf.bot_management.verified_bot) and (not any(http.request.headers.names[*] == "oa-telemetry-id"))
Agents that participate in the protocol get access. Agents that don't participate get a 403 with a response explaining how to participate.
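For the Worker-side variant of the same gate, one possible shape is sketched below. It mirrors the rule above: verified bot, no OA-Telemetry-ID header. The function names, the error code, and the message text are hypothetical. The page does not specify what the 403 body should contain.

```javascript
// Decide whether to block a request. Returns null to allow it, or a
// description of the 403 to send. Kept free of runtime globals so it can
// be tested anywhere.
function gateDecision(request) {
  const verifiedBot = Boolean(request.cf?.botManagement?.verifiedBot);
  const hasTelemetryId = Boolean(request.headers.get('OA-Telemetry-ID'));
  if (!verifiedBot || hasTelemetryId) return null; // allow
  return {
    status: 403,
    body: {
      error: 'oa_telemetry_id_required', // hypothetical error code
      message: 'Include an OA-Telemetry-ID request header to access this content.',
    },
  };
}

// Turn a decision into a Response inside the Worker's fetch handler.
function toResponse(decision) {
  return new Response(JSON.stringify(decision.body), {
    status: decision.status,
    headers: { 'Content-Type': 'application/json' },
  });
}

// Usage inside fetch(request, env, ctx):
//   const blocked = gateDecision(request);
//   if (blocked) return toResponse(blocked);
```

Running the check before the origin fetch means a blocked agent never reaches the origin. Gating is optional and independent of the telemetry path.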
Configuration
| Variable | Description | Default |
|---|---|---|
| OA_API_KEY | Your OpenAttribution API key (set via wrangler secret put) | Required (secret) |
| OA_TELEMETRY_ENDPOINT | Edge events endpoint URL | https://api.openattribution.org/api/v1/telemetry/edge/events |
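A Worker could resolve these two variables at startup along the following lines. This is a sketch that assumes the default is applied in code, with `resolveConfig` as a hypothetical helper. The published Worker may instead set the default in wrangler.toml.

```javascript
// Default taken from the configuration table above.
const DEFAULT_TELEMETRY_ENDPOINT =
  'https://api.openattribution.org/api/v1/telemetry/edge/events';

// Read OA settings from the Worker's env bindings, failing early on
// a missing key or a malformed endpoint URL.
function resolveConfig(env) {
  if (!env.OA_API_KEY) {
    throw new Error('OA_API_KEY is not set; run: npx wrangler secret put OA_API_KEY');
  }
  const endpoint = env.OA_TELEMETRY_ENDPOINT || DEFAULT_TELEMETRY_ENDPOINT;
  new URL(endpoint); // throws a TypeError if the value is not an absolute URL
  return { apiKey: env.OA_API_KEY, endpoint };
}
```

Failing fast on a missing key makes a misconfigured deployment show up in the Worker logs instead of as silently dropped telemetry.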
Publishers not currently on Cloudflare can proxy traffic through a free Cloudflare account. The proxy adds sub-50ms of latency and immediately unlocks bot classification and Security Analytics. The OA Worker can then be deployed on top.