AWS CloudFront
CloudFront integrates through Real-Time Logs piped through Kinesis to your OA telemetry endpoint.
AI bots are classified by AWS WAF Bot Control's CategoryAI rule. This needs more infrastructure than the other CDN options, but covers the cloud provider with the largest
market share.
The challenge
CloudFront is the weakest CDN for edge telemetry because neither of its edge runtimes can make a non-blocking outbound HTTP call:
| Edge runtime | Outbound HTTP | Async (non-blocking) |
|---|---|---|
| CloudFront Functions | Not supported | N/A |
| Lambda@Edge | Supported | No - blocks response until complete |
Unlike Cloudflare Workers (ctx.waitUntil)
or Netlify Edge Functions (context.waitUntil),
Lambda@Edge has no fire-and-forget mechanism. Making an outbound HTTP call adds its latency directly
to the user's page load.
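To make the difference concrete, here is a minimal sketch contrasting the two patterns. The handler names and the `deps` object (`sendTelemetry`, `fetchOrigin`) are hypothetical stand-ins for illustration, not part of any SDK; only `ctx.waitUntil` comes from the Workers-style runtimes named above.

```javascript
// Workers-style runtime: telemetry is scheduled with waitUntil and the
// response is returned without waiting for the telemetry call to finish.
async function workerStyleHandler(request, ctx, deps) {
  // The runtime keeps this promise alive after the response is sent.
  ctx.waitUntil(deps.sendTelemetry(request));
  return deps.fetchOrigin(request);
}

// Lambda@Edge-style runtime: there is no waitUntil equivalent, so the
// telemetry call has to be awaited before the request can continue.
async function lambdaEdgeStyleHandler(request, deps) {
  await deps.sendTelemetry(request); // this latency is added to the request
  return request;
}
```

In the second handler, every millisecond spent in `sendTelemetry` is added to the time before CloudFront continues processing the request.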
Recommended: Real-Time Logs pipeline
The best path is CloudFront Real-Time Logs streamed through Kinesis, with AWS WAF Bot Control handling AI bot classification:
AI Bot ──GET──> CloudFront + WAF Bot Control
                    │
                    ├── Bot Control applies labels
                    │     awswaf:managed:aws:bot-control:bot:category:ai
                    │     awswaf:managed:aws:bot-control:bot:name:<name>
                    │     awswaf:managed:aws:bot-control:bot:verified
                    │
                    └── Real-Time Log entry (includes WAF labels)
                          └── Kinesis Data Stream
                                └── Kinesis Data Firehose
                                      └── HTTPS delivery to OA endpoint
This is fully async and adds zero latency to responses. The trade-off is infrastructure complexity - you need Kinesis Data Streams and Firehose configured in your AWS account.
CloudFormation template
Deploy the full pipeline with a single CloudFormation stack:
aws cloudformation deploy \
  --template-file oa-cloudfront-telemetry.yaml \
  --stack-name oa-telemetry \
  --parameter-overrides \
    OAOrgId=your-org-id \
    CloudFrontDistributionId=E1234567890 \
  --capabilities CAPABILITY_IAM
The template creates:
- Kinesis Data Stream for CloudFront Real-Time Logs
- Kinesis Data Firehose delivery stream to the OA telemetry endpoint
- IAM roles with minimal permissions
- CloudFront Real-Time Log configuration with WAF label filtering
Bot detection
AWS WAF Bot Control (AWSManagedRulesBotControlRuleSet)
has a dedicated CategoryAI rule that labels all AI bot traffic:
| Label | Meaning |
|---|---|
| bot:category:ai | Request is from an AI bot |
| bot:name:<name> | Specific bot identity (e.g. bot:name:gptbot) |
| bot:verified | Bot identity cryptographically verified |
| bot:web_bot_auth:verified | Verified via Web Bot Authentication (RFC 9421) |
| signal:cloud_service_provider:<csp> | Origin infrastructure (aws, gcp, azure, oracle) |
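As an illustration of how these labels might be consumed downstream, the sketch below folds a list of full label strings into one summary object. The function name `summarizeBotLabels` and the shape of the result are hypothetical, and it assumes every label in the table carries the `awswaf:managed:aws:bot-control:` prefix shown in the pipeline diagram.

```javascript
// Assumption: all Bot Control labels share this namespace prefix.
const BOT_CONTROL_PREFIX = 'awswaf:managed:aws:bot-control:';
const NAME_PREFIX = 'bot:name:';
const CSP_PREFIX = 'signal:cloud_service_provider:';

// Hypothetical helper: reduce an array of WAF label strings to a summary.
function summarizeBotLabels(labels) {
  const summary = {
    isAI: false,
    name: null,
    verified: false,
    webBotAuthVerified: false,
    cloudProvider: null,
  };
  for (const label of labels) {
    if (!label.startsWith(BOT_CONTROL_PREFIX)) continue; // not a Bot Control label
    const short = label.slice(BOT_CONTROL_PREFIX.length);
    if (short === 'bot:category:ai') summary.isAI = true;
    else if (short === 'bot:verified') summary.verified = true;
    else if (short === 'bot:web_bot_auth:verified') summary.webBotAuthVerified = true;
    else if (short.startsWith(NAME_PREFIX)) summary.name = short.slice(NAME_PREFIX.length);
    else if (short.startsWith(CSP_PREFIX)) summary.cloudProvider = short.slice(CSP_PREFIX.length);
  }
  return summary;
}
```

For example, passing the `bot:category:ai`, `bot:name:gptbot`, and `bot:verified` labels (with the full prefix) would yield a summary with `isAI` and `verified` set to `true` and `name` set to `'gptbot'`.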
Set the CategoryAI rule action to Count (instead of the default Block) so that AI traffic is
labelled but not blocked. The Kinesis pipeline then filters log entries by the bot:category:ai label.
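One way the filtering step could be done is with a Firehose data-transformation Lambda. The sketch below is not taken from the OA template: it assumes each decoded log record is a line of text in which the full label string appears verbatim, and the function name `filterAIRecords` is hypothetical. The `recordId` / `result` / `data` response shape is the standard Firehose transformation contract.

```javascript
// Full label applied by the CategoryAI rule.
const AI_LABEL = 'awswaf:managed:aws:bot-control:bot:category:ai';

// Hypothetical Firehose transformation: keep only records that carry the
// AI bot label, drop everything else before HTTPS delivery.
function filterAIRecords(event) {
  const records = event.records.map((record) => {
    // Firehose hands each record to the Lambda as base64-encoded data.
    const line = Buffer.from(record.data, 'base64').toString('utf8');
    // Assumption: the label appears verbatim somewhere in the log line.
    const result = line.includes(AI_LABEL) ? 'Ok' : 'Dropped';
    return { recordId: record.recordId, result, data: record.data };
  });
  return { records };
}
```

To use it as a Lambda, the function would be exposed as the handler (for example `exports.handler = async (event) => filterAIRecords(event);`). Records marked `Dropped` are discarded by Firehose rather than delivered.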
CategoryAI blocks all AI bots by
default - including verified ones. Override the action to Count for telemetry-only use. Bots
verified via Web Bot Authentication are the one exception and pass through even on Block.
Alternative: Lambda@Edge (with latency trade-off)
If the Kinesis pipeline is too heavy, a Lambda@Edge function works but adds latency to AI bot requests. WAF Bot Control labels are not directly accessible in Lambda@Edge - you need a WAF custom rule to forward them as a header:
{
  "Name": "ForwardAIBotLabel",
  "Priority": 10,
  "Statement": {
    "LabelMatchStatement": {
      "Scope": "LABEL",
      "Key": "awswaf:managed:aws:bot-control:bot:category:ai"
    }
  },
  "Action": {
    "Count": {
      "CustomRequestHandling": {
        "InsertHeaders": [
          { "Name": "x-oa-bot-category", "Value": "ai" }
        ]
      }
    }
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "ForwardAIBotLabel"
  }
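}
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  // Check for the WAF-forwarded AI bot header. AWS WAF prefixes every header
  // it inserts with "x-amzn-waf-", so the rule's "x-oa-bot-category" header
  // arrives under the prefixed name.
  const botCategory = request.headers['x-amzn-waf-x-oa-bot-category']?.[0]?.value;
  if (botCategory === 'ai') {
    const ua = request.headers['user-agent']?.[0]?.value || '';
    // postTelemetryEvent is assumed to be defined elsewhere in the bundle.
    // This blocks until complete - adds ~50-200ms to bot requests
    await postTelemetryEvent({
      type: 'content_retrieved',
      timestamp: new Date().toISOString(),
      content_url: `https://${request.headers.host[0].value}${request.uri}`,
      source_role: 'edge',
      // Left undefined when the header is absent; JSON.stringify omits it.
      oa_telemetry_id: request.headers['oa-telemetry-id']?.[0]?.value,
      data: { user_agent: ua },
    });
  }
  // Always pass the request through unchanged.
  return request;
};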
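The handler calls `postTelemetryEvent` without defining it. A minimal sketch of what such a helper could look like follows, assuming a Node.js 18+ runtime (global `fetch` and `AbortSignal.timeout`). The endpoint URL, the `timeoutMs` default, and the injectable `fetchImpl` option are all placeholders for illustration, not the actual OA API.

```javascript
// Hypothetical endpoint: replace with the telemetry URL for your OA account.
const OA_TELEMETRY_ENDPOINT = 'https://telemetry.example.com/v1/events';

// Hypothetical helper: POST one event as JSON with a hard timeout, and never
// throw, so a telemetry failure cannot break the viewer request.
async function postTelemetryEvent(event, { timeoutMs = 200, fetchImpl = fetch } = {}) {
  try {
    const res = await fetchImpl(OA_TELEMETRY_ENDPOINT, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(event), // undefined fields are omitted
      signal: AbortSignal.timeout(timeoutMs), // cap the latency added to the request
    });
    return res.ok;
  } catch (err) {
    // Timeouts and network errors are logged and swallowed.
    console.warn('telemetry post failed:', err.message);
    return false;
  }
}
```

Because Lambda@Edge has to await this call, the timeout bounds the worst-case latency added to a bot request; swallowing errors means a failing telemetry endpoint does not turn into a failed page load.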