AI Crawler Policy Scanner

See which AI bots can access your content. Check robots.txt rules, RSL (Really Simple Licensing) licences, and content signals for any domain.

✓ Identify AI training crawlers

✓ Detect unblocked bots

✓ Fast, free, open source

Enter a URL to check its AI crawler policies

AI Crawler Reference - What each bot does

GPTBot - OpenAI (Model training)

ClaudeBot - Anthropic (Model training)

anthropic-ai - Anthropic (Bulk model training)

Claude-Web - Anthropic (Web-focused training)

Google-Extended - Google (Gemini training)

GoogleOther - Google (Research & development)

Meta-ExternalAgent - Meta (AI model training)

FacebookBot - Meta (Speech recognition training)

Applebot-Extended - Apple (Generative AI training)

Amazonbot - Amazon (AI improvement, model training)

CCBot - Common Crawl (Open dataset collection)

Bytespider - ByteDance (AI training)

cohere-ai - Cohere (LLM training)

Diffbot - Diffbot (AI data extraction)

Omgilibot - Webz.io (Data collection for resale)

ImagesiftBot - The Hive (Image model training)

OAI-SearchBot - OpenAI (ChatGPT search indexing)

PerplexityBot - Perplexity (Search indexing)

YouBot - You.com (AI search)

DuckAssistBot - DuckDuckGo (AI-assisted answers)

ChatGPT-User - OpenAI (User-requested fetching)

Perplexity-User - Perplexity (User-requested fetching)

Meta-ExternalFetcher - Meta (Real-time content fetching)

Applebot - Apple (Siri, Spotlight, Safari)

Google-CloudVertexBot - Google (Cloud AI services)

Amzn-SearchBot - Amazon (Alexa and Rufus search)
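
For readers who want to try the robots.txt part of this check by hand, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not the scanner's own implementation, and it does not cover RSL licences or content signals. The names `AI_BOTS`, `check_ai_bots`, and `SAMPLE` are hypothetical, the bot list is only a small subset of the reference above, and the sample robots.txt is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical subset of the user-agent tokens listed in the reference above.
AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "PerplexityBot"]


def check_ai_bots(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return {bot: allowed} for each bot, given the text of a robots.txt file."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


# Made-up robots.txt: blocks two training crawlers, allows everything else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

result = check_ai_bots(SAMPLE)
unblocked = [bot for bot, allowed in result.items() if allowed]
print(unblocked)  # ['ClaudeBot', 'Google-Extended', 'PerplexityBot']
```

To check a live site instead, `RobotFileParser` also offers `set_url()` followed by `read()`, which fetches the file over the network. Note that the standard-library parser matches user-agent tokens by case-insensitive substring, so its results may differ slightly from how an individual crawler interprets the same file.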