AI Crawler Policy Scanner
See which AI bots are allowed to access your content. Check robots.txt rules, RSL (Really Simple Licensing) licences, and content signals for any domain.
✓ Identify AI training crawlers
✓ Detect unblocked bots
✓ Fast, free, open source
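The robots.txt part of such a check can be sketched in Python with the standard library's urllib.robotparser. This is a minimal sketch under stated assumptions, not the scanner's actual implementation: SAMPLE_ROBOTS_TXT and check_bots are hypothetical names, the sample rules are invented for illustration, and RSL licences and content signals are not covered.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample rules. A real check would first download
# https://<domain>/robots.txt and pass its text in instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /private/

User-agent: *
Allow: /
"""


def check_bots(robots_txt, bots, path="/"):
    """Return {bot: True/False} for whether each bot may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


result = check_bots(SAMPLE_ROBOTS_TXT, ["GPTBot", "ClaudeBot", "CCBot"])
for bot, allowed in result.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

With the sample rules, GPTBot is blocked everywhere, CCBot is blocked only under /private/, and ClaudeBot falls through to the wildcard group. Keep in mind that robots.txt is advisory: it states the site's policy, and honouring it is up to each crawler.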
AI Crawler Reference - What each bot does
Training Crawlers (16)
GPTBot - OpenAI (Model training)
ClaudeBot - Anthropic (Model training)
anthropic-ai - Anthropic (Bulk model training)
Claude-Web - Anthropic (Web-focused training)
Google-Extended - Google (Gemini training)
GoogleOther - Google (Research & development)
Meta-ExternalAgent - Meta (AI model training)
FacebookBot - Meta (Speech recognition training)
Applebot-Extended - Apple (Generative AI training)
Amazonbot - Amazon (AI improvement, model training)
CCBot - Common Crawl (Open dataset collection)
Bytespider - ByteDance (AI training)
cohere-ai - Cohere (LLM training)
Diffbot - Diffbot (AI data extraction)
Omgilibot - Webz.io (Data collection for resale)
ImagesiftBot - The Hive (Image model training)
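For a site owner who wants to opt out of the training crawlers listed above, the names can be turned into a robots.txt group. The following is a minimal sketch: TRAINING_CRAWLERS and build_block_rules are hypothetical names, and the output assumes the common robots.txt convention that several User-agent lines may share one set of rules.

```python
# User-agent tokens copied from the Training Crawlers list above.
TRAINING_CRAWLERS = [
    "GPTBot", "ClaudeBot", "anthropic-ai", "Claude-Web",
    "Google-Extended", "GoogleOther", "Meta-ExternalAgent", "FacebookBot",
    "Applebot-Extended", "Amazonbot", "CCBot", "Bytespider",
    "cohere-ai", "Diffbot", "Omgilibot", "ImagesiftBot",
]


def build_block_rules(user_agents):
    """Build one robots.txt group that disallows every path for the given agents."""
    lines = [f"User-agent: {agent}" for agent in user_agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


rules = build_block_rules(TRAINING_CRAWLERS)
print(rules)
```

Appending the printed group to an existing robots.txt leaves rules for other agents untouched. Whether a given crawler obeys the group is up to that crawler.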
Search Crawlers (4)
OAI-SearchBot - OpenAI (ChatGPT search indexing)
PerplexityBot - Perplexity (Search indexing)
YouBot - You.com (AI search)
DuckAssistBot - DuckDuckGo (AI-assisted answers)
Other (6)
ChatGPT-User - OpenAI (User-requested fetching)
Perplexity-User - Perplexity (User-requested fetching)
Meta-ExternalFetcher - Meta (Real-time content fetching)
Applebot - Apple (Siri, Spotlight, Safari)
Google-CloudVertexBot - Google (Cloud AI services)
Amzn-SearchBot - Amazon (Alexa and Rufus search)
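The three groups above can also be used to label requests in a server log. The sketch below is illustrative only: CATEGORIES and classify_user_agent are hypothetical names, the sample header is invented, and matching is a plain case-insensitive substring test. Some names, such as Google-Extended, are robots.txt control tokens rather than request user agents, so they may never match a real header.

```python
# Tokens copied from the reference lists above, grouped by category.
CATEGORIES = {
    "training": [
        "GPTBot", "ClaudeBot", "anthropic-ai", "Claude-Web",
        "Google-Extended", "GoogleOther", "Meta-ExternalAgent", "FacebookBot",
        "Applebot-Extended", "Amazonbot", "CCBot", "Bytespider",
        "cohere-ai", "Diffbot", "Omgilibot", "ImagesiftBot",
    ],
    "search": ["OAI-SearchBot", "PerplexityBot", "YouBot", "DuckAssistBot"],
    "other": [
        "ChatGPT-User", "Perplexity-User", "Meta-ExternalFetcher",
        "Applebot", "Google-CloudVertexBot", "Amzn-SearchBot",
    ],
}

# Longest tokens first, so "Applebot-Extended" is tried before "Applebot".
_TOKENS = sorted(
    ((token, category) for category, tokens in CATEGORIES.items() for token in tokens),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def classify_user_agent(header):
    """Return (token, category) for the first known token in `header`, else None."""
    lowered = header.lower()
    for token, category in _TOKENS:
        if token.lower() in lowered:
            return token, category
    return None


# Invented header for illustration; real crawler headers are longer.
print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.0)"))
```

A user agent header is self-reported and easy to spoof, so a match only suggests which crawler sent the request.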