AI Crawler Policy Scanner

See which AI bots can access your content. Check robots.txt rules, RSL (Really Simple Licensing) licences, and content signals for any domain.

Identify AI training crawlers
Detect unblocked bots
Fast, free, open source

AI Crawler Reference - What each bot does

Training Crawlers (16)

GPTBot - OpenAI (Model training)
ClaudeBot - Anthropic (Model training)
anthropic-ai - Anthropic (Bulk model training)
Claude-Web - Anthropic (Web-focused training)
Google-Extended - Google (Gemini training)
GoogleOther - Google (Research & development)
Meta-ExternalAgent - Meta (AI model training)
FacebookBot - Meta (Speech recognition training)
Applebot-Extended - Apple (Generative AI training)
Amazonbot - Amazon (AI improvement, model training)
CCBot - Common Crawl (Open dataset collection)
Bytespider - ByteDance (AI training)
cohere-ai - Cohere (LLM training)
Diffbot - Diffbot (AI data extraction)
Omgilibot - Webz.io (Data collection for resale)
ImagesiftBot - Hive (Image model training)

Search Crawlers (4)

OAI-SearchBot - OpenAI (ChatGPT search indexing)
PerplexityBot - Perplexity (Search indexing)
YouBot - You.com (AI search)
DuckAssistBot - DuckDuckGo (AI-assisted answers)

Other (6)

ChatGPT-User - OpenAI (User-requested fetching)
Perplexity-User - Perplexity (User-requested fetching)
Meta-ExternalFetcher - Meta (Real-time content fetching)
Applebot - Apple (Siri, Spotlight, Safari)
Google-CloudVertexBot - Google (Cloud AI services)
Amzn-SearchBot - Amazon (Alexa and Rufus search)
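
Example - Checking robots.txt for unblocked bots

The robots.txt part of such a check can be sketched with Python's standard library. This is only an illustration, not the scanner's actual implementation: the function name unblocked_bots, the AI_BOTS subset, and the SAMPLE file are all assumptions made for this example, and it does not cover RSL licences or content signals. The standard-library parser may also interpret edge cases differently from how an individual crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical subset of the user agents from the reference list above.
AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "PerplexityBot"]


def unblocked_bots(robots_txt, bots=AI_BOTS, path="/"):
    """Return the bots that the given robots.txt text allows to fetch `path`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if parser.can_fetch(bot, path)]


# Illustrative robots.txt that blocks two training crawlers and allows the rest.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for bot in unblocked_bots(SAMPLE):
        print(f"{bot} is not blocked")
```

To check a live site instead of a string, RobotFileParser also offers set_url() and read(), which download the file before can_fetch() is called.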