
Related research

Independent academic and industry work that informs or extends the OpenAttribution research programme. Each entry is summarised with what it measured, what it found, and where it fits with the open standards work.

Washington University in St. Louis, arXiv:2605.14021

Measuring Google AI Overviews: Activation, Source Quality, Claim Fidelity, and Publisher Impact

Haofei Xu, Umar Iqbal, Jacob M. Montgomery

Forty-day longitudinal audit of Google AI Overviews (AIOs). 55,393 trending queries across 19 topical categories produced 7,583 AIOs, which the authors decomposed into 98,020 atomic factual claims and verified against the full text of every cited reference page. The first large-scale measurement study to characterise AIO activation, source selection, claim fidelity, and publisher economic exposure together on a naturalistic, sustained query sample.

Key findings

  • Overall AIO activation is 13.7%, but rises to 64.7% for question-form queries and falls as low as 7.5% for Politics and 9.6% for Law & Government - non-uniform suppression that is not publicly documented.
  • 29.8% of AIO-cited domains do not appear on the corresponding first-page SERP at all (28.5% at the URL level). AIO sources are therefore selected by a mechanism separate from Google’s own first-page ranking.
  • AIO-cited domains are systematically more credible than co-displayed first-page results (+0.087 on the PC1 credibility scale), contradicting prior work that found AIOs draw on lower-quality sources.
  • 11.0% of atomic claims are unsupported by their cited references, dominated by silent omission (7.0%) rather than active contradiction (2.7%) - a failure mode no reader of the AIO can detect.
  • 50.6% of AIO-cited publisher pages carry display advertising, and Google’s own sponsored search ads continue to appear on the same SERP, in some cases above the AIO. The deployment is asymmetric: it suppresses publisher click-throughs while preserving Google’s ad capture.
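
The SERP-overlap finding above is reported at both the domain and the URL level. The following is a minimal sketch of how such an overlap share could be computed for a single query. It is not the authors' pipeline: the function names, the `www.` normalisation, and the example URLs are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def domain(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def share_absent(cited: list[str], serp: list[str]) -> tuple[float, float]:
    """Share of cited items missing from the SERP, as (domain level, URL level)."""
    cited_urls = set(cited)
    cited_domains = {domain(u) for u in cited_urls}
    serp_urls = set(serp)
    serp_domains = {domain(u) for u in serp_urls}
    return (
        len(cited_domains - serp_domains) / len(cited_domains),
        len(cited_urls - serp_urls) / len(cited_urls),
    )


# Hypothetical data for one query: pages cited by the overview and
# pages shown on the first results page.
cited = [
    "https://www.example.org/guide",
    "https://news.example.com/story-1",
    "https://blog.example.net/post",
]
serp = [
    "https://www.example.org/home",
    "https://news.example.com/story-1",
]

domain_share, url_share = share_absent(cited, serp)
print(f"domains absent: {domain_share:.1%}, URLs absent: {url_share:.1%}")
```

In this toy input one of three cited domains and two of three cited URLs are missing from the first page. A real audit would aggregate over many queries and would need a more careful notion of "same domain" than stripping `www.`.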
EBU / BBC, EBU MIS report

News Integrity in AI Assistants

European Broadcasting Union and BBC, coordinated across 22 Public Service Media organisations

The largest coordinated audit of AI-assistant news accuracy to date. 22 Public Service Media organisations across 18 countries and 14 languages evaluated 2,709 responses from the free versions of ChatGPT, Microsoft Copilot, Perplexity, and Google Gemini against editorial criteria covering accuracy, sourcing, the distinction between fact and opinion, and context.

Key findings

  • 45% of responses contained at least one significant issue, and 81% contained an issue of some kind. The result held across languages and countries.
  • Sourcing was the single largest category of significant issue, present in 31% of responses overall - more than accuracy, context, or opinion-handling.
  • Significant-issue rates varied sharply by assistant: Gemini 76%, Copilot 37%, ChatGPT 36%, Perplexity 30%.
  • Significant sourcing issues specifically: Gemini 72%, ChatGPT 24%, Perplexity 15%, Copilot 15%. The sourcing problem is not uniform across assistants.
  • Sourcing failures counted include claims not supported by the cited source, no source provided, and incorrect sourcing claims - the same failure modes a grounded-vs-cited event correlation would expose deterministically.
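
To make the last point concrete, here is a minimal sketch of what a grounded-vs-cited correlation could look like. The `Event` shape, the `kind` values, and the report keys are hypothetical and are not taken from the EBU report or from any published schema. The sketch only compares URL sets per response; checking whether a claim is actually supported by a cited page would still need content-level review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical telemetry event, not a published schema."""
    response_id: str
    kind: str  # "grounded": content retrieved; "cited": source shown to the user
    url: str


def correlate(events):
    """Per response, list URLs cited but never grounded, and the reverse."""
    grounded, cited = {}, {}
    for event in events:
        if event.kind == "grounded":
            bucket = grounded
        elif event.kind == "cited":
            bucket = cited
        else:
            raise ValueError(f"unknown event kind: {event.kind!r}")
        bucket.setdefault(event.response_id, set()).add(event.url)

    report = {}
    for rid in sorted(grounded.keys() | cited.keys()):
        g = grounded.get(rid, set())
        c = cited.get(rid, set())
        report[rid] = {
            "cited_not_grounded": sorted(c - g),  # possible incorrect sourcing claim
            "grounded_not_cited": sorted(g - c),  # content used without credit
            "no_source": bool(g) and not c,       # grounded, but nothing cited at all
        }
    return report


# Hypothetical event log covering three responses.
events = [
    Event("r1", "grounded", "https://publisher.example/a"),
    Event("r1", "cited", "https://publisher.example/a"),
    Event("r2", "grounded", "https://publisher.example/b"),
    Event("r2", "cited", "https://aggregator.example/b-copy"),
    Event("r3", "grounded", "https://publisher.example/c"),
]

report = correlate(events)
for rid, row in report.items():
    print(rid, row)
```

Here `r1` is clean, `r2` cites a page that was never retrieved while leaving the retrieved page uncredited, and `r3` uses a source without citing anything.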
Tow Center for Digital Journalism, Columbia Journalism Review

AI Search Has A Citation Problem

Klaudia Jazwinska, Aisvarya Chandrasekar

The reference audit for the first generation of AI search products. 200 article excerpts were drawn from across 20 publishers and queried against 8 generative AI search tools - ChatGPT Search, Perplexity, Perplexity Pro, Gemini, DeepSeek Search, Grok-2, Grok-3, and Microsoft Copilot - for 1,600 total queries. The authors evaluated whether each tool could correctly identify the article, its original publisher, and a working URL.

Key findings

  • Across all tools, more than 60% of responses were incorrect: the citation pointed to the wrong source, the URL was fabricated or broken, or the article was misidentified.
  • Paid premium tools were not more accurate. Perplexity Pro and Grok-3 - the most expensive products in the sample - returned higher rates of confidently wrong answers than their free counterparts, because they almost never declined to answer.
  • 5 of the 8 tools accessed and cited content from publishers whose robots.txt explicitly blocked the citing crawler. The opt-out signal was not enforced at the citation layer.
  • Tools rarely refused to answer or hedged when they did not have the source: 154 of 1,600 responses from ChatGPT Search were partially correct or correct with caveats; the rest were assigned full confidence. Confident misattribution was the norm, not an edge case.
  • Citations were frequently to syndicated copies on aggregator sites rather than to the original publisher - the kind of attribution drift that destroys publisher leverage even when a citation slot is technically filled.
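
The robots.txt finding above can be checked mechanically. The following is a minimal sketch using Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the robots.txt content, and the URLs are hypothetical, and this is not the Tow Center's method. A cited URL that robots.txt disallows does not by itself prove the crawler fetched the page, since the content may have arrived through a syndicated copy or a third party.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_citations(robots_txt, user_agent, cited_urls):
    """Return the cited URLs that robots.txt disallows for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in cited_urls if not parser.can_fetch(user_agent, u)]


# Hypothetical citations observed in a tool's answers.
cited = [
    "https://publisher.example/news/story-1",
    "https://publisher.example/news/story-2",
]

print(blocked_citations(ROBOTS_TXT, "ExampleAIBot", cited))  # both URLs are blocked
print(blocked_citations(ROBOTS_TXT, "OtherBot", cited))      # []
```

Parsing the file from a string keeps the example offline. An audit against live sites would fetch each publisher's robots.txt and record when it was retrieved, because the rules can change between crawl time and citation time.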

Suggest a paper

If you have published or come across independent measurement work on AI retrieval, citation, claim fidelity, publisher economics, or the regulatory surface around any of these, we would like to read it. Open standards depend on a body of evidence that travels beyond any one organisation.

Send it to hello@openattribution.org or open an issue at github.com/openattribution-org.