- Research reveals nearly half of AI-generated news responses contain significant errors
- Google’s Gemini performs markedly worse than rival assistants, largely because of sourcing failures
- Experts warn of risks to public trust and democratic processes from unreliable AI news tools
Artificial intelligence assistants from some of the world’s biggest tech firms are struggling to tell the truth about the news, according to a wide-ranging new study by the European Broadcasting Union (EBU) and the BBC.
The research, conducted with 22 public media organisations across 18 countries and in 14 languages, examined around 3,000 AI-generated responses to news-related questions. It found that 45% of them contained at least one substantial problem — from factual mistakes to flawed or missing sourcing.
Across all tools tested — including OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini and Perplexity — more than 80% of responses showed some form of error. Around one in five answers contained outdated or entirely false information, while roughly a third exhibited serious sourcing failures.
Gemini performed worst, with significant issues in 76% of its responses, more than double the rate of its rivals, largely because of poor sourcing, which affected 72% of its answers. Examples of inaccuracies ranged from assistants misidentifying world leaders and fabricating legislative changes to producing fictitious quotes and statistics.
The study also highlighted a broader issue of transparency. In nearly a third of cases, AI assistants either omitted source citations or gave misleading attributions. In earlier research that fed into the project, the BBC found its journalism was at times distorted into “a confused cocktail” of errors, including fabricated quotes and altered facts.
The findings come amid rising use of AI tools as news gateways, particularly among younger audiences. According to the Reuters Institute, 7% of online news consumers worldwide — and 15% of those under 25 — already rely on AI chatbots for news. Regulators are beginning to respond: the Dutch data protection authority has warned against using AI assistants for voting advice during elections.
Researchers say the results point to a pressing need for AI companies to tighten accuracy controls, improve transparency and clarify how their systems handle editorial material. As these tools increasingly shape how people access information, the study’s authors warn that their flaws could deepen mistrust in both journalism and democratic institutions.
Source: Noah Wire Services
- https://aif.ru/society/chast-iskusstvennyh-intellektualnyh-pomoshchnikov-iskazhayut-novostnoy-kontent – Please view link – unable to access data
- https://www.reuters.com/business/media-telecom/ai-assistants-make-widespread-errors-about-news-new-research-shows-2025-10-21/ – A study by the European Broadcasting Union (EBU) and the BBC reveals that nearly half (45%) of responses from leading AI assistants contain significant errors when answering news-related questions. An analysis of 3,000 responses by AI tools—such as ChatGPT, Copilot, Gemini, and Perplexity—in 14 languages showed that 81% of responses had some form of issue. In particular, 33% involved serious sourcing errors, with Google’s Gemini showing the highest rate (72%) of sourcing problems. The study also found 20% of responses included outdated or inaccurate information; for instance, some assistants falsely reported legislative changes or misidentified the current Pope. With AI assistants increasingly replacing traditional search engines, the EBU warns this trend could erode public trust and democratic engagement. The report calls for AI developers to enhance accuracy, clarity on sources, and differentiation between factual reporting and opinion. The Reuters Institute notes that 7% of global online news consumers—and 15% of under-25s—now rely on AI for news. The findings emphasize the need for greater accountability from AI companies as their tools become influential news intermediaries.
- https://www.vrtinternational.com/news/largest-study-of-its-kind-shows-ai-assistants-misrepresent-news-content-45 – A large-scale research study on the reliability of news via AI assistants, conducted by the European Broadcasting Union (EBU) and the BBC, with participation from VRT, found that AI assistants such as ChatGPT, Copilot, Gemini, or Perplexity provide incorrect or misleading news responses in nearly half of the cases. In total, 22 public broadcasters from 18 countries took part. Issues were found across all countries, languages, and platforms. 45% of all AI answers had at least one significant issue, ranging from incorrect source attribution to factual inaccuracies. Source referencing was especially problematic: in 31% of cases, references were missing or misleading, and 20% of responses contained major accuracy issues such as fabricated or outdated information. Gemini performed the worst, with significant issues in 76% of the tested responses.
- https://www.aljazeera.com/economy/2025/10/22/ai-models-misrepresent-news-events-nearly-half-the-time-study-says – AI models such as ChatGPT routinely misrepresent news events, providing faulty responses to questions almost half the time, a study has found. The study published on Wednesday by the European Broadcasting Union (EBU) and the BBC assessed the accuracy of more than 2,700 responses given by OpenAI’s ChatGPT, Google’s Gemini, Microsoft’s Copilot, and Perplexity. Twenty-two public media outlets, representing 18 countries and 14 languages, posed a common set of questions to the AI assistants between late May and early June for the study.
- https://www.newindianexpress.com/lifestyle/tech/2025/Oct/22/ai-not-a-reliable-source-of-news-eu-media-study-says-2651234.html – Artificial intelligence assistants such as ChatGPT made errors about half the time when asked about news events, according to a vast study by European public broadcasters released Wednesday. The mistakes included confusing news with parody, getting dates wrong or simply inventing events. The report by the European Broadcasting Union looked at four widely used assistants: OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini, and Perplexity. Overall, 45 percent of all AI answers had “at least one significant issue”, regardless of language or country of origin, the report said. One out of every five answers “contained major accuracy issues, including hallucinated details and outdated information.” Of the four assistants, “Gemini performed worst with significant issues in 76 percent of responses, more than double the other assistants, largely due to its poor sourcing performance.”
- https://www.niemanlab.org/2025/02/bbc-news-finds-that-ai-tools-distort-its-journalism-into-a-confused-cocktail-with-many-errors/ – The AI assistants’ answers contained “significant inaccuracies and distorted content from the BBC,” the company said. Over half (51%) of the AI answers had contained “significant issues of some form,” 19% of answers “introduced factual errors — incorrect factual statements, numbers, and dates,” and “13% of the quotes sourced from BBC articles were either altered from the original source or not present in the article cited.” Examples of errors include Google’s Gemini incorrectly stating that “The NHS advises people not to start vaping, and recommends that smokers who want to quit should use other methods,” which is inaccurate. Microsoft’s Copilot incorrectly stated that Gisèle Pelicot uncovered crimes after having blackouts, which is false. Perplexity misstated the date of Michael Mosley’s death and misquoted figures from Liam Payne’s family.
- https://amp.dw.com/en/ai-chatbots-misrepresent-news-almost-half-the-time-says-major-new-study/a-74392921 – A major new study by 22 public service media organizations, including DW, has found that four of the most commonly used AI assistants misrepresent news content 45% of the time – regardless of language or territory. Journalists from public broadcasters like the BBC and NPR evaluated responses of ChatGPT, Copilot, Gemini, and Perplexity AI, assessing their accuracy, sourcing, context, editorialization, and fact-opinion distinction. The study showed that almost half of responses had at least one significant issue, with 31% having serious sourcing problems and 20% containing major factual errors. DW found that 53% of answers to its questions had significant issues, including errors like Olaf Scholz being named as German Chancellor when Friedrich Merz was actually the Chancellor at that time.
Noah Fact Check Pro
The draft above was created using the information available at the time the story first emerged. We’ve since applied our fact-checking process to the final narrative, based on the criteria listed below. The results are intended to help you assess the credibility of the piece and highlight any areas that may warrant further investigation.
Freshness check
Score:
10
Notes:
The narrative is based on a recent press release from the European Broadcasting Union (EBU) and the BBC, dated October 21, 2025. Press releases typically warrant a high freshness score due to their timely dissemination of new information.
Quotes check
Score:
10
Notes:
The direct quotes in the narrative, such as those from Jean Philip De Tender, Media Director of the EBU, and Pete Archer, Head of AI at the BBC, are consistent with the press release. No discrepancies or variations in wording were found, indicating the quotes are accurately reproduced.
Source reliability
Score:
10
Notes:
The narrative originates from a press release issued by the European Broadcasting Union (EBU) and the BBC, both reputable organisations known for their commitment to journalistic integrity. This enhances the reliability of the information presented.
Plausibility check
Score:
10
Notes:
The claims made in the narrative align with the findings of the EBU and BBC study, which has been reported by multiple reputable news outlets, including Reuters and Al Jazeera. The examples of inaccuracies, such as AI assistants misidentifying the current Pope, are consistent with the study’s reported findings. The language and tone are appropriate for a press release, and the content is directly relevant to the study’s objectives.
Overall assessment
Verdict (FAIL, OPEN, PASS): PASS
Confidence (LOW, MEDIUM, HIGH): HIGH
Summary:
The narrative is a recent press release from the EBU and the BBC, accurately quoting their findings on AI assistants misrepresenting news content. The information is consistent with reports from reputable news outlets, and the language and tone are appropriate for the context. No significant issues were identified, indicating a high level of credibility.