Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

67% of Top News Sites Block Access by AI

NewsGuard’s Reality Check Special Report: “In tech lingo, ‘garbage in, garbage out’ means that if bad data goes into a system, expect bad results. The same holds true for the accuracy of AI chatbots. A NewsGuard analysis found that 67 percent of the news websites rated as top quality by NewsGuard block AI models from accessing their journalism. This means the AI models must rely disproportionately on the low-quality news sites that allow chatbots to use their content. This helps explain why chatbots so often spread false claims and misinformation.

A NewsGuard analysis of the top 500 most-engaged news websites found that sites with lower NewsGuard Trust Scores — those more likely to have advanced false or misleading information, as assessed by NewsGuard — are more likely to be included in the training data accessed by the AI models. This is because they are less likely to ask the web crawlers that feed data to popular AI chatbots to avoid their sites. In contrast, many high-quality news websites have put up the equivalent of ‘Do Not Trespass’ signs, at least until the AI companies pay them through licenses to be able to access their journalism.

  • This means that the world’s most popular chatbots may be pulling from untrustworthy sources more often than would typically occur on the open web, such as through traditional search. However, because the chatbot companies have not disclosed exactly how they source or use their data, we cannot know for certain which specific sources are influencing their responses. Disinformation websites from Russia, China, and Iran, conspiracy websites, and health care hoax sites peddling quack cures are only too happy to have their content train the AI models. In contrast, high-quality news sites whose journalism is worth paying for want to get paid if the AI models access their journalism, not to give away their content.
  • Examples of low-quality sites that do not ask the AI models’ web crawlers to avoid their content include The Epoch Times (NewsGuard Trust Score: 17.5/100); ZeroHedge (Trust Score: 15/100), a finance blog that advances debunked conspiracy theories; and Bipartisan Report (Trust Score: 57.5/100), a news and commentary site that regularly mixes news and opinion without disclosing its liberal agenda.
  • Examples of high-quality sites that ask the AI models’ web crawlers to avoid their content include NBCNews.com (Trust Score: 100/100); Today.com (Trust Score: 95/100); and TheGuardian.com (Trust Score: 100/100).
  • Some news publishers go beyond blocking AI models and are litigating. In December 2023, for example, The New York Times (Trust Score: 87.5/100) sued OpenAI and Microsoft for copyright infringement, arguing that the companies were training chatbots with its articles without a commercial agreement; in the meantime, it is blocking access to its journalism.
  • Context: Chatbots use data gathered from across the internet to answer questions and engage in conversations.”
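
The report does not spell out the blocking mechanism, but the usual way a publisher asks AI crawlers to stay away is a robots.txt directive naming specific user agents such as OpenAI’s GPTBot or Common Crawl’s CCBot. As a minimal sketch — the crawler list and the site checked are illustrative assumptions, not drawn from the report — Python’s standard urllib.robotparser can test whether a site’s robots.txt disallows a given crawler:

```python
from urllib.robotparser import RobotFileParser

# Illustrative list of AI-related crawler user agents. GPTBot (OpenAI),
# CCBot (Common Crawl), and Google-Extended (Google AI training) are real
# robots.txt tokens; the selection is not drawn from NewsGuard's report.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]


def ai_crawler_blocks(site: str) -> dict[str, bool]:
    """Check a site's robots.txt and report, per crawler, whether the root path is disallowed."""
    parser = RobotFileParser()
    parser.set_url(f"https://{site}/robots.txt")
    parser.read()  # fetch and parse the live robots.txt file
    root = f"https://{site}/"
    # can_fetch() returns True when the named user agent may crawl the URL,
    # so "blocked" is simply its negation.
    return {agent: not parser.can_fetch(agent, root) for agent in AI_CRAWLERS}


if __name__ == "__main__":
    # Hypothetical example domain; swap in any publisher's site to check it.
    print(ai_crawler_blocks("www.example.com"))
```

Note that robots.txt directives are advisory: a crawler must choose to honor them, which is part of why licensing deals and lawsuits like the one described above remain part of publishers’ strategy.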
