Washington Post [unpaywalled]: “In the hour after President Biden announced he would withdraw from the 2024 campaign on Sunday, most popular AI chatbots seemed oblivious to the news. Asked directly whether he had dropped out, almost all said no or declined to give an answer. Asked who was running for president of the United States, they still listed his name. For the past week, we’ve tested AI chatbots’ approach to breaking political stories and found they were largely not able to keep up with consequential real-time news. Most didn’t have current information, gave incorrect answers, or declined to answer and pushed users to check news sources. Now, with just months left until the presidential election and bombshell political news dropping at a steady clip, AI chatbots are distancing themselves from politics and breaking news or refusing to answer at all. AI chatbot technology burst onto the scene two years ago, promising to revolutionize how we get information. Many of the top bots tout their access to recent information, and some have suggested using the tools to catch up on current events. But companies that make chatbots don’t appear ready for their AI to play a larger role in how people follow this election…”
- See also The New York Times [unpaywalled]: The Data That Powers AI Is Disappearing Fast: “For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models. Now, that data is drying up. Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group. The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an ‘emerging crisis in consent,’ as publishers and online platforms have taken steps to prevent their data from being harvested. The researchers estimate that in the three data sets — called C4, RefinedWeb and Dolma — 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt. The study also found that as much as 45 percent of the data in one set, C4, had been restricted by websites’ terms of service. ‘We’re seeing a rapid decline in consent to use data across the web that will have ramifications not just for A.I. companies, but for researchers, academics and noncommercial entities,’ said Shayne Longpre, the study’s lead author, in an interview.”
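
  The “Robots Exclusion Protocol” the study counts on is nothing more than a plain-text robots.txt file served from a site’s root. As a minimal sketch of how a compliant crawler checks it, using Python’s standard-library urllib.robotparser (the robots.txt contents and URLs below are hypothetical, though GPTBot and CCBot are the crawler tokens that OpenAI and Common Crawl respectively publish):

  ```python
  import urllib.robotparser

  # Hypothetical robots.txt of the kind the study counts as a restriction:
  # named AI-training crawlers are barred from the whole site, while all
  # other bots remain free to crawl it.
  ROBOTS_TXT = """\
  User-agent: GPTBot
  Disallow: /

  User-agent: CCBot
  Disallow: /

  User-agent: *
  Allow: /
  """

  parser = urllib.robotparser.RobotFileParser()
  parser.parse(ROBOTS_TXT.splitlines())

  # A compliant crawler calls can_fetch() before requesting a page.
  print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
  print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
  ```

  Note that the protocol is purely advisory: a crawler that ignores robots.txt is not technically prevented from fetching anything, which is why the study frames the trend as a crisis of consent rather than of access.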