Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

AI trained on AI garbage spits out AI garbage

MIT Technology Review: “AI models work by training on huge swaths of data from the internet. But as AI is increasingly being used to pump out web pages filled with junk content, that process is in danger of being undermined. New research published in Nature shows that the quality of the model’s output gradually degrades when AI trains on AI-generated data. As subsequent models produce output that is then used as training data for future models, the effect gets worse. Ilia Shumailov, a computer scientist from the University of Oxford, who led the study, likens the process to taking photos of photos. “If you take a picture and you scan it, and then you print it, and you repeat this process over time, basically the noise overwhelms the whole process,” he says. “You’re left with a dark square.” The equivalent of the dark square for AI is called “model collapse,” he says, meaning the model just produces incoherent garbage.  This research may have serious implications for the largest AI models of today, because they use the internet as their database. GPT-3, for example, was trained in part on data from Common Crawl, an online repository of over 3 billion web pages. And the problem is likely to get worse as an increasing number of AI-generated junk websites start cluttering up the internet…”

Sorry, comments are closed for this post.