Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

“Model collapse” threatens to kill progress on generative AIs

Big Think: When AI eats its own product, it gets sick.

Key Takeaways

  • Generative AI exploded in popularity when OpenAI released ChatGPT.
  • A paper published in Nature looked at what happens when AI is trained on “synthetic data,” or content created by an AI rather than humans.
  • Flaws in the synthetic data led to even more mistakes in the AI’s output, a result dubbed “model collapse” by the researchers.
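
The feedback loop in the last takeaway can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the method used in the Nature paper: the "model" here is just a bell curve described by a mean and a standard deviation, and each generation is fitted only to samples drawn from the previous generation. The function name `simulate_generations` and its parameters are hypothetical, chosen for illustration.

```python
import random
import statistics


def simulate_generations(generations=500, sample_size=10, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Returns the list of fitted standard deviations, starting with the
    "human" distribution (mean 0, standard deviation 1).
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: stands in for human-made data
    history = [std]
    for _ in range(generations):
        # The current model produces the next model's training data.
        synthetic = [rng.gauss(mean, std) for _ in range(sample_size)]
        # The next model is "trained" by estimating mean and spread
        # from that synthetic data alone (an assumption of this toy setup).
        mean = statistics.fmean(synthetic)
        std = statistics.stdev(synthetic)
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_generations()
    for generation in (0, 10, 100, 500):
        print(f"generation {generation:3d}: std = {history[generation]:.3g}")
```

Because every generation sees only a small sample of the one before it, estimation errors compound and the fitted spread tends to drift toward zero, so later generations reproduce less and less of the variety in the original data. Real generative models are far more complex, but the sketch shows why small flaws can accumulate when output is fed back in as training data.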

“Generative AI has been around for decades, but the systems exploded into the public consciousness in 2022, when OpenAI released ChatGPT, an AI chatbot that could produce remarkably human-like text. The AI gained this ability by analyzing a lot of text created by people, mostly pulled from the internet. To put it simply, from this data, it learned to predict what word was most likely to come next in a sequence based on the words that came before it. To improve their generative AIs, OpenAI and other developers need ever more high-quality training data, but now that publishers know their content is being used to train AIs, they’ve started requesting money for it and, in some cases, suing developers for using it without permission. Even if developers had free access to all the data online, though, it still wouldn’t be enough. ‘If you could get all the data that you needed off the web, that would be fantastic,’ Aidan Gomez, CEO of AI startup Cohere, told the Financial Times. ‘In reality, the web is so noisy and messy that it’s not really representative of the data that you want. The web just doesn’t do everything we need.’”
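
The idea of predicting the next word from the words before it can be sketched in a few lines. This is a simplified stand-in, assuming a tiny made-up corpus and plain word counts: systems like ChatGPT use large neural networks rather than a lookup table, and the names `train_bigram_model` and `predict_next` are hypothetical, introduced only for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# A made-up training corpus; real systems learn from vastly more text.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # prints "cat"
```

In this corpus "the" is followed by "cat" twice and "mat" once, so "cat" is the prediction. The quality of such a predictor depends entirely on the text it has seen, which is why the supply of high-quality training data matters so much to developers.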
