Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

These 183,000 Books Are Fueling the Biggest Fight in Publishing and Tech

The Atlantic. Editor’s note: This searchable database is part of The Atlantic’s series on Books3. You can read about the origins of the database here, and an analysis of what’s in it here. “This summer, I acquired a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. I wrote in The Atlantic about how the data set, known as “Books3,” was based on a collection of pirated ebooks, most of them published in the past 20 years. Since then, I’ve done a deep analysis of what’s actually in the data set, which is now at the center of several lawsuits brought against Meta by writers such as Sarah Silverman, Michael Chabon, and Paul Tremblay, who claim that its use in training generative AI amounts to copyright infringement.”
