Esquire – As the law fights to catch up to Big Tech, the future of books hangs in the balance. Are writers doomed by “the biggest rip-off in creative history,” or could AI offer new ways of making a living?

“I get hung up on the word ‘scraping,’” author R.O. Kwon says. “It sounds quite violent.” Last September, when Kwon learned that her first novel, The Incendiaries, was part of the Books3 dataset that some generative AI models were trained on at the time, she felt violated. She and other authors took to social media, lobbing anger, hurt, and frustration at the tech companies that had secretly “scraped” the Internet for data without consent from or compensation for creators.

Kwon’s novels and those of other authors were poured into machine learning models, teaching them how to make “new” content based on patterns in the ingested text. (It’s this “generating” that makes generative AI distinct from other types of models that may only identify patterns or make calculations.) The years of work behind those books added up: 10 years for one novel, 20 for a memoir, multiplied by the nearly 200,000 books found in the dataset.

“It’s potentially the biggest rip-off in creative history,” says Douglas Preston, a best-selling author and one of the plaintiffs in the class-action lawsuit filed in the aftermath of the outrage. In September 2023, 17 authors partnered with the Authors Guild, the oldest and largest professional organization for writers, to file a lawsuit alleging that Microsoft and ChatGPT creator OpenAI violated copyright law by ingesting books into their generative AI models. OpenAI and Microsoft, for their part, deny that they infringed any copyrights. The tech companies claim that training their models on copyrighted content is equivalent to a person reading books to improve their own writing.
The future of books, and perhaps of creative industries writ large in the United States, may come down to one judge’s definition of “fair use.” Words and who gets to use them are serious business. But an ecosystem around text-based generative AI evolved well before The Atlantic revealed the contents of key datasets. Large language models (LLMs) have been in development since 2017, and OpenAI’s GPT-3, the model that introduced generative AI to the mainstream, hit the world back in 2020. Now, tools, workflows, companies, industry standards, and, of course, grifts are in full operation, already shifting the way some books are written, published, and read.

The technology has clicked right into the publishing industry’s recent trend toward efficiency, consolidation, and reader service, and seemingly away from sustainability for human labor. But some believe that generative AI could offer a path forward for writers at a time when it’s harder than ever to make a living through books. It all depends on the meaning of a few words.