December 22, 2024 | beSpacific

The battle over copyright in the age of ChatGPT

by Sabrina I. Pacifici on Dec 22, 2024

Boston Review: “Questions of AI authorship and ownership can be divided into two broad types. One concerns the vast troves of human-authored material fed into AI models as part of their “training” (the process by which their algorithms “learn” from data). The other concerns ownership of what AIs produce. Call these, respectively, the input and output problems. So far, attention—and lawsuits—have clustered around the input problem. The basic business model for LLMs relies on the mass appropriation of human-written text, and there simply isn’t anywhere near enough in the public domain. OpenAI hasn’t been very forthcoming about its training data, but GPT-4 was reportedly trained on around thirteen trillion “tokens,” roughly the equivalent of ten trillion words. This text is drawn in large part from online repositories known as “crawls,” which scrape the internet for troves of text from news sites, forums, and other sources. Fully aware that vast data scraping is legally untested—to say the least—developers charged ahead anyway, resigning themselves to litigating the issue in retrospect. Lawyer Peter Schoppert has called the training of LLMs without permission the industry’s “original sin”—to be added, we might say, to the technology’s mind-boggling consumption of energy and water in an overheating planet. (In September, Bloomberg reported that plans for new gas-fired power plants have exploded as energy companies are “racing to meet a surge in demand from power-hungry AI data centers.”) The scale of the prize is vast: intellectual property accounts for some 90 percent of recent U.S. economic growth. Indeed, crawls contain enormous amounts of copyrighted information; the Common Crawl alone, a standard repository maintained by a nonprofit and used to train many LLMs, contains most of b-ok.org, a huge repository of pirated ebooks that was shut down by the FBI in 2022. 
The work of many living human authors was on another crawl, called Books3, which Meta used to train LLaMA. Novelist Richard Flanagan said that this training made him feel “as if my soul had been strip mined and I was powerless to stop it.” A number of authors, including Junot Díaz, Ta-Nehisi Coates, and Sarah Silverman, sued OpenAI in 2023 for the unauthorized use of their work for training, though the suit was partially dismissed early this year. Meanwhile, the New York Times is in ongoing litigation against OpenAI and Microsoft for using its content to train chatbots that, it claims, are now its competitors. As of this writing, AI companies have largely responded to lawsuits with defensiveness and evasion, refusing in most cases even to divulge what exact corpora of text their models are trained on. Some newspapers, less sure they can beat the AI companies, have opted to join them: the Financial Times, for one, minted a “strategic partnership” with OpenAI in April, while in July Perplexity launched a revenue-sharing “publisher’s program” that now counts Time, Fortune, Texas Tribune, and WordPress.com among its partners. At the heart of these disputes, the input problem asks: Is it fair to train the LLMs on all that copyrighted text without remunerating the humans who produced it? The answer you’re likely to give depends on how you think about LLMs…”

Cover Your Tracks

by Sabrina I. Pacifici on Dec 22, 2024

EFF – “Cover Your Tracks is two things: a tool for users to understand how unique and identifiable their browser makes them online, and a research project to uncover the tools and techniques of online trackers and test the efficacy of privacy add-ons. Cover Your Tracks researches both how unique your browser is and how… Continue Reading

Review of DOJ Process to Obtain Records of Members of Congress, Media

by Sabrina I. Pacifici on Dec 22, 2024

DOJ Oversight and Review Division 25-01. Redacted For Public Release. A Review of the Department of Justice’s Issuance of Compulsory Process to Obtain Records of Members of Congress, Congressional Staffers, and Members of the News Media: “In the spring and summer of 2017, CNN.com (CNN), The New York Times, and The Washington Post published articles… Continue Reading

Mac Option Key Is the Most Important Key You Don’t Know About

by Sabrina I. Pacifici on Dec 22, 2024

How-To Geek: The Option key on a Mac is a modifier key that changes the functions of menus and keyboard shortcuts. Use the Option key to do things like force quit unresponsive apps, fine-tune the volume and brightness, bypass confirmation when shutting down or deleting files, and more. Try the Option key… Continue Reading

AI has taken over Google Search and Image results

by Sabrina I. Pacifici on Dec 22, 2024

Android Trends: “A recent study shows that AI has taken over a big part of Google’s search results. AI-generated summaries now appear in almost half of all searches, taking up a lot of space on both desktop and mobile screens. This is changing the way people experience Google search. Also, Google Image Search now shows… Continue Reading

Senate Judiciary Committee Investigative Report on Ethical Crisis at the Supreme Court

by Sabrina I. Pacifici on Dec 22, 2024

The culmination of a 20-month investigation, the staff report features new information and a comprehensive analysis of the ongoing ethics challenge at the Supreme Court. The Senate Judiciary Committee, chaired by U.S. Senate Majority Whip Dick Durbin (D-IL), released the findings of its 20-month investigation into the ethical crisis at the Supreme Court, including the… Continue Reading

