Knowledge Management

The battle over copyright in the age of ChatGPT

by Sabrina I. Pacifici on Dec 22, 2024

Boston Review: “Questions of AI authorship and ownership can be divided into two broad types. One concerns the vast troves of human-authored material fed into AI models as part of their “training” (the process by which their algorithms “learn” from data). The other concerns ownership of what AIs produce. Call these, respectively, the input and output problems. So far, attention—and lawsuits—have clustered around the input problem. The basic business model for LLMs relies on the mass appropriation of human-written text, and there simply isn’t anywhere near enough in the public domain. OpenAI hasn’t been very forthcoming about its training data, but GPT-4 was reportedly trained on around thirteen trillion “tokens,” roughly the equivalent of ten trillion words. This text is drawn in large part from online repositories known as “crawls,” which scrape the internet for troves of text from news sites, forums, and other sources. Fully aware that vast data scraping is legally untested—to say the least—developers charged ahead anyway, resigning themselves to litigating the issue in retrospect. Lawyer Peter Schoppert has called the training of LLMs without permission the industry’s “original sin”—to be added, we might say, to the technology’s mind-boggling consumption of energy and water in an overheating planet. (In September, Bloomberg reported that plans for new gas-fired power plants have exploded as energy companies are “racing to meet a surge in demand from power-hungry AI data centers.”) The scale of the prize is vast: intellectual property accounts for some 90 percent of recent U.S. economic growth. Indeed, crawls contain enormous amounts of copyrighted information; the Common Crawl alone, a standard repository maintained by a nonprofit and used to train many LLMs, contains most of b-ok.org, a huge repository of pirated ebooks that was shut down by the FBI in 2022. 
The work of many living human authors was on another crawl, called Books3, which Meta used to train LLaMA. Novelist Richard Flanagan said that this training made him feel “as if my soul had been strip mined and I was powerless to stop it.” A number of authors, including Junot Díaz, Ta-Nehisi Coates, and Sarah Silverman, sued OpenAI in 2023 for the unauthorized use of their work for training, though the suit was partially dismissed early this year. Meanwhile, the New York Times is in ongoing litigation against OpenAI and Microsoft for using its content to train chatbots that, it claims, are now its competitors. As of this writing, AI companies have largely responded to lawsuits with defensiveness and evasion, refusing in most cases even to divulge what exact corpora of text their models are trained on. Some newspapers, less sure they can beat the AI companies, have opted to join them: the Financial Times, for one, minted a “strategic partnership” with OpenAI in April, while in July Perplexity launched a revenue-sharing “publisher’s program” that now counts Time, Fortune, Texas Tribune, and WordPress.com among its partners. At the heart of these disputes, the input problem asks: Is it fair to train the LLMs on all that copyrighted text without remunerating the humans who produced it? The answer you’re likely to give depends on how you think about LLMs…”

Even laypeople use legalese

by Sabrina I. Pacifici on Dec 18, 2024

MIT News – “MIT study explains why laws are written in an incomprehensible style. Legal documents are notoriously difficult to understand, even for lawyers. This raises the question: Why are these documents written in a style that makes them so impenetrable? MIT cognitive scientists believe they have uncovered the answer to that question. Just as…

Introducing DiscoverGov: GPO’s New Discovery Search Tool

by Sabrina I. Pacifici on Dec 18, 2024

GPO is pleased to introduce DiscoverGov, our new, web-based search tool. DiscoverGov provides simple, one-stop searching across multiple U.S. Federal Government databases, including GPO’s Catalog of U.S. Government Publications (CGP) and GovInfo. It will retrieve reports, articles, and citations while providing direct links to selected resources and publications available online. Come meet DiscoverGov as we release…

Federal government discloses more than 1,700 AI use cases

by Sabrina I. Pacifici on Dec 18, 2024

FedScoop: “A consolidated list of federal artificial intelligence use cases released by the White House on Wednesday shows agencies more than doubled the amount of uses reported last year. Per the 2024 consolidated inventory, which is available on the Office of Management and Budget’s GitHub, 37 federal agencies have reported 1,757 public AI uses. A consolidated…

New research shows how many important links on the web get lost to time

by Sabrina I. Pacifici on Dec 17, 2024

The Verge [unpaywalled]: “A quarter of the deep links in The New York Times’ articles are now rotten, leading to completely inaccessible pages, according to a team of researchers from Harvard Law School, who worked with the Times’ digital team. They found that this problem affected over half of the articles containing links in the…

Literary Hub List of Lists – Best Books 2024

by Sabrina I. Pacifici on Dec 17, 2024

TIME’s 100 Must-Read Books of 2024 and 10 Best Fiction Books of 2024 and 10 Best Nonfiction Books of 2024 • Publishers Weekly’s Best Books of 2024 • Vanity Fair’s 21 Best Books of 2024 to Read Right Now • The Economist’s Best Books of 2024 • The New York Times’ 100 Notable Books of 2024 and 10…

Are adults forgetting how to read?

by Sabrina I. Pacifici on Dec 16, 2024

The Economist – A survey by the OECD suggests so – “Are you smarter than a ten-year-old? New data suggest that a shockingly large portion of adults in the rich world might not be. Roughly one-fifth of people aged 16 to 65 perform no better in tests of maths and reading than would be expected…

Introducing QuizBot an Innovative AI-Assisted Assessment in Legal Education

by Sabrina I. Pacifici on Dec 16, 2024

Harrington, Sean, Introducing QuizBot an Innovative AI-Assisted Assessment in Legal Education (October 03, 2024). Available at SSRN: https://ssrn.com/abstract=4975804 or http://dx.doi.org/10.2139/ssrn.4975804 – “This Article explores an innovative approach to assessment in legal education: an AI-assisted quiz system implemented in an AI & the Practice of Law course. The system employs a Socratic method-inspired chatbot to engage…

Revolutionizing Legal Education with AI: The Socratic Quizbot

by Sabrina I. Pacifici on Dec 16, 2024

AI Law Librarians – Sean Harrington – “I had the pleasure of co-teaching AI and the Practice of Law with Kenton Brice last semester at OU Law. It was an incredible experience. When we met to think through how we would teach this course, we agreed on one crucial component: We wanted the students to…

You Can Now Search the Internet With ChatGPT

by Sabrina I. Pacifici on Dec 16, 2024

Lifehacker – “ChatGPT search has been out now for about a month and a half, following a Halloween announcement from OpenAI. With this new feature, the company finally rolled out an official competitor to AI search engines like Perplexity, Google’s AI Overviews, and Microsoft Bing (powered by Copilot). OpenAI originally announced its search plans back…

How Silicon Valley is disrupting democracy

by Sabrina I. Pacifici on Dec 16, 2024

MIT Technology Review – “Two books explore the price we’ve paid in handing over unprecedented power to Big Tech—and explain why it’s imperative we start taking it back. The internet loves a good neologism, especially if it can capture a purported vibe shift or explain a new trend. In 2013, the columnist Adrian Wooldridge coined…

Social media needs (dumpster) fire exits

by Sabrina I. Pacifici on Dec 15, 2024

Pluralistic: “Of course you should do everything you can to prevent fires – and also, you should build fire exits, because no matter how hard you try, stuff burns. That includes social media sites. Social media has its own special form of lock-in: we use social media sites to connect with friends, family members,…

