Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Copyright

There’s No Longer Any Doubt That Hollywood Writing Is Powering AI

The Atlantic – Dialogue from these movies and TV shows has been used by companies such as Apple and Anthropic to train AI systems [unpaywalled] By Alex Reisner – “I can now say with absolute confidence that many AI systems have been trained on TV and film writers’ work. Not just on The Godfather and Alf, but on more than 53,000 other movies and 85,000 other TV episodes: Dialogue from all of it is included in an AI-training data set that has been used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies. I recently downloaded this data set, which I saw referenced in papers about the development of various large language models (or LLMs). It includes writing from every film nominated for Best Picture from 1950 to 2016, at least 616 episodes of The Simpsons, 170 episodes of Seinfeld, 45 episodes of Twin Peaks, and every episode of The Wire, The Sopranos, and Breaking Bad. It even includes prewritten “live” dialogue from Golden Globes and Academy Awards broadcasts. If a chatbot can mimic a crime-show mobster or a sitcom alien—or, more pressingly, if it can piece together whole shows that might otherwise require a room of writers—data like this are part of the reason why.”

How ChatGPT Search (Mis)represents Publisher Content

Columbia Journalism Review – “ChatGPT search—which is positioned as a competitor to search engines like Google and Bing—launched with a press release from OpenAI touting claims that the company had “collaborated extensively with the news industry” and “carefully listened to feedback” from certain news organizations that have signed content licensing agreements with the company. In… Continue Reading

Canadian legal information database sues company behind AI chatbot

CBA – Lawsuit filed in B.C. Supreme Court alleges that Caseway AI violates CanLII’s terms of service and copyrights: “The Canadian Legal Information Institute (CanLII) has taken the makers of an AI chatbot to court over what it says is a violation of its terms of service, due to the chatbot scraping CanLII’s database in… Continue Reading

Ziff Davis study says AI firms rely on publisher data to train models

Axios: “Leading AI companies such as OpenAI, Google and Meta rely more on content from premium publishers to train their large language models (LLMs) than they publicly admit, according to new research from executives at Ziff Davis, one of the largest publicly-traded digital media companies. Why it matters: Publishers believe that the more they can… Continue Reading

Google Asked to Remove 10 Billion “Pirate” Search Results

TorrentFreak – “Rightsholders have asked Google to remove more than 10 billion ‘copyright infringing’ URLs from its search results. The search engine doesn’t celebrate the milestone in any way, but the takedown notices document intriguing shifts in volume over time, as well as shifting takedown interests. While search engines are extremely helpful for the average… Continue Reading

Metropolitan Museum of Art Puts 490,000 High-Res Images Online & Makes Them Free to Use

Open Culture: “The Metropolitan Museum of Art has put online 492,000 high-resolution images of artistic works. Even better, the museum has placed the vast majority of these images into the public domain, meaning they can be downloaded directly from the museum’s website for non-commercial use. When you browse the Met collection and find an image… Continue Reading

Vanishing Culture: A Report on Our Fragile Cultural Record

Internet Archive Blogs: “In today’s digital landscape, corporate interests, shifting distribution models, and malicious cyber attacks are threatening public access to our shared cultural history. The rise of streaming platforms and temporary licensing agreements means that sound recordings, books, films, and other cultural artifacts that used to be owned in physical form, are now at… Continue Reading

What are the current swing states, and how have they changed over time?

USA Facts: “Swing states, also known as battleground states, are states that could “swing” to either Democratic or Republican candidates depending on the election. Because of their potential to be won by either candidate, political parties often spend a disproportionate amount of time and campaign resources on winning these states. While there is no universal… Continue Reading

Unlocking AI for All: The Case for Public Data Banks

LawFare: “The data relied on by OpenAI, Google, Meta, and other artificial intelligence (AI) developers is not readily available to other AI labs. Google and Meta relied, in part, on data gathered from their own products to train and fine-tune their models. OpenAI used tactics to acquire data that now would not work or may… Continue Reading

Inside the $621 Million Legal Battle for the ‘Soul of the Internet’

Rolling Stone via MSN [no paywall]: “Major record labels have sued the online library Internet Archive over thousands of old recordings, raising the question: Who owns the past? Before founding the Internet Archive, Kahle worked as a computer scientist, making major contributions to personal computing and the early internet during the Eighties and Nineties. With the Archive,… Continue Reading

U.S. Court Orders LibGen to Pay $30m to Publishers, Issues Broad Injunction

TorrentFreak: “Yesterday, U.S. District Court Judge Colleen McMahon granted the default judgment without any changes. The anonymous LibGen defendants are responsible for willful copyright infringement and their activities should be stopped. “Plaintiffs have been irreparably harmed as a result of Defendants’ unlawful conduct and will continue to be irreparably harmed should Defendants be allowed to… Continue Reading