Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Copyright

Developer Creates Infinite Maze That Traps AI Training Bots

404 Media – “A pseudonymous coder has created and released an open source ‘tar pit’ to indefinitely trap AI training web crawlers in an infinitely, randomly-generating series of pages to waste their time and computing power. The program, called Nepenthes after the genus of carnivorous pitcher plants which trap and consume their prey, can be deployed by webpage owners to protect their own content from being scraped or can be deployed ‘offensively’ as a honeypot trap to waste AI companies’ resources. ‘It’s less like flypaper and more an infinite maze holding a minotaur, except the crawler is the minotaur that cannot get out. The typical web crawler doesn’t appear to have a lot of logic. It downloads a URL, and if it sees links to other URLs, it downloads those too. Nepenthes generates random links that always point back to itself – the crawler downloads those new links. Nepenthes happily just returns more and more lists of links pointing back to itself,’ Aaron B, the creator of Nepenthes, told 404 Media. ‘Of course, these crawlers are massively scaled, and are downloading links from large swathes of the internet at any given time,’ they added. ‘But they are still consuming resources, spinning around doing nothing helpful, unless they find a way to detect that they are stuck in this loop.’ Human users can see how Nepenthes works by clicking here, though I must warn that the page loads incredibly slowly (on purpose) and links endlessly to pages that load the same way. It looks like this, in practice…”
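The mechanism described above (a slow page whose only content is randomly generated links that lead back into the same generator) can be illustrated with a short sketch. This is not Nepenthes's own code; it is a minimal Python illustration using only the standard library, and the `/maze/` prefix, the two-second delay, the port, and the names `tarpit_links`, `tarpit_page`, `TarpitHandler`, and `serve` are all hypothetical choices made for this example.

```python
import hashlib
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical URL prefix for the maze (an assumption, not taken from Nepenthes).
MAZE_PREFIX = "/maze/"


def tarpit_links(path: str, n_links: int = 5) -> list[str]:
    """Return n_links pseudo-random URLs that all point back into the maze.

    The generator is seeded from a hash of the requested path, so revisiting
    the same URL yields the same links and the maze looks like stable content.
    """
    seed = int.from_bytes(hashlib.sha256(path.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [MAZE_PREFIX + "%016x" % rng.getrandbits(64) for _ in range(n_links)]


def tarpit_page(path: str, n_links: int = 5) -> str:
    """Render a bare HTML page whose only content is links back into the maze."""
    items = "".join(
        f'<li><a href="{url}">{url}</a></li>' for url in tarpit_links(path, n_links)
    )
    return f"<html><body><ul>{items}</ul></body></html>"


class TarpitHandler(BaseHTTPRequestHandler):
    # Hypothetical per-request delay in seconds; a slow response ties up the crawler.
    DELAY_SECONDS = 2.0

    def do_GET(self):
        time.sleep(self.DELAY_SECONDS)
        body = tarpit_page(self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the tar pit on localhost until interrupted (port is arbitrary)."""
    HTTPServer(("127.0.0.1", port), TarpitHandler).serve_forever()
```

Calling `serve()` starts the server; every page it returns links only to further `/maze/...` pages, so a crawler that blindly follows links never reaches the end. A site owner using something like this defensively would likely also disallow the maze path in `robots.txt`, so that crawlers honoring that file stay out and only those ignoring it wander in.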

Wikipedia:Database download

Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is licensed under the Creative Commons Attribution-ShareAlike 4.0 License (CC-BY-SA), and most is additionally licensed under the GNU Free Documentation License…

Strict Scrutiny

“A podcast about the United States Supreme Court and the legal culture that surrounds it. Hosted by three badass constitutional law professors – Leah Litman, Kate Shaw, and Melissa Murray – Strict Scrutiny provides in-depth, accessible, and irreverent analysis of the Supreme Court and its cases, culture, and personalities. Each week, Leah, Kate, and Melissa break down…

Meta Secretly Trained Its AI on a Notorious Piracy Database

Wired – [unpaywalled] Newly Unredacted Court Docs Reveal – One of the most important AI copyright legal battles just took a major turn: “Meta just lost a major fight in its ongoing legal battle with a group of authors suing the company for copyright infringement over how it trained its artificial intelligence models. Against…

Announcing the Public Domain Image Archive

“After the hundreds (thousands?) of hours trawling through online image collections since the PDR’s inception, we’ve decided it was time to create one of our own! We are really excited to share with you the launch of our new sister-project, the Public Domain Image Archive (PDIA), a curated collection of more than 10,000 out-of-copyright historical…

LLRX December 2024 Articles and Columns

December 2024 – LLRX.com® – the free web journal on law, technology, knowledge discovery and research for Librarians, Lawyers, Researchers, Academics, and Journalists. Founded in 1996. January 1, 2025 is Public Domain Day: Works from 1929 are open to all, as are sound recordings from 1924 – by Jennifer Jenkins. AI in Finance and Banking,…

Frida Kahlo and Henri Matisse Enter the Public Domain

Hyperallergic: “Happy Public Domain Day! Starting today, January 1, you can legally access, adapt, remix, and republish (depending on your jurisdiction) the work of Henri Matisse, Frida Kahlo, and Robert Capa, as well as certain texts by William Faulkner, Virginia Woolf, and Ernest Hemingway, among others. In the United States, the copyright term surrounding commissioned…

The battle over copyright in the age of ChatGPT

Boston Review: “Questions of AI authorship and ownership can be divided into two broad types. One concerns the vast troves of human-authored material fed into AI models as part of their “training” (the process by which their algorithms “learn” from data). The other concerns ownership of what AIs produce. Call these, respectively, the input and…

Every AI Copyright Lawsuit in the US, Visualized

Wired: “WIRED is following every copyright battle involving the AI industry—and we’ve created some handy visualizations that will be updated as the cases progress. In May 2020, the media and technology conglomerate Thomson Reuters sued a small legal AI startup called Ross Intelligence, alleging that it had violated US copyright law by reproducing materials from…