Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Copyright

Caselaw Access Project

“The Caselaw Access Project (“CAP”) expands public access to U.S. law. Our goal is to make all published U.S. court decisions freely available to the public online, in a consistent format, digitized from the collection of the Harvard Law School Library. We created CAP’s initial collection by digitizing roughly 40 million pages of court decisions contained in roughly 40,000 bound volumes owned by the Harvard Law School Library. The Harvard Law School Collection includes volumes published through 2018. The Harvard Law School Collection was digitized on site at Langdell Hall. Members of our team created metadata for each volume, including a unique barcode, reporter name, title, jurisdiction, publication date and other volume-level information. We then used a high-speed scanner to produce JP2 and TIF images of every page. A vendor then used OCR to extract the text of every case, creating case-level XML files. Key metadata fields, like case name, citation, court and decision date, were corrected for accuracy, while the text of each case was left as raw OCR output. In addition, for cases from volumes not yet in the public domain, our vendor redacted any headnotes. The Harvard Law School Collection does not include:

  • Cases not designated as officially published, such as most lower court decisions.
  • Non-published trial documents such as party filings, orders, and exhibits.
  • Parallel versions of cases from regional reporters, unless those cases were designated by a court as official.
  • Cases officially published in digital form, such as recent cases from Illinois, Arkansas, New Mexico, and North Carolina.
  • Copyrighted material such as headnotes, for cases still under copyright…”

Axel Springer vs. Google

Fortune: “Axel Springer is at Google’s throat again. The German news-publishing giant (for which I worked in my days at Politico) has a long history of battling Google over the issue of so-called ancillary copyright fees—payments for carrying snippets of text and thumbnail images in search results. But now it’s waging war on another front:… Continue Reading

Generative AI Might Finally Bend Copyright Past the Breaking Point

The Atlantic [unpaywalled] – For more than 200 years, copyright law has promoted a creative society. The chatbots could change everything. “It took Ralph Ellison seven years to write Invisible Man. It took J. D. Salinger about 10 to write The Catcher in the Rye. J. K. Rowling spent at least five years on the… Continue Reading

Tumblr and WordPress to Sell Users’ Data to Train AI Tools

404Media: “Tumblr and WordPress.com are preparing to sell user data to Midjourney and OpenAI, according to a source with internal knowledge about the deals and internal documentation referring to the deals. The exact types of data from each platform going to each company are not spelled out in documentation we’ve reviewed, but internal communications reviewed… Continue Reading

Why The New York Times might win its copyright lawsuit against OpenAI

Ars Technica: “The day after The New York Times sued OpenAI for copyright infringement, the author and systems architect Daniel Jeffries wrote an essay-length tweet arguing that the Times “has a near zero probability of winning” its lawsuit. As we write this, it has been retweeted 288 times and received 885,000 views. “Trying to get… Continue Reading

Judge rejects most ChatGPT copyright claims from book authors

Ars Technica: “A US district judge in California has largely sided with OpenAI, dismissing the majority of claims raised by authors alleging that large language models powering ChatGPT were illegally trained on pirated copies of their books without their permission. By allegedly repackaging original works as ChatGPT outputs, authors alleged, OpenAI’s most popular chatbot was just… Continue Reading

Microsoft announces AI newsroom projects with Semafor and others, as NYT lawsuit looms

GeekWire: “Microsoft announced five projects to help news organizations incorporate generative artificial intelligence into their operations, building on its existing efforts to use technology to support the role of journalism in democracy. The company will collaborate on different initiatives with Semafor, the Craig Newmark Graduate School of Journalism at CUNY, the Online News Association, the GroundTruth… Continue Reading

The Raven by Edgar Allan Poe

1845. “The Raven” is published in The Evening Mirror in New York, the first publication to carry the name of the author, Edgar Allan Poe. Its publication made Poe popular in his lifetime, although it did not bring him much financial success. The poem was soon reprinted, parodied, and illustrated. Critical opinion is divided as to… Continue Reading

What Happened to My Search Engine?

Ted Gioia – Or why tech upgrades are now mostly downgrades: “…Here are the things missing from the original search engines. They didn’t practice 24/7 surveillance of users. They didn’t sell users’ private information. They didn’t fill up search results with garbage in order to collect placement fees. They didn’t manipulate users—prodding them to use… Continue Reading

OpenAI warns copyright crackdown could doom ChatGPT

Telegraph: “The maker of ChatGPT has warned that a ban on using news and books to train chatbots would doom the development of artificial intelligence. OpenAI has told peers that it would be “impossible” to create services such as ChatGPT if it were prevented from relying on copyrighted works, as it seeks to influence potential… Continue Reading

More Than Just Mickey: Chaplin, Peter Pan, ‘Western Front’ Enter Public Domain

Rolling Stone: “Winnie the Pooh’s Tigger, films by Buster Keaton, Lady Chatterley’s Lover, and — yes — the Mickey Mouse in Steamboat Willie are now fair use as of Jan. 1, Public Domain Day 2024. Jan. 1, isn’t just New Year’s Day — it’s also Public Domain Day, where thousands of cinematic treasures, literary classics,… Continue Reading