Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Copyright

Why a ruling against the Internet Archive threatens the future of America’s libraries

MIT Technology Review – “The decision locks libraries into an ecosystem that is not in readers’ interests. Congress must act. I was raised in the 1980s and ’90s, and for my generation and generations before us, the public library was an equalizing force in every town, helping anyone move toward the American dream. In Chantilly, Virginia, where I grew up, it didn’t matter if you didn’t have a computer or your parents lacked infinite money for tutors—you could get a lifetime’s education for free at the public library. A ruling from the US Second Circuit against the Internet Archive and in favor of publisher Hachette has just thrown that promise of equality into doubt by limiting libraries’ access to digital lending. To understand why this is so important to the future of libraries, you first have to understand the dire state of library e-book lending.

Libraries have traditionally operated on a basic premise: Once they purchase a book, they can lend it out to patrons as much (or as little) as they like. Library copies often come from publishers, but they can also come from donations, used book sales, or other libraries. However the library obtains the book, once the library legally owns it, it is theirs to lend as they see fit.

Not so for digital books. To make licensed e-books available to patrons, libraries have to pay publishers multiple times over. First, they must subscribe (for a fee) to aggregator platforms such as Overdrive. Aggregators, like streaming services such as HBO’s Max, have total control over adding or removing content from their catalogue. Content can be removed at any time, for any reason, without input from your local library. The decision happens not at the community level but at the corporate one, thousands of miles from the patrons affected.

Then libraries must purchase each individual copy of each individual title that they want to offer as an e-book. These e-book copies are not only priced at a steep markup—up to 300% over consumer retail—but are also time- and loan-limited, meaning the files self-destruct after a certain number of loans. The library then needs to repurchase the same book, at a new price, in order to keep it in stock. This upending of the traditional order puts massive financial strain on libraries and the taxpayers that fund them. It also opens up a world of privacy concerns; while libraries are restricted in the reader data they can collect and share, private companies are under no such obligation…”

New AI standards group wants to make data scraping opt-in

Ars Technica: “The first wave of major generative AI tools largely were trained on “publicly available” data—basically, anything and everything that could be scraped from the Internet. Now, sources of training data are increasingly restricting access and pushing for licensing agreements. With the hunt for additional data sources intensifying, new licensing startups have emerged to…

The Internet Archive Loses Its Appeal of a Major Copyright Case

Wired unpaywalled: “The Internet Archive has lost a major legal battle [The case is Hachette Book Group Inc. v. Internet Archive, 2d Cir., No. 23-1260, 9/4/24.]—in a decision that could have a significant impact on the future of internet history. Today, the US Court of Appeals for the Second Circuit ruled against the long-running digital…

When A.I.’s Output Is a Threat to A.I. Itself

The New York Times – As A.I.-generated data becomes harder to detect, it’s increasingly likely to be ingested by future A.I., leading to worse results. “The internet is becoming awash in words and images generated by artificial intelligence. Sam Altman, OpenAI’s chief executive, wrote in February that the company generated about 100 billion words…

U.S. Copyright Office Announces Updated Webinar on Copyright Essentials: Myths Explained

“The U.S. Copyright Office invites you to register to attend the upcoming online webinar, Copyright Essentials: Myths Explained, on September 18, 2024, at 1:00 p.m. eastern time. There is a lot of misleading information out there about copyright. On September 18, 2024, the U.S. Copyright Office will discuss what is and is not true when…

Pete Recommends – Weekly highlights on cyber security issues, August 24, 2024

Via LLRX – Pete Recommends – Weekly highlights on cyber security issues, August 24, 2024 – Privacy and cybersecurity issues impact every aspect of our lives – home, work, travel, education, finance, health and medical records – to name but a few. On a weekly basis, Pete Weiss highlights articles and information that focus on…

Face Search Engine Reverse Image Search

“PimEyes is an online face search engine that goes through the Internet to find pictures containing given faces. PimEyes uses face recognition search technologies to perform a reverse image search. Find a face and check where the image appears online. Our face finder helps you find a face and protect your privacy. Facial recognition online…

New web crawler launched by Meta last month is quietly scraping the internet for AI training data

Fortune [no paywall]: “Meta has quietly unleashed a new web crawler to scour the internet and collect data en masse to feed its AI model. The crawler, named the Meta External Agent, was launched last month according to three firms that track web scrapers and bots across the web. The automated bot essentially copies, or…

EU Proposal for an ePrivacy Regulation

“The European Commission’s proposal for a Regulation on ePrivacy aims at reinforcing trust and security in the digital world. Why a reform of ePrivacy legislation? European legislation needs to keep up with the fast pace at which IT-based services are developing and evolving. The Commission has started a major modernisation process of the data protection…

Google’s AI Search Gives Sites Dire Choice: Share Data or Die

Bloomberg [unpaywalled] – Publishers say blocking the company’s AI bot could also prevent their sites from showing up in search. Google now displays convenient artificial intelligence-based answers at the top of its search pages — meaning users may never click through to the websites whose data is being used to power those results. But many…

Has your paper been used to train an AI model? Almost certainly

Nature – Artificial-intelligence developers are buying access to valuable data sets that contain research papers — raising uncomfortable questions about copyright. “Academic publishers are selling access to research papers to technology firms to train artificial-intelligence (AI) models. Some researchers have reacted with dismay at such deals happening without the consultation of authors. The trend is…