Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Copyright

New AI standards group wants to make data scraping opt-in

Ars Technica: “The first wave of major generative AI tools largely were trained on ‘publicly available’ data—basically, anything and everything that could be scraped from the Internet. Now, sources of training data are increasingly restricting access and pushing for licensing agreements. With the hunt for additional data sources intensifying, new licensing startups have emerged to keep the source material flowing. The Dataset Providers Alliance, a trade group formed this summer, wants to make the AI industry more standardized and fair. To that end, it has just released a position paper outlining its stances on major AI-related issues. The alliance is made up of seven AI licensing companies, including music copyright-management firm Rightsify, Japanese stock-photo marketplace Pixta, and generative-AI copyright-licensing startup Calliope Networks. (At least five new members will be announced in the fall.) The DPA advocates for an opt-in system, meaning that data can be used only after consent is explicitly given by creators and rights holders. This represents a significant departure from the way most major AI companies operate. Some have developed their own opt-out systems, which put the burden on data owners to pull their work on a case-by-case basis. Others offer no opt-outs whatsoever. The DPA, which expects members to adhere to its opt-in rule, sees that route as the far more ethical one. ‘Artists and creators should be on board,’ says Alex Bestall, CEO of Rightsify and the music-data-licensing company Global Copyright Exchange, who spearheaded the effort. Bestall sees opt-in as a pragmatic approach as well as a moral one: ‘Selling publicly available datasets is one way to get sued and have no credibility.’”

The Internet Archive Loses Its Appeal of a Major Copyright Case

Wired [unpaywalled]: “The Internet Archive has lost a major legal battle [The case is Hachette Book Group Inc. v. Internet Archive, 2d Cir., No. 23-1260, 9/4/24.]—in a decision that could have a significant impact on the future of internet history. Today, the US Court of Appeals for the Second Circuit ruled against the long-running digital…

When A.I.’s Output Is a Threat to A.I. Itself

The New York Times – As A.I.-generated data becomes harder to detect, it’s increasingly likely to be ingested by future A.I., leading to worse results. “The internet is becoming awash in words and images generated by artificial intelligence. Sam Altman, OpenAI’s chief executive, wrote in February that the company generated about 100 billion words…

U.S. Copyright Office Announces Updated Webinar on Copyright Essentials: Myths Explained

“The U.S. Copyright Office invites you to register to attend the upcoming online webinar, Copyright Essentials: Myths Explained, on September 18, 2024, at 1:00 p.m. eastern time. There is a lot of misleading information out there about copyright. On September 18, 2024, the U.S. Copyright Office will discuss what is and is not true when…

Pete Recommends – Weekly highlights on cyber security issues, August 24, 2024

Via LLRX – Pete Recommends – Weekly highlights on cyber security issues, August 24, 2024 – Privacy and cybersecurity issues impact every aspect of our lives – home, work, travel, education, finance, health and medical records – to name but a few. On a weekly basis, Pete Weiss highlights articles and information that focus on…

Face Search Engine Reverse Image Search

“PimEyes is an online face search engine that goes through the Internet to find pictures containing given faces. PimEyes uses face recognition search technologies to perform a reverse image search. Find a face and check where the image appears online. Our face finder helps you find a face and protect your privacy. Facial recognition online…

New web crawler launched by Meta last month is quietly scraping the internet for AI training data

Fortune [no paywall]: “Meta has quietly unleashed a new web crawler to scour the internet and collect data en masse to feed its AI model. The crawler, named the Meta External Agent, was launched last month according to three firms that track web scrapers and bots across the web. The automated bot essentially copies, or…

EU Proposal for an ePrivacy Regulation

“The European Commission’s proposal for a Regulation on ePrivacy aims at reinforcing trust and security in the digital world. Why a reform of ePrivacy legislation? European legislation needs to keep up with the fast pace at which IT-based services are developing and evolving. The Commission has started a major modernisation process of the data protection…

Google’s AI Search Gives Sites Dire Choice: Share Data or Die

Bloomberg [unpaywalled] – Publishers say blocking the company’s AI bot could also prevent their sites from showing up in search. Google now displays convenient artificial intelligence-based answers at the top of its search pages — meaning users may never click through to the websites whose data is being used to power those results. But many…

Has your paper been used to train an AI model? Almost certainly

Nature – Artificial-intelligence developers are buying access to valuable data sets that contain research papers — raising uncomfortable questions about copyright. “Academic publishers are selling access to research papers to technology firms to train artificial-intelligence (AI) models. Some researchers have reacted with dismay at such deals happening without the consultation of authors. The trend is…

The Files are in the Computer: On Copyright, Memorization, and Generative AI

Cooper, A. Feder and Grimmelmann, James, The Files are in the Computer: On Copyright, Memorization, and Generative AI (April 22, 2024). Cornell Legal Studies Research Paper Forthcoming, Chicago-Kent Law Review, Forthcoming, Available at SSRN: https://ssrn.com/abstract=4803118 – “The New York Times’s copyright lawsuit against OpenAI and Microsoft alleges that OpenAI’s GPT models have…