Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Your Personal Information Is Probably Being Used to Train Generative AI Models

Scientific American: “Artists and writers are up in arms about generative artificial intelligence systems—understandably so. These machine learning models are only capable of pumping out images and text because they’ve been trained on mountains of real people’s creative work, much of it copyrighted. Major AI developers including OpenAI, Meta and Stability AI now face multiple lawsuits on this. Such legal claims are supported by independent analyses; in August, for instance, the Atlantic reported finding that Meta trained its large language model (LLM) in part on a data set called Books3, which contained more than 170,000 pirated and copyrighted books. And training data sets for these models include more than books. In the rush to build and train ever-larger AI models, developers have swept up much of the searchable Internet. This not only has the potential to violate copyrights but also threatens the privacy of the billions of people who share information online. It also means that supposedly neutral models could be trained on biased data. A lack of corporate transparency makes it difficult to figure out exactly where companies are getting their training data—but Scientific American spoke with some AI experts who have a general idea.”
