Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

MIT study finds labelling errors in datasets used to test AI

by Sabrina I. Pacifici on Mar 28, 2021

Engadget: “A team led by computer scientists from MIT examined ten of the most-cited datasets used to test machine learning systems. They found that around 3.4 percent of the data was inaccurate or mislabeled, which could cause problems in AI systems that use these datasets. The datasets, which have each been cited more than 100,000 times, include text-based ones from newsgroups, Amazon and IMDb. Errors emerged from issues like Amazon product reviews being mislabeled as positive when they were actually negative and vice versa. Some of the image-based errors result from mixing up animal species. Others arose from mislabeling photos with less-prominent objects (“water bottle” instead of the mountain bike it’s attached to, for instance)…One of the datasets centers around audio from YouTube videos. A clip of a YouTuber talking to the camera for three and a half minutes was labeled as “church bell,” even though one could only be heard in the last 30 seconds or so. Another error emerged from a misclassification of a Bruce Springsteen performance as an orchestra…”

