Mozilla updates Common Voice dataset with 1,400 hours of speech across 18 languages

by Sabrina I. Pacifici on Feb 28, 2019

VentureBeat: “Mozilla wants to make it easier for startups, researchers, and hobbyists to build voice-enabled apps, services, and devices. Toward that end, it’s today releasing the latest version of Common Voice, its open source collection of transcribed voice data that now comprises over 1,400 hours of voice samples from 42,000 contributors across 18 languages, including English, French, German, Dutch, Hakha-Chin, Esperanto, Farsi, Basque, Spanish, Mandarin Chinese, Welsh, and Kabyle. It’s one of the largest multi-language datasets of its kind, Mozilla claims — substantially larger than the Common Voice corpus it made publicly available eight months ago, which contained 500 hours (400,000 recordings) from 20,000 volunteers in English — and the corpus will soon grow larger still. The organization says that data collection efforts in 70 languages are actively underway via the Common Voice website and mobile apps…”

