Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

New AI standards group wants to make data scraping opt-in

Ars Technica: “The first wave of major generative AI tools largely were trained on “publicly available” data—basically, anything and everything that could be scraped from the Internet. Now, sources of training data are increasingly restricting access and pushing for licensing agreements. With the hunt for additional data sources intensifying, new licensing startups have emerged to keep the source material flowing. The Dataset Providers Alliance, a trade group formed this summer, wants to make the AI industry more standardized and fair. To that end, it has just released a position paper outlining its stances on major AI-related issues. The alliance is made up of seven AI licensing companies, including music copyright-management firm Rightsify, Japanese stock-photo marketplace Pixta, and generative-AI copyright-licensing startup Calliope Networks. (At least five new members will be announced in the fall.) The DPA advocates for an opt-in system, meaning that data can be used only after consent is explicitly given by creators and rights holders. This represents a significant departure from the way most major AI companies operate. Some have developed their own opt-out systems, which put the burden on data owners to pull their work on a case-by-case basis. Others offer no opt-outs whatsoever. The DPA, which expects members to adhere to its opt-in rule, sees that route as the far more ethical one. “Artists and creators should be on board,” says Alex Bestall, CEO of Rightsify and the music-data-licensing company Global Copyright Exchange, who spearheaded the effort. Bestall sees opt-in as a pragmatic approach as well as a moral one: “Selling publicly available datasets is one way to get sued and have no credibility.”

Sorry, comments are closed for this post.