Consent in Crisis: The Rapid Decline of the AI Data Commons. July 2024. Data Provenance.

General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. “To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web…”