Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Data Governance in Open Source AI: Enabling Responsible and Systemic Access

“As the Open Source Initiative convened its process to define Open Source AI, it became clear that organizations that care for open, fair and public-interest AI need to pay particular attention to and establish a shared position on data sharing and data governance. Open Source Artificial Intelligence (AI) development presents an opportunity to democratize technological progress and reduce the concentration of power in the AI industry. However, its success depends heavily on the availability of high-quality, diverse datasets and robust data governance frameworks. This white paper explores the intersection of data governance and Open Source AI, emphasizing the importance of responsible data sharing, community stewardship, and equitable practices in fostering innovation while protecting fundamental rights.

The broader challenges in data governance

Data is a critical resource for AI systems, yet its use is fraught with challenges. We face a paradox when it comes to data availability. On the one hand, data is abundant, best demonstrated by the fact that the entire open web’s content is foundational to most generative models developed in recent years. On the other hand, it is scarce, as evidenced by those same models, for which access to proprietary, restricted data provides an advantage. Publicly available datasets, such as those derived from web scraping, have historically supported AI advancements, but they also raise ethical concerns about privacy, consent, and data ownership. While vast amounts of data are accessible, much of it is proprietary, poorly curated, or unrepresentative of global diversity.

In this context, Open Source is the ideal way to create equitable and transparent AI systems. The Open Source Initiative (OSI) spearheaded efforts to understand openness for AI through the Open Source AI Definition (OSAID). However, the OSAID process revealed that more focus is needed on data governance, addressing the ethical and legal complexities of data sharing…”
