Community Driven Data Collection and Consent in AI

Published: 21 February 2024
on channel: Mozilla Developer

Generative AI in 2024 has a consent problem. Scraped and otherwise stolen datasets are used to produce output that can directly compete with the people who generated the source data. This doesn’t have to be our future. The Common Voice project collects volunteer-donated speech data and offers it freely to academics, industry and language activists, working toward a future where meaningful linguistic diversity is built into the digital products and services that increasingly fill our world. By teaching computers the way that real people speak, Common Voice doesn’t just offer a better-connected future for global users; it also presents one possible consent-led model for community-driven data collection. Together, let’s explore how community-led dataset collection, design and governance structures have developed across speech datasets, and look at how freely donated data, data trusts and other consent-led collection models could offer a less dystopian AI future. This is an exploratory look at the proliferation of consent-led data collection models in speech datasets, covering not only Common Voice's CC0 donation-led approach but also data collection and governance models that offer more granular data control (such as data trusts led by language communities), which could offer AI, and all of us touched by AI, a less dystopian path into the future.

