Responsible Data Collection

A foundation for responsible innovation and digital equality

Responsible data collection is critical in today’s interconnected digital world. Access to public web data is a cornerstone of innovation, scientific research, commercial enterprise, political freedom, governmental accountability, digital equality, and more. The Alliance for Responsible Data Collection (ARDC) is at the forefront of advancing responsible web data collection, offering a robust framework of guidelines that emphasize transparency, accountability, and fairness. By fostering collaboration across industry, non-profits, and academia, ARDC champions a culture of responsible web scraping, web crawling, and data mining that safeguards open access to public internet data while maintaining compliance with applicable laws and promoting industry best practices for fair, ethical, and responsible data collection.

While automated data collection has existed for decades, the recent use of scraped web data to train AI models has raised public awareness and generated concerns around the collection and use of public web data. As laws and regulations governing web scraping evolve, organizations that engage in or rely upon automated data collection seek guidance not only to maintain compliance with applicable laws but also to ensure that the methods used to collect data protect the integrity of digital ecosystems and align with industry best practices. ARDC provides a trusted and adaptable framework for responding to these concerns and adapting to a changing legal landscape.

Through its technical standards and governance guidelines, ARDC ensures that organizations commit to its essential principles of practice. These include: limiting automated data collection to publicly available data; adopting acceptable use policies that commit to compliance with applicable laws and describe any additional restrictions beyond legal requirements; monitoring domain health and implementing rate limits as needed; documenting data collection policies and processes (including whether and to what extent each data collection abides by a domain’s robots.txt file); retaining query logs and other information to give downstream users greater visibility into data provenance and collection methods; and implementing mechanisms to address abuse or misuse. The ARDC standards and guidelines reflect a common language and cross-industry standard for responsible data practices.
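As a rough illustration of how three of these principles (respecting robots.txt, rate limiting, and retaining query logs) might look in practice, the following Python sketch uses only the standard library. It is not an ARDC standard or reference implementation. The class name `PoliteCollector`, the `example-bot` user agent, the sample robots.txt content, and the one-second default delay are all hypothetical choices made for this example.

```python
import time
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser


class PoliteCollector:
    """Hypothetical helper: checks robots.txt, spaces out requests, logs decisions."""

    def __init__(self, robots_txt, user_agent, min_delay_seconds=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent
        self._parser = RobotFileParser()
        # Parse robots.txt content that the caller has already retrieved.
        self._parser.parse(robots_txt.splitlines())
        # Honor the site's Crawl-delay when it is stricter than our own minimum.
        site_delay = self._parser.crawl_delay(user_agent) or 0
        self.effective_delay = float(max(min_delay_seconds, site_delay))
        self._clock = clock
        self._sleep = sleep
        self._last_request = None
        self.log = []  # retained query log for provenance

    def request_permission(self, url):
        """Return True if the URL may be fetched, waiting first if needed."""
        allowed = self._parser.can_fetch(self.user_agent, url)
        waited = 0.0
        if allowed:
            if self._last_request is not None:
                remaining = self.effective_delay - (self._clock() - self._last_request)
                if remaining > 0:
                    self._sleep(remaining)  # rate limit: pause before the next request
                    waited = remaining
            self._last_request = self._clock()
        # Record every decision, including refusals, so collection can be audited.
        self.log.append({
            "url": url,
            "user_agent": self.user_agent,
            "allowed": allowed,
            "waited_seconds": waited,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return allowed


# Sample robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /private/
Crawl-delay: 2
"""
```

A collector built this way would call `request_permission(url)` before each fetch and skip any URL for which it returns `False`. The `log` list could then be written to durable storage so that downstream users can see what was collected, when, and under which rules.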

In addition to responsible data collection practices, ARDC advocates for the democratization of public web data. By enabling organizations of all sizes to access this valuable resource, ARDC supports the growth of small and medium businesses, the development of local economies, advancements in public education and political discourse, scientific and historical research, and the accountability of public agents and actors. Whether through the extraction and analysis of commercial data, the archiving of publicly available websites, or the development and deployment of AI models, the tangible benefits of responsible data collection enable a growing economy, protect individual rights, and empower innovation.

As conversations surrounding web scraping, web crawling, and data mining continue to evolve, ARDC remains a trusted voice in shaping these discussions. Through its commitment to responsible data collection, ARDC is helping organizations navigate the complexities of data governance and of downstream use models that build on automated data extraction. By following ARDC’s guidelines, data collectors can confidently leverage the transformative power of web data while adhering to industry best practices, and data users can confidently acquire data collected by ARDC members, knowing that it has been responsibly sourced.

Join ARDC