This project promotes the progress of science and technology development by providing the empirical knowledge needed to advance fair, just computational research. Big, pervasive data about people enables fundamentally new computational research, but also raises new ethical challenges, such as accounting for distributed harms at scale, protecting against the risks of unpredictable future uses of data, and ensuring fairness in automated decision-making. National debates have erupted over online experiments, leaked datasets, and the definition of "public" data. Investigators struggle to advise students on engaging vulnerable populations or navigating terms of service. Regulators debate how to translate traditional ethical principles into workable policy guidance. Research addressing these challenges has hit roadblocks caused by a lack of empirical knowledge about emerging norms and expectations. This project discovers how diverse stakeholders - big data researchers, platforms, regulators, and user communities - understand their ethical obligations and choices, and how their decisions impact data system design and use. It also compares stakeholder perspectives against the risks and realities of pervasive data itself, answering fundamental questions about the fairness and ethics of such research. Understanding how computing researchers adapt their practices in the big data era, and highlighting points of convergence or conflict with data realities, user expectations, and regulatory practices, will produce concrete guidance for pervasive data ethics. In addition to improving ethical approaches for studying people in computing contexts, this work empowers researchers with actionable information about emergent norms and risks. Outputs, such as decision-support tools, guidance on measuring risk, public educational material and bibliographies, and reusable empirical data, are designed to support the wide range of stakeholders in data ethics. 
To meet these goals, this project enables a collaboratory - a virtual center combining data and analytical resources - to collect empirical data on research ethics at diverse scopes and scales. The research attends to multiple ethical issues (privacy, risk, respect, beneficence, justice) as well as the full network of stakeholders involved in research ethics (user communities, computing research communities, technical platforms, and regulations). The project conducts interviews with, and surveys of, 1) user communities, 2) computing researchers, 3) data ethics regulators, and 4) commercial platform providers. The project also gathers numerous shared document sets, including 1) pervasive data research publications, 2) pervasive computing curricula and degree requirements, 3) news articles and public discourse about pervasive data research, 4) a corpus of existing data ethics training, 5) pervasive data grant summaries and data management plans, and 6) corporate ethics guidelines and regulatory documents. The project uses these resources to: discover metrics for assessing and moderating risks to data subjects; document how user attitudes and media reactions shape subjects' willingness to participate in pervasive data research; model user concerns in ways accessible to computational researchers; discover how existing ethical codes can be adapted and adopted for the real-world working conditions of sociotechnical and cyber-human research; determine how the changing practices of academic and corporate regulators impact users and researchers; and illuminate implementable and sustainable best practices for research ethics.