By Meg Doerr and Lara Mangravite
Since our founding, Sage Bionetworks has worked to expand the community of solvers using biomedical data to advance human health. In our 2020-2022 strategic plan, we committed to broadening the circle further by increasing representation across the research life cycle.
To this end, Sage has undertaken a number of scientific-allyship projects. In January 2020, we released a white paper describing our inclusive framework for qualifying solvers from around the world for access to biomedical data within Synapse, Sage’s data sharing and collaboration platform. In July 2020, we published the results of the first phase of DeCLR (Developing Pathways for Community-Led Research with Big Data) — a collaborative research project with Joon-Ho Yu (University of Washington) to develop a framework for meaningfully empowering communities in big-data analysis and interpretation.
Further, we recently announced a partnership with the Wellcome Trust’s Mental Health Priority Area to design, build, and assess community-driven governance structures for a databank of mental health data indicators from youth around the world. We are writing today to share some of our early learnings from the design of the quantitative arm of that study. In this arm, youth will be recruited to contribute both passive and survey data via a study app to a global databank. As they join, our plan is that participants will be randomized to different data governance models. But what models should we test?
Building from our previous normative research data governance design patterns, we narrowed the choices to more open data sharing models. But a host of options remained. We engaged panels of youth advisors and data-usability experts from around the globe to help us with the experimental design. To structure their discussion, we broke down the key data governance decisions into a typology of seven questions.
Interestingly, we found broad agreement between youth advisors and data usability experts on four of the seven data governance questions.
- Who can access the data? Nearly all agreed that open data access was ideal, with many youth highlighting data access as a social justice issue.
- Can the data be seen in an easy-to-use format? In keeping with broad and equitable access, there was strong consensus that there should be some sort of a data browser or query engine to facilitate data exploration by a wide audience.
- What do people have to do before they can access the data? There was a strong desire among youth and ethics-oriented data usability advisors for an interlocking system of identity verification, targeted ethics training, and contracting as mechanisms to guard against “bad guys.” Youth were opposed to researchers having to pay a fee for data access, citing fees as a way of reinforcing global inequity in data access and use.
- Who takes on the cost of managing the data? Speaking of money and inequity, advisors highlighted that funders should be prepared to shoulder the bulk of the cost for data hosting. They did allow that some cost sharing with researchers might be okay, for example in the case of commercial use of data.
The three questions on which there was disagreement highlight where we need to turn our attention for the quantitative study arm’s experimental design.
- How can people see the data? Youth strongly preferred “sandboxing” data in a secure server or releasing only a synthetic data set (i.e., model-to-data), with not a single youth advisor advocating for data download. Meanwhile, the vast majority of data usability experts strongly preferred allowing data download.
- Who controls the data? While everyone agreed someone should control the data, there was a lot of great discussion on who exactly that should be. Advisors were quick to highlight the benefits and drawbacks of each model. For example, if the “community decides,” who is the community?
- What kind of research can people do with the data? There was no debate on requiring researchers to follow research ethics guidelines (phew!). However, the youth advisors thought there should be some additional restrictions on data use, for example restricting profit making from the data.
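To make the idea of randomizing participants to governance models concrete, here is a purely illustrative sketch, not the study's actual design. It crosses the options named above for two of the contested questions (how people see the data, and what research is allowed) into candidate arms, then assigns participants to arms in a balanced way. The option labels, function names, and participant IDs are all hypothetical. The "who controls the data?" question is left out because the discussion above does not enumerate its options.

```python
import itertools
import random

# Options taken from the discussion above; the labels are hypothetical shorthand.
ACCESS_MODES = ["sandbox", "synthetic_only", "download"]
USE_TERMS = ["ethics_guidelines_only", "ethics_plus_no_profit"]


def build_arms():
    """Cross the contested options into candidate governance arms."""
    return [
        {"access": access, "use": use}
        for access, use in itertools.product(ACCESS_MODES, USE_TERMS)
    ]


def randomize(participant_ids, arms, seed=0):
    """Randomly assign participants to arms, keeping arm sizes balanced."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal the shuffled participants round-robin, so arm sizes differ by at most one.
    return {pid: arms[i % len(arms)] for i, pid in enumerate(ids)}


arms = build_arms()
assignment = randomize([f"p{i:03d}" for i in range(12)], arms)
```

A real study would likely test fewer arms than a full cross of every option, and might stratify the assignment by country or age group. This sketch only shows the basic mechanics.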
We are thrilled to have this opportunity for empirical exploration of community-informed and community-led data governance. We are taking these learnings and building out a study design to test these points of disagreement. We look forward to sharing our findings with you in the months to come!