Applied ELSI research at Sage

Sage Bionetworks is known for its innovative approaches to collaborative computational biomedical research. Ten years ago, we started by “just” asking researchers to share their datasets more openly. Quickly, we banded together with like-minded “research parasites” to help grow and formalize the open collaboration for human health. We began to think creatively about how to get more minds on more data, partnering with DREAM Challenges and other innovative cooperative-solving endeavors.

Along the way, we have also been doing innovative applied ELSI research. For example, Sage has been working on eConsent (electronic informed consent) since 2014 as a way to scale and diversify research participation. Consistent with our open science ethos, we began our work by consulting a diverse group of experts: ethicists, technologists, scientists, patient advocates, clinical data specialists, and clinicians. Building on their varied insights, we developed a normative description laying out our case for what eConsent ought to be: how it should look, how it should work, and why. With the invaluable collaboration of external ethics review boards, we built eConsents based on that research (the “applied” bit), including for several of the first ResearchKit apps. And we’ve studied our eConsent work empirically, most recently through a mixed-methods study of the effectiveness of one of the largest Sage-informed eConsent implementations: the All of Us Research Program. An applied ELSI approach has allowed us to rapidly design, build, and iterate on a novel approach to informed consent.

In 2019, as we approached our 10th anniversary, we took a look back at all of our work. We found that although our approaches to solving are growing ever more creative, and our data sets are becoming richer and more representative (hurrah for eConsent!!), the folks involved in doing open science weren’t necessarily reflecting those gains in diversity. In short, we were falling short. So, we asked ourselves: how can Sage help (re)build the biomedical research ecosystem so that it looks more like the communities it is meant to serve?

One step forward is exploring questions of community-informed and community-led data governance. Data governance addresses questions such as how data are accessed and who controls that access. Again, we have taken an applied ELSI approach to these questions. We completed important conceptual (i.e., problem-framing) work in our 2020-2022 strategic planning process. We also did some critical normative research describing data governance design patterns and highlighting specific ethical considerations for novel data use models. And we completed a qualitative content analysis of key stakeholder feedback about community engagement in big data use.

Now we have a tremendous opportunity for a large-scale investigation of community-driven data governance through the Global Mental Health Databank, a partnership with the Mental Health Priority Area of Wellcome Trust. Using both qualitative and quantitative methods, we will empirically evaluate both the acceptability of, and preferences for, novel data governance structures that give real voice to those banking their data. This will allow us to build data governance systems that balance community data sharing preferences with the open science ambitions of researchers.

We are enthusiastic about the innovation our applied ELSI approach is enabling in how personal data is collected, managed, and used for biomedical research. We invite you to join us as we work to design, build, and refine representative, fair, and inclusive data governance practices: Share your ideas here or send us a note through Twitter (@SageBio or @MegDoerr). We know better science happens when we work together.

More on ELSI research:

The field of ELSI (ethical, legal, and social implications) research grew out of the Human Genome Project in the 1990s and has rapidly expanded to include disciplines beyond human genetics. Drawing together scholars from many existing disciplines, ELSI researchers employ both empirical and non-empirical methods of inquiry, exploring not only “what is” but also “what ought to be” and describing values and meaning in biomedical research practice. Empirical approaches used in ELSI research include qualitative, quantitative, and mixed methods. Non-empirical methods include conceptual (questions of meaning) and normative (questions of value) methods (see here for a great review).

Third edition of IRB reference manual released with new chapter on mHealth

The field of mHealth and wearables research is constantly evolving. How can IRBs keep up? In the latest edition of Institutional Review Board: Management and Function, Sage Bionetworks’ Meg Doerr and Sara Meeder of Maimonides Medical Center share their expertise in a new chapter, “mHealth, Mobile Technologies, and Apps.” Highlights of the chapter include:

  • Introduction and background to support reviewers new to the field of mHealth research
  • Guidance for reviewers evaluating mHealth studies on topics such as reliability of data, data breach procedures, third-party access and use of data, and considerations for enrollment of vulnerable populations.
  • Accessible discussion of the importance and impact of technical documentation, including privacy policies and terms of service, on regulatory compliance.

Institutional Review Board: Management and Function from PRIM&R (release date Feb. 22, 2021) is now available.

Expanding the Community

Adapted from Ben Logsdon

By Meg Doerr and Lara Mangravite

Since our founding, Sage Bionetworks has worked to expand the community of solvers using biomedical data to advance human health. In our 2020-2022 strategic plan, we committed to broadening the circle further by increasing representation across the research life cycle.

To this end, Sage has undertaken a number of scientific-allyship projects. In January 2020, we released a white paper describing our inclusive framework for qualifying solvers from around the world for access to biomedical data within Synapse, Sage’s data sharing and collaboration platform. In July 2020, we published the results of the first phase of DeCLR (Developing Pathways for Community-Led Research with Big Data) — a collaborative research project with Joon-Ho Yu (University of Washington) to develop a framework for meaningfully empowering communities in big-data analysis and interpretation.

Further, we recently announced a partnership with the Wellcome Trust’s Mental Health Priority Area to design, build, and assess community-driven governance structures for a databank of mental health data indicators from youth around the world. We are writing today to share some of our early learnings from the design of the quantitative arm of that study. In this arm, youth will be recruited to contribute both passive and survey data to a global databank via a study app. As they join, we plan to randomize participants to different data governance models. But which models should we test?

Building on our previous normative research on data governance design patterns, we narrowed the choices to more open data sharing models. But a host of options remained. We engaged panels of youth advisors and data usability experts from around the globe to help us with the experimental design. We broke the key data governance questions down into a typology for their discussion:

Figure by Carly Marten/Sage Bionetworks

Interestingly, we found broad agreement between youth advisors and data usability experts on four of the seven data governance questions.

  • Who can access the data? Nearly all agreed that open data access was ideal, with many youth highlighting data access as a social justice issue.
  • Can the data be seen in an easy-to-use format? In keeping with broad and equitable access, there was strong consensus that there should be some sort of a data browser or query engine to facilitate data exploration by a wide audience.
  • What do people have to do before they can access the data? There was a strong desire among youth and ethics-oriented data usability advisors for an interlocking system of identity verification, targeted ethics training, and contracting as mechanisms to guard against “bad guys.” Youth were opposed to researchers having to pay a fee for data access, citing fees as a way of reinforcing global inequity in data access and use.
  • Who takes on the cost of managing the data? Speaking of money and inequity, advisors highlighted that funders should be prepared to shoulder the bulk of the cost for data hosting. They did allow that some cost sharing with researchers might be okay — for example in the case of commercial use of data.

The three questions on which there was disagreement highlight where we need to turn our attention for the quantitative study arm’s experimental design.

  • How can people see the data? Youth strongly preferred “sandboxing” data in a secure server (i.e., model-to-data) or releasing only a synthetic data set, with not a single youth advisor advocating for data download. Meanwhile, the vast majority of data usability experts strongly preferred allowing data download.
  • Who controls the data? While everyone agreed someone should control the data, there was a lot of great discussion on who exactly that should be. Advisors were quick to highlight the benefits and drawbacks of each model. For example, if the “community decides,” who is the community?
  • What kind of research can people do with the data? There was no debate on requiring researchers to follow research ethics guidelines (phew!). However, the youth advisors thought there should be some additional restrictions on data use, for example, restricting profit-making from the data.

We are thrilled to have this opportunity for empirical exploration of community-informed and community-led data governance. We are taking these learnings and building out a study design to test these points of disagreement. We look forward to sharing our findings with you in the months to come!

The Global Mental Health Databank – Practicing Better Science Together

At Sage, we build responsible practices for data sharing in health research. By combining policy and technology, we work to ensure data can be safely used across institutions. We do this because broad data resources and interdisciplinary teams are necessary to understand the complexities of human health, and we need to use this kind of information to improve people’s experiences in the health system.

We have been successful in stimulating collaborative science across teams of researchers. But we recognize that our approaches to data sharing haven’t included everyone who should be part of the conversation. Patient advocates and community-based organizations have asked us for many years why we don’t prioritize participant involvement. Indeed, there is a wide range of research opportunities that are only possible with direct involvement of the individuals contributing the data. We haven’t had a good answer.

Researchers simply can’t keep excluding those who provide the data. While we at Sage are partial to the ethical arguments for this, there are plainly scientific arguments as well. As data collection creeps out of the lab and into our homes, researchers will not recruit and engage diverse cohorts if the people in the study are treated as subjects, not partners.

In our own work, we focus on “real-world evidence”: how to collect, govern, and analyze a wide range of extremely personal data about our everyday lives, including medical care, daily habits, self-management practices, and lived experience. What is the value proposition that would convince anyone to contribute these kinds of data to research? How broadly are people willing to distribute data collected about their daily lives? How do these tradeoffs look to people who aren’t traditionally asked their opinion about them?

The challenge then is to create a data governance system that empowers people to be active partners in managing the way that their data is collected, shared, or used in research.

This is why I am so excited about the new partnership we’ve just formed with Miranda Wolpert and the Mental Health Priority Area at Wellcome Trust. We’re going to build – and test – that data governance system.

The Global Mental Health Databank project seeks to identify strategies (“active ingredients”) that youth around the world can use to self-manage anxiety and depression, and to develop a system that guides youth toward the strategies they are most likely to find useful. To be clear, our project is to design the governance for this kind of databank. We’ll be testing the ways that participants want to govern such a databank; we are not conducting the research on mental health!

Doing this will require active partnership with both researchers and youth to help collect the data needed to answer the question of “what works for whom and why” for mental health management. A data sharing system designed for these purposes must meet the needs both of youth with lived experience in mental health and of mental health researchers. Participants and researchers are often interested in different questions – both of which have value. Just as researcher-led programs have often overlooked participant needs and interests, participant-led programs may miss subtleties of statistics and biology that are important to researchers.

Over the next two years, we will work together with youth and researchers to evaluate the feasibility of successfully implementing a participant-led databank for global mental health research that enables the collection, sharing and use of data from youth across three countries – South Africa, India, and the United Kingdom. We will run this study as an experiment in participant-led data governance designed to address several key questions: What value do youth find in participating in this kind of databank and how does that vary across individuals? Is youth involvement in research impacted by their control over how their data is collected, shared, and used? What levels of oversight do they wish to have? Do these considerations have an impact on what types of data they are willing to contribute? What do they wish to do with these data and what support do they need to achieve their goals? How do the preferences of youth and the preferences of researchers intersect?

I am thankful to be collaborating with a team of amazing researchers with deep expertise in youth mental health: Drs. Zuki Zingela and Melvyn Freeman in South Africa, Dr. Soumitra Pathare in India, and Dr. Tamsin Ford and Dr. Mina Fazel in the UK. These individuals work directly with school-based youth on the management of mental health. Together, we will evaluate the interests of youth in engaging with a databank program of this sort. We are also joined by two research teams from the University of Washington, led by Drs. Pamela Collins and Pat Areán, who have expertise in global mental health collaboratives and in digital mental health assessment, respectively.

Our team is committed to embarking on this journey in partnership with youth. To build a system that meets the needs of youth and researchers, we need both perspectives to be involved right from the beginning. To this end, our first action has been to hire young adults onto the team to co-develop this work with these researchers. They will be supported by a series of panels of youth with lived experience of mental health in each country who can provide a diverse set of perspectives to inform the project.

Stay tuned!

Bringing Structure and Design to Data Governance

Before COVID-19 took over the world, the Governance team at Sage Bionetworks had started working on an analysis of data governance structures and systems to be published as a “green paper” in late 2020. Today we’re happy to publicly release that paper, Mechanisms to Govern Responsible Sharing of Open Data: A Progress Report.

In the paper, we provide a landscape analysis of models of governance for open data sharing based on our observations in the biomedical sciences. We offer an overview of those observations and show areas where we think this work can expand to supply further support for open data sharing outside the sciences.

The central argument of this paper is that the “right” system of governance is determined by first understanding the nature of the collaborative activities intended. These activities map to types of governance structures, which in turn can be built out of standardized parts — what we call governance design patterns. In this way, governance for data science can be easy to build, follow key laws and ethics regimes, and enable innovative models of collaboration. We provide an initial survey of structures and design patterns, as well as examples of how we leverage this approach to rapidly build out ethics-centered governance in biomedical research.

While there is no one-size-fits-all solution, we argue for learning from ongoing data science collaborations and building on existing standards and tools. In so doing, we argue for data governance as a discipline worthy of expertise, attention, standards, and innovation.

We chose to call this report a “green paper” in recognition of its maturity and coverage: it’s a snapshot of our data governance ecosystem in biomedical research, not the world of all data governance, and the entire field of data governance is in its infancy. We have licensed the paper under CC-BY 4.0 and published it on GitHub using Manubot in the hope that the broader data governance community might fill in holes we left, correct mistakes we made, add references, toolkits, and reference implementations, and generally treat this as a framework for talking about how we share data.

National COVID Cohort Collaborative Data Enclave Launched

We are pleased to announce the launch of the National COVID Cohort Collaborative Data Enclave. From the NIH-NCATS announcement:

Researchers studying COVID-19 now are able to access an innovative new analytics platform that contains clinical data from the electronic health records of people who were tested for the novel coronavirus or who have had related symptoms. Part of the NCATS National COVID Cohort Collaborative (N3C) Data Enclave, the centralized and secure data platform features powerful analytics capabilities for online discovery, visualization and collaboration. The data are robust in scale and scope and are transformed into a harmonized data set to help scientists study COVID-19, including potential risk factors, protective factors and long-term health consequences.

Sage’s Justin Guinney (collaborative analytics) and John Wilbanks and Christine Suver (data partnership and governance) have served as workstream leads.

Some of the results so far include:

  • 58 data transfer agreements (DTAs) executed
  • 43 sites obtained IRB approval (local or single IRB)
  • 41 sites have both an executed DTA and IRB approval (and can begin data ingestion)
  • 35 sites have an executed data use agreement (DUA)
  • 26 sites have deposited data in the N3C pipeline

This is a unique resource that represents a long-held goal of data integration across the U.S. clinical and translational research system. But without data users, a data commons doesn’t mean much. Please register and use the data.

Democratizing data access

Open Data Sharing in the 21st Century: Sage Bionetworks’ Qualified Research Program and Its Application in mHealth Data Release

As a leading advocate for open science practices, informed consent, and data privacy in biomedical research, Sage actively pilots and tests innovative tools and resources to maximize the scientific value derived from datasets while still ensuring basic contractual protections for research participants. This paper details the rationale, features, and application of Sage’s novel framework for qualifying a diverse pool of solvers from around the world to access health and biomedical data within Synapse. The three conceptual mechanisms guiding the development of the framework (transactional cost, exposure, and openness) are identified and illustrated via the case example of mPower, the first study to pilot the qualified researcher framework in 2015. The paper concludes with a cross-sectional snapshot of the current pool of qualified researchers and identifies key challenges and future directions for optimizing the qualified researcher framework in the years to come.


Trusting Tech Companies with Our DNA Data: A ‘Tough Sell’

In a recent article posted on Bloomberg Law, John Wilbanks, chief commons officer at Sage Bionetworks, commented:

“People don’t trust tech right now…the idea that we’re suddenly going to trust them with all of our DNA, that’s going to be a tough sell.”

Wilbanks was speaking on the panel Using Patient Data in Research: Balancing Benefits and Risks during the Milken Institute’s Future of Health Summit 2019. The panel focused on the following topics:

We are capturing, tracking, and creating more data than ever about our lifestyle and health. More and more companies hold our data, which we sign away by indicating that we’ve read extensive privacy policies, when few of us actually have. And there are major gaps, loopholes, and complexities in the regulations that protect our data. Yet, when put to good use, these masses of data might help us manage our health, help providers understand the patient experience, and help researchers glean new insights about disease and biology. What risks exist when companies carry so much consumer data? How can good practice and regulations mitigate these risks? Our panel of experts will discuss these issues and lead a conversation about how we can shift towards an environment where health data is used to empower patients, while also being used to glean new insights for research and move the field forward.

VIDEOS: Watch Talks from Sage’s mHealth App Developers Workshop

The Mobile Health App Developer Workshop took place at the New York Genome Center on Sept. 12, 2019. Sage Bionetworks hosted the workshop, which featured a keynote from Andy Coravos, CEO of Elektra Labs. Workshop sessions were presented by Sage staff, including Meg Doerr, Vanessa Barone, Woody MacDuffie, and Abhishek Pratap.
