Going Beyond Informed Consent: The Power of Participatory Research

Depression and anxiety are two of the most common mental health challenges facing youth and young adults around the world. The COVID-19 pandemic has only deepened this crisis: globally, 1 in 4 young people are experiencing clinically elevated depression symptoms, while 1 in 5 are experiencing clinically elevated anxiety symptoms, double pre-pandemic levels.

It is clear that we must accelerate research efforts to understand, prevent, and treat mental illness in young people. The good news is that our scientific toolbox is broader today than it has ever been thanks to rapid advances in digital technologies that have transformed how researchers conduct their work.

Harnessing digital technologies, we now have unprecedented abilities to collect real-world personal health data at scale, generating huge data sets that can be mined for insight into causes and potential remedies. This vast wellspring of information is already powering research insights that transform disease prevention and treatment across a variety of health domains.

But the digital collection of health data, and how we use it for scientific purposes, is anything but a neutral good. Like any tool, it carries the potential to help or to harm. This is an especially important concern when researchers gather and work with data from populations who have historically experienced harm from scientific and medical communities. It is incumbent on researchers to pursue our work in ways that improve the health of these communities and do not unintentionally create further damage as we seek to learn from the personal health data they can provide.

At Sage Bionetworks, our mission to accelerate the translation of science into medicine is grounded in advancing more collaborative and transparent research practices. We recently had a chance to put our beliefs and expertise into action while leading the MindKind Study, which pushed our collective understanding of the importance of participatory research and the good data governance approaches that support its success.

What is MindKind?

MindKind is a feasibility study, commissioned by the Wellcome Trust, to explore the technical, ethical, and regulatory challenges of collecting, storing, and sharing mental health data provided by youth with lived experience of mental health challenges using their smartphones.

Sage Bionetworks led a multinational team of researchers, clinicians, technologists, and young people from India, South Africa, and the UK to prototype and test how to build a global mental health databank that could house electronically derived data from youth, with a focus on the approaches, treatments, and interventions potentially relevant to anxiety and depression in 16- to 24-year-olds.

A participatory approach

Sage Bionetworks took a participatory approach to MindKind, bringing young people and adult researchers together as equal colleagues and partners throughout the study. Side by side, we planned the research, overcame obstacles in its rollout, conducted engagement, and analyzed our findings.

Participatory research is sometimes framed as “nothing about us, without us,” a concept brought into the scientific mainstream by disability rights activists. The implications of shifting traditional power hierarchies to include research participants as co-creators of the research process and outcomes go beyond improving the ethics of data collection.

A participatory research approach is critical to improving the merit of the data itself, while also increasing the efficiency of the scientific process and accelerating the discoveries that come from it. In return, participants have a fair and ethical opportunity to shape and guide how the information they provide is accessed and utilized, while meaningfully contributing as informed citizen scientists to medical progress. In MindKind, youth participants were involved in every stage of exploring the tensions between preserving the privacy of their contributed data and opening that data to the greatest variety of solvers.

Sage Bionetworks has been working on ways to scale and diversify research participation since 2014, initially focusing on eConsent (electronic informed consent) as a method to increase engagement. Our collaboration with the National Institutes of Health (NIH) on the All of Us Research Program was a watershed moment for increasing the accessibility and transparency of informed consent processes through user-centered design. MindKind was a giant step forward for user-centered design in digital health research, placing communities in the driver's seat for determining how their data is collected, stored, shared, and used.

Learning from MindKind participants

A key component of the MindKind study design was deliberative democracy sessions. Deliberative democracy is a research method that involves convening groups of “lay” people, educating them about a topic — often complex issues at the intersection of health research and technology — and working through systematic questions to build consensus around the group’s preferences. These meetings of youth participants delved deeply into questions of data governance. The goal of these sessions was to explore youth participants’ feelings and experiences with sharing personal data, with a focus on mental health data and data governance preferences.

In these discussions, young people demonstrated a sophisticated understanding of the costs and benefits of sharing personal mental health data with researchers, balancing personal privacy concerns with the benefits of an open science approach that could help their peers.

We learned from youth participants that how data was handled was less important to them than the trust they had in who would be collecting and sharing the data. Essentially, trustworthy relationships tipped the scales for them towards sharing their data for research purposes.


Mental health science — and scientific research into biomedical issues in general — has the potential to improve quality of life for millions of people. But researchers need better tools and better information to help find new ways to support young people struggling with anxiety, depression, and other mental health conditions. Those tools can be significantly improved by taking a participatory approach — and studies like MindKind prove that this is possible.

At Sage Bionetworks, we have deep experience in building cloud-based platforms that enable cross-disciplinary teams to share knowledge and collaborate more effectively when using large-scale biomedical data. Through a project like MindKind, we saw that both researchers and participants can and should contribute to building these systems in a meaningful way.

This post originally appeared on Medium.com on Sept. 6, 2022.

Three Principles for Collaborative Benchmarking Challenges

By Jiaxin Zheng and James Eddy 

At Sage Bionetworks, we strive to speed the translation of science to medicine by embracing open practices. 

Benchmarking reliable methods is one of the ways we deliver on that mission. Through our work with DREAM Challenges, we’ve pioneered the development of infrastructure and tools to objectively evaluate algorithms across a broad spectrum of biomedical domains, including bioinformatics, biomedical informatics, and predictive modeling of clinical outcomes. With algorithms playing an increasing role in biomedical analysis, crowd-sourced perspectives can shape more objective method evaluation and mitigate self-assessment bias.

As the Challenge platform provider for the RSNA-ASNR-MICCAI BraTS 2021 Segmentation Task, here are some of the principles our technology embraces to empower the benchmarking ecosystem.

Collaboration: Synapse, Sage’s open-source research platform that allows teams to share data and track analyses, provided a centralized workspace for Challenge participants to collaborate. In addition to being able to access data, participants could post questions, find potential teammates, and submit models. From the wiki page to the evaluation workflow, we partnered with organizers to customize the space to meet their needs.


Portability: Sage developed the model-to-data approach where containerized algorithms rather than predictions are submitted for assessment on hidden data. These containers will be made available after the Challenge, promoting scientific reproducibility and reusability for the broader BraTS community. The model-to-data approach also eliminates the requirement for direct dissemination of validation data, reducing data transfer costs and enhancing security for Challenge organizers.


Innovation: To best serve the changes in the dynamic imaging space, we have augmented our infrastructure to integrate graphics processing unit (GPU) capabilities. These efforts enable easier exploration of large complex datasets and quicker model training, streamlining both development and evaluation. Our new GPU capability will be used for future imaging data inference competitions, and help stimulate algorithm development at the cutting edge of image-based learning.
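The model-to-data idea described under Portability can be illustrated with a minimal Python sketch. It is a toy, not the Challenge infrastructure: a plain function stands in for a containerized algorithm, and the names `HiddenCase`, `evaluate_on_hidden_data`, and `threshold_model` are hypothetical. The point it shows is that the organizer runs the submitted model against data the participant never sees and hands back only a score.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A "submission" stands in for a containerized algorithm (assumption: in a
# real Challenge this would be a container image, not a Python function).
Model = Callable[[Sequence[float]], int]


@dataclass(frozen=True)
class HiddenCase:
    """One held-out example that never leaves the organizer's environment."""
    features: Sequence[float]
    label: int


def evaluate_on_hidden_data(model: Model, hidden_cases: Sequence[HiddenCase]) -> float:
    """Run the submitted model on hidden cases and return accuracy only."""
    if not hidden_cases:
        raise ValueError("no hidden cases to evaluate on")
    correct = sum(1 for case in hidden_cases if model(case.features) == case.label)
    return correct / len(hidden_cases)


def threshold_model(features: Sequence[float]) -> int:
    """Toy participant submission: predict 1 when the feature sum exceeds 1.0."""
    return 1 if sum(features) > 1.0 else 0


# Organizer side: the hidden data stays here; the participant sees only `score`.
hidden = [
    HiddenCase((0.2, 0.3), 0),
    HiddenCase((0.9, 0.8), 1),
    HiddenCase((0.6, 0.7), 0),
]
score = evaluate_on_hidden_data(threshold_model, hidden)
print(f"accuracy on hidden data: {score:.3f}")
```

Accuracy is used here only to keep the sketch short; the metric an actual Challenge reports is chosen by its organizers.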

The BraTS community combined with the RSNA, ASNR, and MICCAI research networks has led to an impressive global turnout of Challenge participants, with more than 1,200 submissions from five continents. But this is only the beginning. Future phases of the BraTS Challenge will provide a sustainable cloud-based platform for open and continuous benchmarking of image analysis tools. We also plan to integrate with the DREAM Challenges community of solvers, and include genomics data in addition to images and new challenge tasks to address questions related to both.

Task 1 of the BraTS-RSNA-ASNR-MICCAI 2021 Challenge is the result of a collaboration by Sage Bionetworks, Perelman School of Medicine at the University of Pennsylvania, Radiological Society of North America, American Society of Neuroradiology, the Medical Image Computing and Computer Assisted Intervention Society, and sponsorship by Intel, RSNA, and Neosoma. We look forward to continued dialogue on how we can guide future algorithm development in order to best serve the broader biomedical community. 

Introducing NLPSandbox.io

By Jiaxin Zheng and Thomas Schaffter 

Natural language processing, or NLP, is a technology used in many ways to help computers understand human language. This is particularly impactful in biomedical research, where hospitals have millions of unstructured notes they need to de-identify before sharing with researchers. Manually de-identifying them would put significant strain on healthcare systems, presenting an excellent use case for the application of NLP.

There are two key challenges that NLP developers currently face. One is the lack of access to biomedical data on which to test the performance of their models. Given the size and sensitivity of the data, critical patient information is typically off limits for traditional model development. Another hurdle is a lack of frameworks for assessing performance and generalizability. NLPSandbox.io can help on both fronts.

NLPSandbox.io is one of the first tool-benchmarking platforms that securely connects developers to healthcare data providers. The platform streamlines your development process and the assessment of tools that are re-usable, reproducible, portable and cloud-ready. The NLP Sandbox adopts the model-to-data architecture to enable NLP developers to assess the performance of their tools on public and private datasets. When a developer submits a tool, data partners automatically download the tool and evaluate its performance against their private data. This architecture enables our partners to fully control their data and ensure no sensitive information leaves their secure environment.

In addition to overcoming data access hurdles, NLP Sandbox also provides a competitive framework for assessing the performance of various NLP tasks. The first series of tasks supported by NLP Sandbox is the annotation and de-identification of protected health information (PHI) in clinical notes. With Medical College of Wisconsin onboarded as our first data provider, developers can benchmark their de-identification tools on clinical notes. Additional data from Mayo Clinic and University of Washington will soon follow, enabling developers to evaluate the generalizability of their tool’s performance across multiple datasets.
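To make the benchmarking idea concrete, here is a small Python sketch of how a PHI annotation tool might be scored against gold-standard annotations and how annotated spans could then be redacted. It assumes exact-match scoring on character spans; the `Span` type, the label names, and the sample note are hypothetical and synthetic, and the sketch does not reflect the NLP Sandbox data schema or its actual evaluation metrics.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Span(NamedTuple):
    """A PHI annotation as character offsets (end is exclusive) plus a type label."""
    start: int
    end: int
    label: str


def score_annotations(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 between gold and predicted spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def redact(note: str, spans: Iterable[Span]) -> str:
    """Replace each annotated span with its label, assuming spans do not overlap."""
    # Work from the end of the note so earlier offsets remain valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        note = note[:span.start] + f"[{span.label}]" + note[span.end:]
    return note


# Synthetic example note; not real patient data.
note = "Seen by Dr. Smith on 2020-03-01."
gold = {Span(12, 17, "NAME"), Span(21, 31, "DATE")}
predicted = {Span(12, 17, "NAME"), Span(0, 4, "NAME")}  # one hit, one false alarm

metrics = score_annotations(gold, predicted)
redacted = redact(note, gold)
print(metrics)
print(redacted)
```

Exact span matching is the strictest choice; a benchmark could equally credit partial overlaps, which would change the numbers but not the overall shape of the evaluation.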

De-identification of PHI is only one of many tasks that NLP Sandbox will support in the future. We are also partnering with Mayo Clinic to enable the community to benchmark tools that automatically extract information about COVID-19 symptoms from clinical notes. We welcome suggestions for other NLP tasks, especially from partners who can provide data to support these tasks.

To get started, please check out NLPSandbox.io where you will find data schema, GitHub repositories, and a link to our Tuesday Discord office hours. If you are a data provider and would like to contribute, please reach out at team@nlpsandbox.io. Lastly, we will also give a live introduction of the service later this month. Register here to hold your spot.

NLP Sandbox is the result of a collaboration by Sage Bionetworks, CD2H, NCATS, MCW, and Mayo Clinic. We hope you will join our growing list of collaborators, and look forward to building and innovating with you.

Applied ELSI research at Sage

Sage Bionetworks is known for its innovative approaches to collaborative computational biomedical research. Ten years ago, we started by “just” asking researchers to share their datasets more openly. Quickly, we banded together with like-minded “research parasites” to help grow and formalize the open collaboration for human health. We began to think creatively about how to get more minds on more data, partnering with DREAM Challenges and other innovative cooperative-solving endeavors.

Along the way, we have also been doing innovative applied ELSI research. For example, Sage has been working on eConsent (electronic informed consent) as a way to scale and diversify research participation since 2014. Consistent with our open science ethos, we began our work by consulting with a diverse group of experts: ethicists, technologists, scientists, patient advocates, clinical data specialists, and clinicians. Building from their varied insights, we developed a normative description, laying out our case for what eConsent ought to be: how it should look, how it should work, and why. With the invaluable collaboration of external ethics review boards, we built eConsents based on that research (the “applied” bit), including for several of the first ResearchKit apps. And we’ve studied our work in eConsent empirically, most recently, through a mixed methods study of the effectiveness of one of the largest Sage-informed eConsent implementations: the All of Us Research Program. Using an applied ELSI approach has allowed us to rapidly design, build, and iterate on a novel approach to informed consent.

In 2019, as we approached our 10th anniversary, we took a look back at all of our work. We found that although our approaches to solving are growing ever more creative, and our data sets are becoming richer and more representative (hurrah for eConsent!!), the folks involved in doing open science weren’t necessarily reflecting those gains in diversity. In short, we were falling short. So, we asked ourselves: how can Sage help (re)build the biomedical research ecosystem so that it looks more like the communities it is meant to serve?

One step forward is through exploring questions of community-informed and community-led data governance. Data governance addresses questions like how data are accessed and who gets to control that access. Again, we have taken an applied ELSI approach to address these questions. We completed important conceptual (i.e., problem framing) work in our 2020-2022 strategic planning process. We also did some critical normative research describing data governance design patterns and highlighting specific ethical considerations for novel data use models. And, we completed a qualitative content analysis of key stakeholder feedback about community engagement in big data use.

Now we have a tremendous opportunity for a large-scale investigation of community-driven data governance through the Global Mental Health Databank, a partnership with the Mental Health Priority Area of Wellcome Trust. Using both qualitative and quantitative methods, we will empirically evaluate both acceptability and preference for novel data governance structures that give real voice to those banking their data. This will allow us to build data governance systems that balance community data sharing preferences with the open science ambitions of researchers.

We are enthusiastic about the innovation our applied ELSI approach is enabling in how personal data is collected, managed, and used for biomedical research. We invite you to join us as we work to design, build, and refine representative, fair, and inclusive data governance practices: Share your ideas here or send us a note through Twitter (@SageBio or @MegDoerr). We know better science happens when we work together.

More on ELSI research:

The field of ELSI (ethical, legal, and social implication) research grew out of the Human Genome Project in the 1990s, and has rapidly expanded to include disciplines beyond human genetics. Drawing together scholars from many existing disciplines, ELSI researchers employ both empirical and non-empirical methods of inquiry, exploring not only “what is” but also “what ought to be” – describing values and meaning in biomedical research practice. Empirical approaches used in ELSI research include qualitative, quantitative, and mixed methods. Non-empirical methods include conceptual (questions of meaning) and normative (questions of value) methods (see here for a great review). 

Expanding the Community

Adapted from Ben Logsdon

By Meg Doerr and Lara Mangravite

Since our founding, Sage Bionetworks has worked to expand the community of solvers using biomedical data to advance human health. In our 2020-2022 strategic plan, we committed to broadening the circle further by increasing representation across the research life cycle.

To this end, Sage has undertaken a number of scientific-allyship projects. In January 2020, we released a white paper describing our inclusive framework for qualifying solvers from around the world for access to biomedical data within Synapse, Sage’s data sharing and collaboration platform. In July 2020, we published the results of the first phase of DeCLR (Developing Pathways for Community-Led Research with Big Data) — a collaborative research project with Joon-Ho Yu (University of Washington) to develop a framework for meaningfully empowering communities in big-data analysis and interpretation.

Further, we recently announced a partnership with the Wellcome Trust’s Mental Health Priority Area to design, build, and assess community-driven governance structures for a databank of mental health data indicators from youth around the world. We are writing today to share some of our early learnings from the design of the quantitative arm of that study. In this arm, youth will be recruited to contribute both passive and survey data via a study app to a global databank. As they join, our plan is that participants will be randomized to different data governance models. But what models should we test?

Building from our previous normative research on data governance design patterns, we narrowed the choices to more open data sharing models. But a host of options remained. We engaged panels of youth advisors and data-usability experts from around the globe to help us with the experimental design. We broke down the key data governance questions into a typology for their discussion:

Figure by Carly Marten/Sage Bionetworks

Interestingly, we found broad agreement between youth advisors and data usability experts on four of the seven data governance questions.

  • Who can access the data? Nearly all agreed that open data access was ideal, with many youth highlighting data access as a social justice issue.
  • Can the data be seen in an easy-to-use format? In keeping with broad and equitable access, there was strong consensus that there should be some sort of a data browser or query engine to facilitate data exploration by a wide audience.
  • What do people have to do before they can access the data? There was a strong desire among youth and ethics-oriented data usability advisors for an interlocking system of identity verification, targeted ethics training, and contracting as mechanisms to guard against “bad guys.” Youth were opposed to researchers having to pay a fee for data access, citing fees as a way of reinforcing global inequity in data access and use.
  • Who takes on the cost of managing the data? Speaking of money and inequity, advisors highlighted that funders should be prepared to shoulder the bulk of the cost for data hosting. They did allow that some cost sharing with researchers might be okay — for example in the case of commercial use of data.

The three questions on which there was disagreement highlight where we need to turn our attention for the quantitative study arm’s experimental design.

  • How can people see the data? Youth strongly preferred “sandboxing” data in a secure server or releasing only a synthetic data set (i.e., model-to-data), with not a single youth advisor advocating for data download. Meanwhile, the vast majority of data usability experts strongly preferred allowing data download.
  • Who controls the data? While everyone agreed someone should control the data, there was a lot of great discussion on who exactly that should be. Advisors were quick to highlight the benefits and drawbacks of each model. For example, if the “community decides,” who is the community?
  • What kind of research can people do with the data? There was no debate on requiring researchers to follow research ethics guidelines (phew!). However, the youth advisors thought there should be some additional restrictions on data use, for example restricting profit making from the data.

We are thrilled to have this opportunity for empirical exploration of community-informed and community-led data governance. We are taking these learnings and building out a study design to test these points of disagreement. We look forward to sharing our findings with you in the months to come!
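As a purely illustrative sketch of the randomization step mentioned earlier, in which participants are assigned to data governance models as they join, the following Python snippet uses permuted-block randomization to keep the arms balanced. The scheme, the function name `block_randomize`, and the placeholder model names are assumptions for illustration; the post does not say which models will be tested or how assignment will be implemented.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def block_randomize(
    participant_ids: Sequence[str],
    models: Sequence[str],
    seed: int,
) -> Dict[str, str]:
    """Assign participants to governance models in shuffled blocks.

    Each block contains every model exactly once, so arm sizes never
    differ by more than one participant at any point during enrollment.
    """
    if not models:
        raise ValueError("at least one governance model is required")
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    assignment: Dict[str, str] = {}
    block: List[str] = []
    for participant_id in participant_ids:
        if not block:  # start a new shuffled block when the last one is used up
            block = list(models)
            rng.shuffle(block)
        assignment[participant_id] = block.pop()
    return assignment


# Placeholder arm names; the real governance models are not specified here.
models = ["model_A", "model_B", "model_C"]
participants = [f"participant_{i}" for i in range(9)]

assignment = block_randomize(participants, models, seed=2020)
arm_sizes = Counter(assignment.values())
print(arm_sizes)
```

With nine participants and three arms, each arm receives exactly three participants regardless of the seed, which is the balancing property that motivates using blocks rather than independent coin flips.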

The Global Mental Health Databank – Practicing Better Science Together

At Sage, we build responsible practices for data sharing in health research. By combining policy and technology, we work to ensure data can be safely used across institutes. We do this because broad data resources and interdisciplinary teams are necessary to understand the complexities of human health, and we need to use this kind of information to improve people’s experiences in the health system.

We have been successful in stimulating collaborative science across teams of researchers. But we recognize that our approaches to data sharing haven’t included everyone who should be part of the conversation. Patient advocates and community-based organizations have asked us for many years why we don’t prioritize participant involvement. Indeed, there is a wide range of research opportunities that are only possible with direct involvement of the individuals contributing the data. We haven’t had a good answer.

Researchers simply can’t keep excluding those who provide the data. While we at Sage are partial to the ethical arguments for this, there are plainly scientific arguments as well. As data collection creeps out of the lab and into our homes, researchers will not recruit and engage diverse cohorts if the people in the study are treated as subjects, not partners.

In our own work we focus on “real-world evidence” – how to collect, govern, and analyze a wide range of extremely personal data about our everyday lives, including medical care, daily habits, self-management practices and lived experience. What is the value proposition that would convince anyone to contribute these kinds of data to research? How broadly are people willing to distribute data collected about their daily lives? How do these tradeoffs look to people who aren’t traditionally asked their opinion about tradeoffs?

The challenge then is to create a data governance system that empowers people to be active partners in managing the way that their data is collected, shared, or used in research.

This is why I am so excited about the new partnership we’ve just formed with Miranda Wolpert and the Mental Health Priority Area at Wellcome Trust. We’re going to build – and test – that data governance system.

The Global Mental Health Databank project seeks to research strategies – active ingredients – that youth around the world can use to self-manage anxiety and depression, to develop a system that guides youth to those strategies that they are most likely to find useful. To be clear, our project is to design the governance for this kind of databank. We’ll be testing out the ways that participants want to govern such a databank – we are not conducting the research on mental health!

Doing this will require active partnership with both researchers and youth to help collect the data needed to answer the question of “what works for whom and why” for mental health management. A data sharing system designed for these purposes must meet the needs both of youth with lived experience in mental health and of mental health researchers. Participants and researchers are often interested in different questions – both of which have value. Just as researcher-led programs have often overlooked participant needs and interests, participant-led programs may miss subtleties of statistics and biology that are important to researchers.

Over the next two years, we will work together with youth and researchers to evaluate the feasibility of successfully implementing a participant-led databank for global mental health research that enables the collection, sharing and use of data from youth across three countries – South Africa, India, and the United Kingdom. We will run this study as an experiment in participant-led data governance designed to address several key questions: What value do youth find in participating in this kind of databank and how does that vary across individuals? Is youth involvement in research impacted by their control over how their data is collected, shared, and used? What levels of oversight do they wish to have? Do these considerations have an impact on what types of data they are willing to contribute? What do they wish to do with these data and what support do they need to achieve their goals? How do the preferences of youth and the preferences of researchers intersect?

I am thankful to be collaborating with a team of amazing researchers with deep expertise in youth mental health: Drs. Zuki Zingela and Melvyn Freeman in South Africa, Dr. Soumitra Pathare in India, and Dr. Tamsin Ford and Dr. Mina Fazel in the UK. These individuals work directly with school-based youth on the management of mental health. Together, we will evaluate the interests of youth in engaging with a databank program of this sort. We are also joined by two research teams from the University of Washington, Drs. Pamela Collins and Pat Areán, who have expertise in global mental health collaboratives and in digital mental health assessment, respectively.

Our team is committed to embarking on this journey in partnership with youth. To build a system that meets the needs of youth and researchers, we need both perspectives to be involved right from the beginning. To this end, our first action has been to hire young adults onto the team to co-develop this work with these researchers. They will be supported by a series of panels of youth with lived experience of mental health in each country who can provide a diverse set of perspectives to inform the project.

Stay tuned!

Bringing Structure and Design to Data Governance

Before COVID-19 took over the world, the Governance team at Sage Bionetworks had started working on an analysis of data governance structures and systems to be published as a “green paper” in late 2020. Today we’re happy to publicly release that paper, Mechanisms to Govern Responsible Sharing of Open Data: A Progress Report.

In the paper, we provide a landscape analysis of models of governance for open data sharing based on our observations in the biomedical sciences. We offer an overview of those observations and show areas where we think this work can expand to supply further support for open data sharing outside the sciences.

The central argument of this paper is that the “right” system of governance is determined by first understanding the nature of the collaborative activities intended. These activities map to types of governance structures, which in turn can be built out of standardized parts — what we call governance design patterns. In this way, governance for data science can be easy to build, follow key laws and ethics regimes, and enable innovative models of collaboration. We provide an initial survey of structures and design patterns, as well as examples of how we leverage this approach to rapidly build out ethics-centered governance in biomedical research.

While there is no one-size-fits-all solution, we argue for learning from ongoing data science collaborations and building on existing standards and tools. And in so doing, we argue for data governance as a discipline worthy of expertise, attention, standards, and innovation.

We chose to call this report a “green paper” in recognition of its maturity and coverage: it’s a snapshot of our data governance ecosystem in biomedical research, not the world of all data governance, and the entire field of data governance is in its infancy. We have licensed the paper under CC-BY 4.0 and published it on GitHub via Manubot in hopes that the broader data governance community might fill in holes we left, correct mistakes we made, add references and toolkits and reference implementations, and generally treat this as a framework for talking about how we share data.

Sage Perspective: Retention in Remote Digital Health Studies

Editor’s note: This is a Twitter thread from John Wilbanks, Sage’s chief commons officer.


New from Abhishek Pratap and a few more of us – Indicators of retention in remote digital health studies: a cross-study evaluation of 100,000 participants

A few thoughts on the paper:

  1. Hurrah for data that’s open enough to cross-compare.
  2. When someone shows you overall enrollment in a digital health study, ask about engagement % on day 2. It’s a way better metric.
  3. Over-recruit the under-represented with intent from the start or your sample won’t be anywhere close to diverse enough.
  4. Design your studies for broad, shallow engagement – your protocol and analytics will be better matched.
  5. Pay for participation and clinician involvement make a huge difference. Follow @hollylynchez who writes very clearly on the payment topic.
  6. Clinician engagement is going to need some COI norms because whew it’s easy to see where that can go sideways.
  7. When your study is flattened down to an app on a screen, the competition is savage for attention and you’ll get deleted really quickly if there isn’t some sense of value emerging from the study.
  8. Meta-conclusion: perhaps start with the question: how does this give value to the participant when the app is in airplane mode?
  9. On “pay to participate” – the first time I ever talked to @FearLoathingBTX, he immediately foresaw studies providing a “free” phone for participation, but cutting service off for low engagement. That is, sadly, definitely on track absent some intervention.

Related content and resources:


Democratizing data access

Open Data Sharing in the 21st Century: Sage Bionetworks’ Qualified Research Program and Its Application in mHealth Data Release

As a leading advocate for open science practices, informed consent, and data privacy in biomedical research, Sage actively pilots and tests innovative tools and resources to maximize the scientific value derived from datasets while still ensuring basic contractual protections for research participants. This paper details the rationale, features, and application of Sage’s novel framework for qualifying a diverse pool of solvers from around the world for accessing health and biomedical data within Synapse. The three conceptual mechanisms guiding the development of the framework (transactional cost, exposure, and openness) are identified and illustrated via the case example of mPower, the first study to pilot the qualified researcher framework in 2015. This paper concludes with a cross-sectional snapshot of the current pool of qualified researchers and reveals key challenges and future directions for optimizing the qualified researcher framework in the years to come.

Read white paper…