A Response to the Request for Information on the American Research Environment Issued by the Office of Science and Technology Policy
Submitted by Lara Mangravite and John Wilbanks on Behalf of Sage Bionetworks
Innovations in digital technology expand the means by which researchers can collect data, create algorithms, and make scientific inferences. The potential benefit is enormous: we can develop scientific knowledge more quickly and precisely. But there are risks. These new capabilities do not, by themselves, create reliable scientific insights. Researchers can easily run afoul of data rights, misuse data and algorithms, and get lost in a sea of potential resources, and the larger scientific community can barricade itself into silos of particular interests.
Improving discovery and innovation through the sharing of data and code requires new forms of practice, refined through real-world experience. Science is a practice of collective sense-making, and updates to our tools demand updates to our sense-making practices. At Sage Bionetworks, we believe that these practices are a part of the American Research Environment. As such, our response to this Request for Information (RFI) focuses on implementing scientific practices that promote responsible resource sharing and the objective, independent evaluation of research claims.
We begin with two vignettes that illustrate the power of open science practices to address Alzheimer’s disease and colorectal cancer. Next, we assess the American Research Environment in light of our aims: more (and more responsible) sharing, data and algorithms that can be trusted, and evidence collection that is practical and ethical.
Finally, we offer recommendations under the Research Rigor and Integrity section of the RFI. Our conclusion is that to improve digital practices across the scientific community, we must explicitly support transdisciplinary practices as important efforts in their own right, while integrating them into domain-specific scientific projects.
Summary of recommendations
- Develop, and fund over time, platforms for storing, organizing, and making discoverable a wide variety of types of data to a wide variety of stakeholders.
- Change the institutional incentives toward using cloud platforms over local high-performance computing (HPC).
- Develop clear community standards for implementing, evaluating, and articulating algorithm benchmarks.
- Create or acquire training, workshops, or other forms of education on the ethics of computational science.
- Develop systemic practices for identifying risks, potential harms, benefits, and other key elements of conducting studies with data from mobile devices.
- Require federal researchers to preregister their research prior to conducting work (e.g., via clinicaltrials.gov) to ensure their results are published, even if their hypotheses are not validated.
Research, including its reproduction, can be a complex system-of-systems phenomenon. Incentives, impediments, and opportunities exist at multiple interacting layers, so it helps to examine them in context. The following two examples show how technology-centered collaborative practices can yield stronger scientific claims, which in turn increase returns on investment in science.
Accelerating Medicines Partnership for Alzheimer’s Disease
Alzheimer’s disease (AD) and dementia are a public health crisis. The financial toll of dementia is already staggering. In the U.S. alone, the costs of caring for people over age 70 with dementia were estimated to be as high as $215 billion in 2010. Drugs for dementia are so hard to find that even the cost of developing an ineffective medicine for AD sits at $5.7 billion.
The question is: how can we make this easier? One way is to change our scientific practice, the way we discover drugs and their targets. The Accelerating Medicines Partnership for Alzheimer’s Disease is a test of this idea. Twelve companies joined the National Institutes of Health (NIH) in this pre-competitive collaboration, forming a community of scientists who use the cloud to work together, share early and often, and improve both public and private returns on investments in AD drug discovery.
Within AMP-AD, Sage Bionetworks coordinates the Target Discovery and Preclinical Validation project. The project’s goal is to shorten the time between the discovery of potential drug targets and the development of new drugs for Alzheimer’s treatment and prevention. It brings together six multi-institution academic teams, four industry partners, and four non-profit organizations. The project tests the use of artificial intelligence/machine learning (AI/ML) analysis on high-dimensional human brain data to identify AD drug targets. Because these methods were untested, the AMP-AD groups work together to identify effective research methods as well as outcomes. In this way, expert groups take independent approaches to solving the problem and then collectively identify repeatable observations. This requires early sharing of data, methods, and results. All the scientists operate inside Synapse, a Sage-built cloud platform with services that document the data science process. Using Synapse makes data and code widely reusable, and data are released to the public quarterly. Another Sage portal, Agora, allows any researcher to explore curated genomic analyses and target nominations from AMP-AD and associated consortia.
AMP-AD has already paid off. Over five years, AMP-AD identified more than 500 new drug targets for Alzheimer’s disease for under $100 million. The next phase is already underway, with Alzheimer Centers for the Discovery of New Medicines set to diversify and reinvigorate the Alzheimer’s disease drug development pipeline at a cost of just $73 million.
Colorectal Subtyping Consortium
Colorectal cancer (CRC) is a frequently lethal disease with complex, mixed outcomes and drug responses. In the early 2010s, a number of independent groups reported different genetic “subtypes” of CRC, designed to help doctors understand how different kinds of colorectal cancer will respond to different drugs.
Subtyping is harder than it needs to be because different researchers and labs process data differently, build their algorithms on different data, and so on. Even the way researchers convert tumor samples into digital data affects the results. So, to actually benefit patients, the colorectal cancer research community needed to bring these efforts together and compare notes.
The Colorectal Cancer Subtyping Consortium (CRCSC) was formed to identify a consensus among the divergent scientific results through large-scale data sharing and meta-analysis. The CRCSC began with 6 academic groups from 15+ institutions. It collected and analyzed gene expression data from more than 30 patient cohorts, spanning multiple platforms and sample preparation methods. Each of the 6 AI/ML models was applied to the collection of public and proprietary datasets, encompassing over 4,000 samples, mostly from stage II-III cancer. An independent team centrally assessed the accuracy of subtype calls and their associations with clinical, molecular, and pathway features. This process produced results far faster than the usual cycle in which each research team publishes a peer-reviewed paper, reads the other teams’ papers, and conducts additional research.
Despite significant diversity in the patients studied and the AI/ML methods used, the consortium reached a clear consensus on 4 CRC molecular subtypes (CMS1-4), with significant interconnectivity among the work of the participating groups. This was the first example of a large-scale, community-based comparison of cancer subtypes, and we consider the outcome the most robust way to classify colorectal cancer for targeted drugs based on genetics. Reaching this kind of consensus in a field through publications and conferences typically takes a decade or more; our approach led to publication of the consensus model within three years of the first of the divergent papers being published. More broadly, our aim was to establish an important scientific practice for collaborative, community-based cancer subtyping that will facilitate the translation of molecular subtypes into the clinic.
Assessment of the American Research Environment
Medical progress is hindered by many challenges: health conditions are often defined imprecisely, by symptoms rather than by biology; disease onset and treatment responses vary across populations; and we cannot yet effectively tailor care to the needs of individuals. Advances in information technology have given us an opportunity to address limitations such as these. Over the past two decades, new tools have emerged to collect, share, and combine data of many different types and scales, as have algorithms that process these data to uncover new knowledge in a wide variety of domains. The growing power, affordability, and ubiquity of computational tools in biomedical science have made them an indispensable component of the research environment.
Yet computational discovery has suffered from the same failures of translation and reproducibility that have plagued traditional approaches to discovery. We have new tools to generate and process vast quantities of information, but we often lack validated practices to turn that information into reliable insights. We need methodologies, processes, and baseline data that reliably and reproducibly generate trustworthy knowledge out of large-scale data. The AMP-AD and CRC vignettes above demonstrate how such practices can reduce the cost and the time of creating the reliable scientific insights on which treatments are based.
Unfortunately, there are market failures and public value failures around new scientific practices. Most incentives push instead toward withholding data and building black-box algorithms, which forces reliable knowledge to emerge over artificially long time periods. Businesses fund research that results in private, appropriable intellectual property; they tend not to fund work whose results anyone can use, including meta-research on how data science affects research reliability. Even when research is publicly funded, the individuals and institutions conducting it have an incentive to bolster their reputations by keeping data and code to themselves. The scientific funding, publishing, and promotion systems favor papers claiming insights over methods development, and original research over replication. These perverse incentives prevent the scientific community from sharing effectively across organizations to perform effective computational research. They make it more likely that innovation will create value for research producers than for patients.
Open science practices can address these market failures and public value failures. As we saw in the AMP-AD example, the secret to the lower cost and higher throughput is the implementation of collaborative practices, mediated through a scientific research software platform. The transparency, replication, and reuse of data and code can be increased by an evolving set of rules and cultural norms to promote appropriate interpretation of data and code and to speed information flow. These practices are essential for rigorous science, given the tools we have at our disposal and the unique complexities that have been introduced by computational research.
Sharing Research Data
Over the past 10 years, the scale and scope of data used for biomedical research have expanded. We have observed an explosion in community-based data sharing practices that allow independent teams across institutions to access each other’s data, generate federated data resources that combine databases, and integrate large-scale, multimodal data, including data from non-traditional sources such as electronic health records and real-world data streams from increasingly pervasive smart devices. There is a great opportunity to improve the quality, reproducibility, and replicability of research by making these practices widely known and these data resources interoperable. As the CRC vignette above showed, large-scale data sharing and meta-analysis across more than 15 institutions yielded extraordinary results in a fraction of the time of a publication-mediated process. Science progresses faster, farther, and more surely through the wisdom of crowds, including crowds of researchers connected by technology.
However, there are three impediments to realizing these benefits: data scale, data rights, and data omission. These impediments are magnified when science occurs across organizational boundaries, i.e., between federal agencies, universities, and private firms. The sheer size and diversity of data sets can limit their effective use. Data protection adds further complexity: proprietary and/or sensitive data (e.g., patient clinical records) are allowed to exist only on certain networks. For good reasons, such as protecting privacy or preventing harm, these data are out of reach for those on other networks.
Finally, data that are not codified in any system in the first place cannot be shared; those who collect data do not always publish all of the data they collect, which can distort scientific claims through a perceived absence of evidence.
To overcome these limitations, and mitigate the costs of overcoming them, two approaches have emerged. In the sandbox approach, data are secured in a private environment to which only qualified researchers gain access. Analysis happens inside the sandbox, so that data cannot be misused externally. In the model-to-data approach, qualified researchers may send algorithms to be run in protected environments with data that they cannot access, which can allow for crowd-based access to data that is itself never exposed. Increasingly, the field is also considering federated extensions to these sharing models for situations where data must remain under the external control of data contributors. These types of solutions balance collaboration with the needs of various parties to control resources.
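To make the model-to-data idea concrete, the Python sketch below shows a toy enclave that holds private records, runs a submitted model against them, and releases only an aggregate accuracy. The class name ProtectedEnclave, the record format, and the choice of accuracy as the released statistic are illustrative assumptions, not the interface of Synapse or any other real platform. The leading underscore is only a Python naming convention; real isolation comes from running submitted code in a separate, controlled environment, and a production system would also need authentication, auditing, and protection against disclosure through repeated queries.

```python
from typing import Callable, List, Sequence, Tuple

# Illustrative record format (an assumption): (feature value, outcome label).
Record = Tuple[float, int]


class ProtectedEnclave:
    """Toy model-to-data enclave (hypothetical): the submitter never sees
    the records, only a summary score computed inside the enclave."""

    def __init__(self, records: Sequence[Record]) -> None:
        # Private copy of the data; no method returns it to callers.
        self._records: List[Record] = list(records)

    def evaluate(self, model: Callable[[float], int]) -> float:
        """Run a submitted model on every private record and release only
        its accuracy, never the individual predictions or the data."""
        if not self._records:
            raise ValueError("enclave holds no records")
        correct = sum(1 for x, y in self._records if model(x) == y)
        return correct / len(self._records)


def threshold_model(x: float) -> int:
    """Hypothetical submitted algorithm: predict 1 above a fixed cutoff."""
    return int(x > 0.5)


# Usage: a data contributor loads records; a researcher submits a model.
enclave = ProtectedEnclave([(0.2, 0), (0.9, 1), (0.7, 1), (0.1, 0)])
print(enclave.evaluate(threshold_model))
```

The researcher learns how well the model performs (here, it classifies all four hidden records correctly) without ever receiving the records themselves.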
Just as there are potential pitfalls of sharing data, so too are there potential pitfalls for sharing the code used to build quantitative models. In typical practice, algorithm evaluations are conducted by those who developed them. Thus, most developers fall into the self-assessment trap, such that their algorithms outperform others at a rate that suggests that all methods are better than average. This can be inadvertent — a result of information leaks from, or over-fitting to, the data at hand — or it can be intentional — a result of selective reporting, where authors choose the metric or the data in which their algorithm shines, but hide those metrics and data that show sub-par performance.
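The intentional form of the trap, selective reporting, can be illustrated with a small simulation. In the Python sketch below, a new method is assumed to be exactly as good as a baseline, so each metric’s observed difference is pure noise (drawn here from a standard normal distribution, an illustrative assumption). If an author evaluates several metrics but reports only the most favorable one, the reported advantage is systematically positive, and it grows with the number of metrics tried.

```python
import random
from statistics import mean


def reported_advantage(n_metrics: int, n_trials: int, seed: int = 0) -> float:
    """Average advantage over a baseline that an author reports when the
    true advantage is zero but only the best-looking metric is published."""
    rng = random.Random(seed)
    reported = []
    for _ in range(n_trials):
        # True difference is 0; each metric's observed difference is noise.
        observed = [rng.gauss(0.0, 1.0) for _ in range(n_metrics)]
        # Selective reporting: keep only the metric where the method "shines".
        reported.append(max(observed))
    return mean(reported)


for k in (1, 3, 10):
    print(k, round(reported_advantage(k, n_trials=5000), 2))
```

With a single pre-declared metric the average reported advantage should be close to zero; with ten metrics to choose from it should exceed one standard deviation, even though the method is no better than the baseline.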
The risks from using the wrong algorithm at the wrong time can be more arcane to the casual observer than the risks of bad data, but they are every bit as significant.
Algorithms make predictions, and the self-assessment trap means many of those predictions will be wrong. Wrong predictions can cost scientists, and the taxpayers who fund them, years of misdirected research. If we don’t have a standard way to decide whether an algorithm is above, at, or below average, we won’t even know where to start when faced with a new one. We believe that the self-assessment trap is a major barrier to translating algorithms into tools that actually help patients. We therefore need frameworks inside the research environment that separate an algorithm’s developer from its evaluator, to test whether it works the way it is supposed to work.
Using Real World Evidence
Digital devices make it very easy to collect data from a vastly larger group of people than was possible before. This can blur the line between traditional research studies and consumer data collection. Real-world evidence (RWE) consists of data collected out in the wild, and its collection will increasingly be driven by mobile devices and sensors. Much RWE will come from devices that people own, bought in the context of consumer technology rather than regulated research.
But consumer tools prioritize adoption. They use one-click buttons to obtain consent and don’t attend to bioethics concepts like autonomy, respect for persons, and beneficence. Compared with consumer devices and apps, ethical collection of RWE will require slowness and attention from both researchers and potential participants. This may depress raw enrollment numbers relative to consumer technology, which creates a temptation to abandon bioethics in favor of consumer surveillance approaches.
Our research environment needs to acknowledge this reality: we need consumer technology to collect RWE, but the legal and ethical terms of consumer technology are often at odds with research protections. As a result, few stakeholders in the space build ethical, practical governance for RWE. The increasing availability of RWE thus creates the need for new research ethics protections for the era of digital surveillance.
Different organizations across different sectors have different strengths, and open science practices should help them make the most of those strengths, both individually and collectively. Some organizations have resources that others do not. Some have a comparative advantage in producing quality data and code, while others have an advantage in access to facilities and equipment. Some organizations have fast networks with ample storage, while others have to budget their computing resources more strictly. Some organizations are moving toward an open approach from closed approaches, while others are moving there from very (possibly irresponsibly) open ones.
Given the complexity of data sharing across the biomedical field, and the different starting points of different organizations, we require a flexible spectrum of open science approaches.
As such, there are no one-size-fits-all recommendations. Each organization and research domain must be addressed as a unique case. However, given the incentives, impediments, and opportunities described above, we offer the following general recommendations in response to questions 1, 2, 3, and 4 in the “Research Rigor and Integrity” section of the Request for Information.
Q1. What actions can Federal agencies take to facilitate the reproducibility, replicability, and quality of research? What incentives currently exist to (1) conduct and report research so that it can be reproduced, replicated, or generalized more readily, and (2) reproduce and replicate or otherwise confirm or generalize publicly reported research findings?
Develop, and fund over time, platforms for storing, organizing, and making discoverable a wide variety of types of data to a wide variety of stakeholders. For example, Synapse and Agora (highlighted in the AMP-AD vignette above) allow researchers to share data, evaluate hypotheses, and make collective decisions about research directions. These sharing platforms should support efficient and responsible data sharing through integrated approaches for data governance, data management, and data access. They should be able to accommodate large numbers of users, adapt to heterogeneous and complex data types and compute environments, and incentivize wider participation in a data and benchmarking ecosystem. Finally, they should be designed to capitalize on the power of cognitive diversity resident in the American research environment by drawing upon the perspectives and experiences of the researchers who will use them, and upon the lessons of the emerging science of team science.
Change the institutional incentives toward using cloud platforms over local high-performance computing (HPC). Many institutions have built local HPC resources over time. These resources support scientists locally but can discourage researchers from moving to cloud platforms that facilitate collaboration and data reuse. Funding should shift from supporting local HPC to supporting standard cloud platforms, and specific funds, separate from research grants, should be dedicated to supporting public clouds run as utilities, in addition to supporting research computing on corporate clouds such as those run by Amazon and Google. Public cloud utilities would act as a nimble form of market regulator, keeping prices low and creating user-friendly features that might not align with corporate revenue maximization.
Q2. How can Federal agencies best work with the academic community, professional societies, and the private sector to enhance research quality, reproducibility, and replicability? What are current impediments and how can institutions, other stakeholders, and Federal agencies collaboratively address them?
Develop clear community standards for implementing, evaluating, and articulating algorithm benchmarks. An emerging paradigm for the development and unbiased assessment of tools and algorithms is crowd-sourced, challenge-based benchmarking. By distributing problems to large communities of expert volunteers, challenges can address complex questions efficiently and quickly while incentivizing adoption of new standards. Challenges provide a successful resolution to the “self-assessment trap” through robust and objective benchmarks. Moreover, a successful challenge model can be an effective way to motivate research teams to solve complex problems.
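The core of challenge-based benchmarking is that organizers, not algorithm developers, hold the ground-truth labels and apply a scoring rule fixed in advance. The Python sketch below scores hypothetical team submissions against held-out labels and ranks them. The team names, sample IDs, and the use of plain accuracy as the metric are assumptions for illustration; real challenges typically use task-specific metrics and limit the number of submissions per team.

```python
from typing import Dict, List, Mapping, Tuple


def score_submission(predictions: Mapping[str, int], truth: Mapping[str, int]) -> float:
    """Accuracy against held-out labels. A missing prediction counts as
    wrong, so teams cannot improve their score by skipping hard cases."""
    hits = sum(1 for sample_id, label in truth.items()
               if predictions.get(sample_id) == label)
    return hits / len(truth)


def leaderboard(submissions: Mapping[str, Mapping[str, int]],
                truth: Mapping[str, int]) -> List[Tuple[str, float]]:
    """Score every team with the same pre-declared metric, best first."""
    scores: Dict[str, float] = {
        team: score_submission(preds, truth) for team, preds in submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Held-out labels stay with the organizers; teams see only their scores.
held_out = {"s1": 1, "s2": 0, "s3": 1, "s4": 0}
submissions = {
    "team_a": {"s1": 1, "s2": 0, "s3": 1, "s4": 1},
    "team_b": {"s1": 1, "s2": 0, "s3": 1, "s4": 0},
    "team_c": {"s1": 1, "s2": 0},  # skipped two samples
}
for team, score in leaderboard(submissions, held_out):
    print(f"{team}: {score:.2f}")
```

Because every team is scored by the same independent party with the same metric, no developer can choose the data or the metric on which their own method looks best.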
Q3. How do we ensure that researchers, including students, are aware of the ethical principles of integrity that are fundamental to research?
Create or acquire training, workshops, or other forms of education on the ethics of computational science. Computational biomedicine will improve human health only when conducted in a reliable and responsible manner. It is therefore critical to establish and implement community norms for responsible data sharing and reliable computational data analysis. Training and workshops can help instill in researchers the knowledge, and the conscience, needed to navigate the evolving landscape of computational science effectively and ethically. Educational modules should cover topics including: 1) efficient and responsible methods for sharing biomedical research data; 2) industry standards for objective benchmarking of algorithms used to derive insight from that data; and 3) the reliable and responsible integration of real-world evidence (RWE), from electronic health records and smart devices, into research programs.
Develop systemic practices for identifying risks, harms, benefits, and other key elements of conducting studies with data from mobile devices. This necessarily involves understanding how to design clinical protocols, informed consent, and data sharing processes for everything from low-risk surveys up to full genomes and biospecimens. It could also involve developing a methodology that borrows from software development, including version control, analytic dashboards, user experience design, and more, to make protocol management more efficient.
Q4. What incentives can Federal agencies provide to encourage reporting of null or negative research findings? How can agencies best work with publishers to facilitate reporting of null or negative results and refutations, constraints on reporting experimental methods, failure to fully report caveats and limitations of published research, and other issues that compromise reproducibility and replicability?
Require federal researchers to preregister their research prior to conducting work (e.g., via clinicaltrials.gov) to ensure their results are published, even if their hypotheses are not validated. If 9 out of 10 studies do not validate a hypothesis, but the only one that does gets published, then the scientific community will have an inaccurate record of evidence to substantiate a claim. Moreover, what are negative results for the hypotheses of the researcher initiating the study may be positive results for the hypotheses of other researchers in the community.
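The distortion described above, often called the file-drawer effect, can be reproduced with a short simulation. In the Python sketch below, every study estimates an effect whose true value is zero; each estimate is drawn as a z-score from a standard normal distribution, and only studies with z > 1.96 (a conventional significance cutoff, used here as an illustrative assumption for “validating” the hypothesis) are published. The full record should average near zero, while the published record shows a consistent but spurious effect. Preregistration keeps the unpublished studies visible.

```python
import random
from statistics import mean
from typing import Tuple


def file_drawer(n_studies: int, cutoff: float = 1.96, seed: int = 1) -> Tuple[float, float, int]:
    """Return (mean of all estimates, mean of published estimates, number
    published) when the true effect is zero and only results above
    `cutoff` are published."""
    rng = random.Random(seed)
    # Each study's z-score estimate of an effect whose true value is 0.
    estimates = [rng.gauss(0.0, 1.0) for _ in range(n_studies)]
    # Only studies that "validate" the hypothesis make it into print.
    published = [z for z in estimates if z > cutoff]
    published_mean = mean(published) if published else float("nan")
    return mean(estimates), published_mean, len(published)


all_mean, pub_mean, n_pub = file_drawer(10_000)
print(f"all studies: mean z = {all_mean:.2f}")
print(f"published ({n_pub} studies): mean z = {pub_mean:.2f}")
```

Under these assumptions roughly 2.5 percent of studies clear the cutoff, and every one of them appears to support the hypothesis; only the complete, preregistered record reveals that the true effect is zero.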
Thank you for the opportunity to provide our perspective on how to improve the American research environment. We believe that open computational science practices can vastly improve the speed and efficacy of the research enterprise and must be applied responsibly. Furthermore, to improve digital practices across the scientific community, we must explicitly support these transdisciplinary practices as important efforts in their own right, while integrating them into domain-specific scientific projects. They should not be ancillary efforts, tagged onto research primarily aimed at particular discoveries.
In this response, we focused on the present state of the enterprise, but it is also helpful to consider the future. The growth and trajectory of AI and machine learning guarantee that new challenges and possibilities for sharing data and code will emerge over time. The assessment and recommendations offered here address the impediments and opportunities we currently face, but they also position us to avoid the worst consequences of increasingly powerful information and knowledge technologies, and to seize the opportunities they provide more effectively.