Pacific Symposium on Biocomputing Workshop
Establishing the Reliability of Algorithms in Biomedical Research
As rich data streams accumulate across people and time, they provide a powerful opportunity to address limitations in our existing scientific knowledge and to overcome operational challenges in healthcare and the life sciences. Yet the relative weighting of insights versus methodologies in our current research ecosystem tends to steer the computational community away from algorithm evaluation and operationalization, resulting in a well-documented proliferation of scientific findings of unknown reliability. Algorithm selection and use are hindered by several problems that persist across our field. One is self-assessment bias, which can lead to misrepresentation of the accuracy of research results. A second is the impact of data context on algorithm performance. Biology and medicine are dynamic and heterogeneous, and data are collected under varying conditions. For algorithms, this means that performance is not universal and should be evaluated across a range of conditions. These issues become even more difficult as algorithms are trained and used on data collected outside the research setting, where data collection is not well controlled and data access may be limited for privacy or proprietary reasons. This workshop will focus on approaches emerging across the research community to quantify the accuracy of algorithms and the reliability of their outputs.
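The point that performance is not universal can be made concrete with a stratified evaluation: rather than reporting a single pooled score, the same model is scored separately within each data-collection context. A minimal sketch in Python (the labels, predictions, and site names below are hypothetical, not from any real study):

```python
from collections import defaultdict

def stratified_accuracy(y_true, y_pred, contexts):
    """Accuracy computed separately for each data-collection context
    (e.g., clinical site, assay batch, or time period)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, ctx in zip(y_true, y_pred, contexts):
        totals[ctx] += 1
        hits[ctx] += int(truth == pred)
    return {ctx: hits[ctx] / totals[ctx] for ctx in totals}

# Hypothetical binary predictions collected at two sites: the pooled
# number hides the fact that performance differs sharply by site.
y_true   = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred   = [1, 0, 1, 0, 1, 0, 1, 1]
contexts = ["site_A", "site_A", "site_A", "site_A",
            "site_B", "site_B", "site_B", "site_B"]

per_site = stratified_accuracy(y_true, y_pred, contexts)
# per_site reports accuracy for site_A and site_B separately,
# exposing a context-dependent failure a pooled score would mask.
```

In this toy example the model is accurate on three of four samples from one site and none from the other, which is exactly the kind of context effect a single aggregate metric conceals.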
Despite intensive efforts to use these data to optimize healthcare, relatively few methods have been validated and clinically deployed. The reasons are technical, scientific, social, and business-related. On the technical side, they include the inaccessibility of gold-standard datasets for robust validation, heterogeneity in data collected from distributed sources, the contextual relevance of biological observations across samples, poor algorithmic reproducibility, and community acceptance of biased approaches for assessing methods. Reproducibility and transparency are two practices that support the development of reliable biomedical claims that can both generate new knowledge and apply it to advance healthcare. Although these practices have become firmly established and increasingly common over the past decade, they do not fully address the question of reliability in biomedical research findings. This session will discuss open, community-based methods for benchmarking algorithms, including the use of crowd-sourced challenges as a tool for the unbiased assessment of tools and algorithms.
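The core mechanism by which crowd-sourced challenges mitigate self-assessment bias is a separation of roles: organizers hold the gold-standard labels privately and score submissions themselves, so participants cannot tune their methods to the test set. A minimal sketch of that organizer-side scoring step (all sample IDs and labels here are illustrative):

```python
def score_submission(submission, gold_standard):
    """Organizer-side scoring against a privately held gold standard.

    Participants submit predictions keyed by sample ID but never see
    gold_standard, which is what keeps the assessment unbiased.
    """
    if set(submission) != set(gold_standard):
        raise ValueError("submission must cover exactly the held-out samples")
    correct = sum(submission[s] == gold_standard[s] for s in gold_standard)
    return correct / len(gold_standard)

# Hypothetical held-out labels, kept private by the challenge organizers.
gold = {"sample_1": 1, "sample_2": 0, "sample_3": 1}

# A participant's predictions for the same sample IDs.
submission = {"sample_1": 1, "sample_2": 1, "sample_3": 1}

accuracy = score_submission(submission, gold)  # 2 of 3 correct
```

Real challenge platforms add layers on top of this (submission quotas, multiple metrics, leaderboard rounds), but the blind-holdout scoring shown here is the piece that removes the method developer from the role of evaluator.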
Organizers and Keynote Speakers
Lara Mangravite, Sage Bionetworks
Lara Mangravite is the President of Sage Bionetworks. Her work focuses on the development, evaluation, and dissemination of methods to support large-scale collaborative biomedical research, centered on new approaches to scientific process that use open systems to enable community-based research on complex biomedical problems. Dr. Mangravite also serves as a director of the DREAM Challenges organization. Her research group at Sage Bionetworks focuses on responsible data sharing and community-based analytical programs to advance drug discovery and biomarker development in neurological research. She obtained a BS in Physics from Pennsylvania State University and a PhD in Pharmaceutical Chemistry from the University of California, San Francisco, and completed a postdoctoral fellowship in cardiovascular pharmacogenomics at the Children’s Hospital Oakland Research Institute.
Sean Mooney, University of Washington – KEYNOTE SPEAKER
Sean Mooney is the Chief Research Information Officer (CRIO) of UW Medicine and a Professor in the Department of Biomedical Informatics and Medical Education at the University of Washington. As CRIO, he leads the growing Research Information Technology team and provides strategic vision for the development of new platforms that leverage large clinical datasets. Prior to his CRIO role, he was an Associate Professor and Director of Bioinformatics at the Buck Institute for Research on Aging in Northern California. His group is known for developing informatics tools that support biomedical research. His research interests focus on data science applications in biomedicine, particularly on understanding the underlying molecular causes of inherited genetic diseases and cancer. Earlier, he was an Assistant Professor of Medical and Molecular Genetics at the Indiana University School of Medicine, where he founded and directed the school’s Bioinformatics Core. In 1997, he received his BS with Distinction in Biochemistry and Molecular Biology from the University of Wisconsin–Madison. He received his PhD in 2001 from the University of California, San Francisco, and was later an American Cancer Society John Peter Hoffman Fellow at Stanford University.
Iddo Friedberg, Iowa State University – KEYNOTE SPEAKER
Iddo Friedberg is an Associate Professor at Iowa State University, and the Chair of the PhD program in Bioinformatics and Computational Biology. Since 2010 he has been co-organizing the Critical Assessment of Function Annotation computational challenge. He has also published studies on biases in functional annotation, and on crowdsourcing for phenomics. His interests lie in protein function prediction, assessment of prediction performance, antimicrobial resistance surveillance, and the study of biases in algorithms and biomedical datasets.
Justin Guinney, Sage Bionetworks
Justin Guinney is the Vice President of the Computational Oncology group at Sage Bionetworks. He also serves as the executive director of the DREAM Challenges organization. His group brings together specialists from multiple domains, including molecular biology, computer science, and oncology, and focuses on the development of computational models for optimizing patient diagnosis, prognosis, and treatment in cancer. Dr. Guinney is an expert in the large-scale analysis of genomic data and works regularly with clinicians to link these models to complex cancer phenotypes. Prior to joining Sage Bionetworks, he co-founded and managed a software company called FiveSight Technologies, now part of Intalio Corp. Dr. Guinney received a BA in History from the University of Pennsylvania, a BS in Electrical Engineering from the University of Illinois, Urbana-Champaign, and a PhD in Computational Biology and Bioinformatics from Duke University.
References
Ellrott, K. et al. Reproducible biomedical benchmarking in the cloud: lessons from crowd-sourced data challenges. Genome Biol. 20, 195 (2019).
Bender, E. Challenges: Crowdsourced solutions. Nature 533, S62–4 (2016).
Saez-Rodriguez, J. et al. Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat. Rev. Genet. 17, 470–486 (2016).
EHR DREAM Challenge. synapse.org/ehr_dream_challenge_mortality doi:10.7303/syn18405991.
Kahn, M. G. et al. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS (Wash DC) 4, 1244 (2016).
Beaulieu-Jones, B. K. et al. Privacy-Preserving Generative Deep Neural Networks Support Clinical Data Sharing. Circ. Cardiovasc. Qual. Outcomes 12, e005122 (2019).
Chen, J., Chun, D., Patel, M., Chiang, E. & James, J. The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures. BMC Med. Inform. Decis. Mak. 19, 44 (2019).
Modernizing the Food and Drug Administration’s Data Strategy; Public Meeting; Request for Comments. https://www.federalregister.gov/documents/2020/01/08/2020-00071/modernizing-the-food-and-drug-administrations-data-strategy-public-meeting-request-for-comments
Marbach, D. et al. Wisdom of crowds for robust gene network inference. Nat. Methods 9, 796–804 (2012).
Trister, A. D., Buist, D. S. M. & Lee, C. I. Will Machine Learning Tip the Balance in Breast Cancer Screening? JAMA Oncol (2017) doi:10.1001/jamaoncol.2017.0473.
Keller, A. et al. Predicting human olfactory perception from chemical features of odor molecules. Science 355, 820–826 (2017).