Data Scientist, Cancer Systems Biology

Position Overview

Sage Bionetworks is recruiting a Data Scientist to support our open science initiatives in computational oncology. Sage is a non-profit research organization in Seattle, WA that seeks to develop predictors of disease and accelerate health research through the creation of open systems, incentives, and standards. We create strategies and platforms that empower researchers to share and interpret data on a colossal scale, crowdsource tests for new hypotheses, and contribute to knowledge through community challenges. At the base of all of these efforts is the need to create globally coherent biological data sets that are Findable, Accessible, Interoperable, and Re-usable.

We are seeking someone to help develop an open knowledge portal in oncology. The ideal candidate understands the complexities of working with dozens of scientific laboratories to combine and manage heterogeneous biological data sources, including the difficulty of standardizing metadata, and appreciates the value that standardization provides in supporting reproducible research.

Specific Responsibilities Include:

  • Working with a team of bioinformaticians, statisticians, and software developers at Sage to coordinate and manage resource and data sharing efforts among a large consortium of cancer researchers
  • Leading a team of scientists from collaborating biomedical research labs to identify or develop standards and protocols for storing, describing, and sharing heterogeneous data (including clinical, genomic, and imaging datasets) and tools
  • Creating tools that use or build on existing APIs to manage data, implement standards, apply metadata, and enable easy upload of curated data to publicly funded repositories
  • Interacting with collaborators to identify data management needs, respond to requests for data and programmatic support, generate reports, and provide training
  • Communicating research findings, including organizing websites and designing and building dashboards, visualizations, and other resources that help experimental and clinical biologists explore and leverage data and analysis results

Required Qualifications

  • PhD in computer science or bioinformatics, or MS in information and/or library sciences, with 2+ years of experience in research data management in the biological sciences
  • Experience with R and/or Python programming, particularly with data visualization libraries
  • Experience managing data and computing within distributed compute environments and interacting with SQL-like databases

Desired Qualifications

  • Experience with genomic data and familiarity with collaborative development and version control systems (e.g., git)



Current Positions

Computational Oncology

The Computational Biology group focuses on developing integrative probabilistic models to predict disease phenotypes and on validating hypotheses generated by novel methodologies. Current opportunities include:

  • Positions in oncology focused on conducting original research that analyzes large-scale, high-dimensional genomics data to develop predictive models of cancer phenotypes
  • Positions in collaboration with the recently merged Sage/DREAM effort, focused on designing and implementing crowdsourced collaborative challenges around cancer phenotype prediction problems
  • Positions in stem cell bioinformatics focused on developing the bioinformatics data and analysis portal for the Progenitor Cell Biology Consortium, as well as on research projects modeling the molecular mechanisms underlying stem cell differentiation

Mobile Health

Sage Bionetworks’ mobile health (mHealth) program is designed to improve disease characterization through sensor-based technologies and bidirectional feedback that enhance health monitoring and provide quantitative metrics for assessing the impact of disease on health and quality of life. We maximize the insights gained from these efforts by making them available through Synapse, a collaborative compute platform. Our mHealth team has expertise in software engineering (both iOS and Android), clinical study design and development, data governance, and data analysis. We are actively involved in projects across a range of disease areas and within the Precision Medicine Initiative.

Systems Biology

The Systems Biology research group at Sage Bionetworks is working to understand the mechanisms underlying common diseases. In collaboration with academic and industry partners, we use large-scale genomic analysis to identify disease subclasses, generate diagnostic and prognostic biomarkers, and identify the pathophysiology that causes disease. Our current portfolio focuses on neurobiology, spanning both neurodegenerative and neuropsychiatric disorders, and also includes projects in other areas such as immunology, metabolic disease, and craniofacial deformation.

Technology Platforms & Services

We’re building the tools and platforms needed to gather, share, and use biomedical data in novel ways. These are aimed both at the research community and at the organizations and individuals who provide data and take part in the research process. They range from the Synapse and BRIDGE technology platforms, through novel approaches to governance issues in distributing human data (such as E-Consent), to the ability to run Challenges that address data-driven questions through our partnership with DREAM.