Research Scientist, Computational Oncology

Sage Bionetworks is currently recruiting a research scientist with a strong background in statistical modeling, machine learning, and data analysis, particularly as applied to large-scale biological and/or cancer genomics datasets. We are active in a number of research areas, including immuno-oncology, spatial-temporal tumor heterogeneity, and drug response, which we study using state-of-the-art molecular, imaging, and clinical modalities. The research scientist will contribute to the DREAM Challenges (a series of community-supported data challenges) as well as to large-scale consortia, including the Human Tumor Atlas Network, the Cancer Systems Biology Consortium, and the Physical Sciences Oncology Network.

Responsibilities

• Conduct data QC, data harmonization, and data analysis on large-scale genomic data.
• Create workflows to facilitate collaboration across consortia (e.g., Dockerize computational methods).
• Manage data access in support of consortia (e.g., data transfer, metadata management, and QC on Sage's data-sharing platform).
• Present results concisely and effectively to collaborators.

Example projects include:

  • Implement a pipeline to compare existing image analysis algorithms across benchmark datasets.
  • Develop or apply consensus clustering approaches to define expression signatures correlated with patient outcome.
  • Model drug response using genomic, transcriptomic, and clinical features.
  • Computationally validate methylation-based risk markers.

Basic Qualifications

• PhD in statistics, mathematics, physics, computational biology, computer science, bioinformatics, or related quantitative discipline.
• A master's degree in one of the above areas, combined with 5+ years of significant relevant work experience and a strong track record in statistical analysis and/or machine learning, will also be considered.
• Proven expertise in state-of-the-art machine learning and statistical techniques, such as statistical modeling (e.g., regularized regression, survival analysis, GLMs), supervised learning (e.g., SVMs, neural networks), unsupervised learning (e.g., k-means), dimensionality reduction (e.g., PCA), and Bayesian analysis.
• Exceptional problem-solving skills, particularly the ability to address a defined problem or hypothesis (biological or otherwise) creatively and with limited supervision.
• Strong programming skills in R and/or Python.

Additional Skills/Preferences

• Experience working with high-dimensional biological data, such as gene expression, genomic, imaging, drug response, flow cytometry, or CyTOF data. Our immediate needs are in RNA-seq, single-cell RNA-seq, single-cell imaging, video, and drug response data.
• Knowledge of biology, particularly cancer.
• Demonstrated excellence in research.
• Software development skills, including experience with version control (e.g., Git and GitHub).
• Familiarity with cloud environments (especially AWS) and containerization approaches (principally Docker).
• A passion for open-access innovation.
• Strong collaboration, teamwork, presentation, and communication skills.

About Sage Bionetworks

Sage Bionetworks is a world-leading nonprofit biomedical research organization based in Seattle, WA. We are dedicated to building and supporting open communities of collaborative research in human health and genomics. We are developing multiple initiatives designed to facilitate scientific collaborations and enable citizens to contribute ideas and data directly to research projects. Sage embraces diversity and equity, and we collaborate broadly throughout the world.


Current Positions

Computational Oncology

The Computational Biology group focuses on developing integrative probabilistic models to predict disease phenotypes and on validating hypotheses generated by novel methodologies. Current opportunities include:

• Oncology positions focused on original research analyzing large-scale, high-dimensional genomic data to develop predictive models of cancer phenotypes.
• Positions with the recently merged Sage/DREAM effort, focused on designing and implementing crowd-sourced collaborative challenges around cancer phenotype prediction problems.
• Stem cell bioinformatics positions focused on developing the data and analysis portal for the Progenitor Cell Biology Consortium, as well as on research projects modeling the molecular mechanisms underlying stem cell differentiation.

Digital Health

Sage Bionetworks' digital health program is designed to improve disease characterization through sensor-based technologies and bidirectional feedback, which strengthen health monitoring and provide quantitative metrics for assessing the impact of disease on health and quality of life. We maximize the insights gained from these efforts by sharing them through Synapse, a collaborative compute platform. Our mHealth team includes expertise in software engineering (both iOS and Android), clinical study design and development, data governance, and data analysis. We are actively involved in projects across a range of disease areas and within the Precision Medicine Initiative.

Neurodegenerative Research

An overarching goal of the Neurodegenerative Research (NDR) group is to improve understanding of the molecular mechanisms of neurodegeneration through computational analyses of high-dimensional genomic datasets. Our group leads analyses of such data in consortia focused on Alzheimer's disease (AD) and related neurodegenerative disorders, including AMP-AD and MODEL-AD. We also work across disciplines to develop technologies that make these analyses available to a wide audience of researchers. Most notably, we recently launched Agora, an interactive, web-based explorer that provides access to research and analyses of nascent AD drug targets produced in conjunction with the NIH-led Accelerating Medicines Partnership.

Systems Biology

The Systems Biology research group at Sage Bionetworks is working to understand the mechanisms underlying common diseases. In collaboration with academic and industry partners, we use large-scale genomic analysis to identify disease subclasses, generate diagnostic and prognostic biomarkers, and identify the pathophysiology that causes disease. Our current portfolio is focused on neurobiology, spanning both neurodegenerative and neuropsychiatric disorders, and also includes projects in other areas such as immunology, metabolic disease, and craniofacial deformation.

Technology Platforms & Services

We're building the tools and platforms required to gather, share, and use biomedical data in novel ways. These are aimed both at the research community and at the organizations and individuals who contribute data and participate in the research process. They range from the technology platforms Synapse and BRIDGE, through novel approaches to governance issues around the distribution of human data (such as E-Consent), to running Challenges that address data-driven questions through our partnership with DREAM.