Data Engineer, Systems Biology

Position Overview: Sage Bionetworks is seeking a data engineer to join the systems biology team. At Sage, we develop and test social, technical, and scientific solutions to foster open practices and enable collaborative research. The systems biology group focuses on data-rich research collaborations in which teams of scientists work together to generate and analyze billions of measurements on thousands of research samples. The broader goal is to advance scientific research by establishing community consensus through data and knowledge sharing. The Data Engineer’s primary focus will be to build tools that organize data streams from multiple external teams studying different aspects of Alzheimer’s disease, along with systems that make these data streams easy to analyze using workflow and containerization technologies. The Data Engineer will be a technical lead who can proactively identify and outline technical needs across multiple projects and drive the development of solutions by defining requirements and overseeing multiple project roadmaps.

Specific Responsibilities include:

  • Manage the distribution of knowledge and data by defining and overseeing the development of data and visualization web portals.
  • Work with Sage scientists and engineers to develop common strategies and processes for the hosting, curation, and annotation of large clinical and genomic data sets.
  • Work with the Synapse engineering team to lead development of analysis workflows for scientific compute on terabytes of genomic data using cloud infrastructure.
  • Design and develop programmatic solutions to ingest data, manage metadata, and improve data discoverability.
  • Build on our existing tools and APIs to help scientists perform computational research.

Basic Qualifications:

  • MS or PhD degree in a computational field.
  • 2+ years of experience with data processing.
  • 1+ years of experience with cluster or cloud computing.
  • High-level knowledge of R, Python, or MATLAB is a must.
  • Comfortable using Linux.
  • Experience with querying and using databases a plus.
  • Experience with bioinformatics techniques, specifically genomic data generation and analysis, preferred.
  • Effective and efficient communication skills for diverse audiences.
  • Experience with workflow technologies is a plus.

Sage offers competitive compensation and a comprehensive benefits package. 

To apply, please submit a CV and cover letter.

About Sage Bionetworks: Sage Bionetworks is a non-profit organization dedicated to advancing biomedical research through the implementation of reproducible, open science. In collaboration with scientists around the world, we build robust computational models of disease-related phenotypes through integrative analysis of large-scale genomic, imaging, and mHealth data sets. To enhance collaborative efforts, we leverage a compute platform for sharing research insights in a transparent, reproducible fashion.

Current Positions

Computational Oncology

The Computational Biology group focuses on developing integrative probabilistic models for prediction of disease phenotypes and on validating hypotheses generated by novel methodologies. Current opportunities include positions in oncology focused on conducting original research analyzing large-scale, high-dimensional genomics data to develop predictive models of cancer phenotypes; positions in collaboration with the recently merged Sage/DREAM effort, focused on designing and implementing crowd-sourced collaborative challenges around cancer phenotype prediction problems; and positions in stem cell bioinformatics, with a focus on developing the data and analysis bioinformatics portal for the Progenitor Cell Biology Consortium as well as research projects on modeling the molecular mechanisms underlying stem cell differentiation.

Digital Health

Sage Bionetworks’ digital health program is designed to improve disease characterization through the use of sensor-based technologies and bi-directional feedback to improve health monitoring and provide quantitative metrics to assess disease impact on health and on quality of life. We maximize the insights gained from these efforts by providing them through Synapse, a collaborative compute platform. Our mHealth team includes expertise in software engineering (both iOS and Android), clinical study design and development, data governance and data analysis. We are actively involved in projects across a range of disease areas and within the Precision Medicine Initiative.

Neurodegenerative Research

An overarching goal of the Neurodegenerative Research (NDR) group is to improve understanding of the molecular mechanisms of neurodegeneration via computational analyses of high-dimensional genomic data sets. Our group leads analyses of such data in consortia focused on Alzheimer’s Disease (AD) and related neurodegenerative disorders, including AMP-AD and MODEL-AD. We also work across disciplines to develop technologies that make these analyses available to a wide audience of researchers. Most notably, we recently celebrated the launch of Agora, an interactive, web-based explorer that provides access to research and analyses of nascent AD drug targets produced in conjunction with the NIH-led Accelerating Medicines Partnership.

Systems Biology

The Systems Biology research group at Sage Bionetworks is working to understand the underlying mechanisms causal to common disease. In collaboration with academic and industry partners, we use large-scale genomic analysis to identify disease subclasses, generate diagnostic and prognostic biomarkers, and identify pathophysiology causal to disease. Our current portfolio is focused on neurobiology, spanning both neurodegenerative and neuropsychiatric disorders, and also includes projects in other disease areas such as immunology, metabolic disease, and craniofacial deformation.

Technology Platforms & Services

We’re working on the tools and platforms required to gather, share, and use biomedical data in novel ways. These are targeted both at the research community and at organizations and individuals who provide data and take part in the research process. They range from the technology platforms Synapse and BRIDGE, through novel methods of addressing governance issues around the distribution of human data such as E-Consent, to the ability to run Challenges to solve data-driven questions through our partnership with DREAM.