At Sage Bionetworks, we believe that we can learn more by learning from each other. By improving the way scientists collaborate, we help to make science more effective. We partner with researchers, patients, and healthcare innovators to drive collaborative data-driven science to improve health. Making science more open, collaborative, and inclusive ultimately advances biomedicine.
Do you have expertise in data integration and a passion for mission-driven work? Do you want to be an integral part of a team that includes computational biologists, software engineers, and data curators? If so, you could be our next Data Engineer. You will build tools to support data ingress for a multi-institution center focused on developing new medicines for Alzheimer’s Disease. The center aims to use radically open practices to accelerate finding a treatment or cure for AD. Public sharing of data and resources during the earliest stages of research is critical to the center’s mission. You will design optimal data models to facilitate data sharing. You will create a common schema that integrates with multiple public and private databases, and build data upload tools for the different sites that make up this collaboration. These will be used by researchers as well as by our software engineers, who will build frontends for finding and accessing these resources.
What you’ll be doing:
- Building new software tools to enable automated ingestion of data and resources into a central data repository.
- Working within a team of biologists, statisticians, and database administrators to compile and format large, high-dimensional data sets.
- Working with bioinformaticians and experimental biologists to understand data requirements.
- Developing new processes for maintaining and growing datasets to meet the strategic needs of the research community.
- Using and developing new tools that build on existing APIs for data standardization.
- Streamlining processes and implementing new methods using scripting.
- Creating visualizations and dashboards that help users understand the scope and value of available content.
We’d love to hear from you if you:
- Are passionate about open science and collaboration
- Have a master’s degree in a computational field (e.g. computer science, computational biology, physics), or a bachelor’s degree in the same with at least three years of data engineering experience.
- Are proficient in scripting languages such as Python or R.
- Are proficient in using SQL for ETL operations.
- Are familiar with continuous integration and deployment strategies.
- Are able to work independently and in a team setting.
- Have experience in building dashboards and visualizations to summarize data.
- Have excellent communication skills.
- Have experience working with multiple data modalities (preferred).
- Are proficient in Linux and Linux-based scripting tools (sed, awk, shell scripts, etc.) (preferred).
- Have experience with cluster or cloud computing, preferably AWS (preferred).