At Sage Bionetworks, we believe that we can learn more by learning from each other and by applying open practices to data-driven research. By improving the way scientists collaborate, and by increasing the transparency and reliability of research, we will make science more effective. We partner with funders, researchers, and healthcare innovators to drive collaborative data-driven science for the advancement of human health.
This position will play a critical role in our scientific research communities by supporting biomedical data repositories through data curation and the development of data formats and metadata standards. We are looking for someone who understands the complexities of combining heterogeneous biological data sources, the difficulties of standardizing metadata, and the value that these efforts provide in supporting reproducible research.
What you’ll be doing:
- Work within a team of research scientists, data curators, and outreach and communication specialists to curate, distribute, and publicize high-dimensional biomedical data sets coming from multiple NIH and nonprofit foundation-supported programs.
- Develop and document metadata standards for sharing datasets under the FAIR guiding principles.
- Develop or extend existing data models/schemas for projects in coordination with Project Leads.
- Document data ingest processes and curation SOPs. Write data release notes. Produce data reports.
- Contribute to and maintain data portal content (e.g., the PsychENCODE Knowledge Portal).
- Streamline and implement new processes for data curation using scripting and statistical programming.
- Manage access to sensitive datasets in collaboration with the Sage governance team.
We’d love to hear from you if you:
- Have a master’s degree in a computational field, library science, or data management, OR a bachelor’s degree in one of the above areas with 3+ years of relevant work experience. A minimum of 2 years working with high-dimensional data repositories is preferred.
- Have experience working with common biomedical data types derived from disease model systems and patients. Experience with gene expression and other omics data is preferred.
- Have an interest in learning about new high-throughput biological technologies.
- Are proficient in a scripting language such as Python or R.
- Have a basic understanding of SQL.
- Have a basic understanding of JSON.
- Are highly organized and have great attention to detail.
- Are able to work individually and within a team.
- Are passionate about open science and collaboration.
In light of concerns about COVID-19, all interviews will be conducted remotely, and most positions will be remote through at least September 30, 2021. The option to work on-site at our Seattle office prior to September 30 will be considered upon request.
This position is eligible to be fully remote. Sage Bionetworks is actively working on policies and plans for the post-COVID distributed work environment.