AACR Project GENIE Serves As Hub and Harmonizer for Cancer Data

New data release highlights Sage’s key role in mapping data and enabling broader accessibility



AACR Project GENIE (Genomics Evidence Neoplasia Information Exchange) launched in 2015 to serve as a registry for pan-cancer data from tens of thousands of patients treated at the eight participating institutions. The registry enables the mapping of cancer genomic data to clinical outcomes. Access to this patient data helps improve clinical decision-making, especially for rare cancers and for rare variants in common cancers.

Sage Bionetworks has served as the data hosting and integration hub, overseeing data ingestion, data processing, and two data releases per year. Sage has contributed to the development of detailed data dictionaries and formats. It has also deployed operating procedures that describe the workflows required of each institution, along with data processing pipelines that validate the data across the contributing centers.

Processing and standardizing the cancer data so that researchers can easily access the full breadth of information is a massive undertaking. Over the course of 2019, the Sage team has also been collaborating with teams at Memorial Sloan Kettering Cancer Center (MSKCC) and the National Cancer Institute’s Genomic Data Commons (GDC) to make Project GENIE data from 44,756 cancer cases compatible with the GDC. (Read more about this impactful data release on the AACR blog.)

In an ideal world, there would be a single standard for clinical patient data. But there isn’t, so sharing cancer genomic data from one medical center to another is far from trivial. In this case, Sage’s Kristen Dang and Thomas Yu and MSKCC’s Stacy Thomas oversaw the mapping of the data. The conversion required both programmatic and line-by-line review before the data could be integrated with the GDC.

Sage values data accessibility and believes in breaking down barriers that can hinder crucial research. Preparing the initial data release for the GDC came with challenges, but the process helped the teams address some of the pain points, so future releases will be more streamlined.


There will be an AACR Project GENIE data release in January 2020. Visit the data release log to stay current.