SEATTLE–(BUSINESS WIRE)–Four open access papers published in Nature Genetics today highlight the benefits of making large biomedical research data sets open as a community resource and equipping them with cloud-based tools that facilitate rapid learning and future discovery.
The papers report on the genetic profiles of 12 cancer types, describe the collaborative approach taken by The Cancer Genome Atlas (TCGA) Pan-Cancer Project researchers and highlight the new bioinformatics tools that Sage Bionetworks provided to the effort. TCGA includes 250 researchers spread across 30 institutions and running a total of 60 different research projects, all working from the same set of nearly 2,000 biomedical data files covering 12 different cancer types. For the duration of the Pan-Cancer project, TCGA members agreed to pilot Synapse (www.synapse.org), Sage Bionetworks’ software platform, for managing the myriad interdependencies of these projects and for facilitating the sharing and evolving of data and findings in real time.
The non-profit Sage Bionetworks was founded in 2009 with a vision to accelerate biomedical research by developing open systems, incentives and standards that allow an open commons of research to flourish. Sage Bionetworks’ Director of Computational Biology, Dr. Adam Margolin, likens Synapse to “…a computational researcher’s sandbox where open data aggregates and can be used continuously by researchers. We intentionally embedded tools of collaboration into Synapse. These are things like automated data versioning and real time ‘provenance’ records that detail how a researcher processed his/her data. With these tools in place, Synapse ends up transforming efforts like these first reported findings of the Pan-Cancer project into an open resource: where any stage of the work can serve as the starting point for additional exploration by the greater scientific community.”
The TCGA community piloted the Synapse software platform for its ability to support three different requirements of their collaborative work: providing data freezes and data versioning controls, conducting and sharing multistep data analysis workflows and collaboratively evolving novel analytical methods. Today’s four TCGA papers in Nature Genetics are the first of 18 papers already in press that showcase a range of discoveries all emerging from a common set of data managed in Synapse. In addition, the resulting TCGA data freezes, analysis results and evaluation framework for survival predictions are a new publicly available resource released on Synapse in conjunction with this work.
Judging from the impressions of two TCGA researchers, the way in which Synapse supported the Pan-Cancer project could very well become a working model to guide aspects of future large-scale collaborative studies on biomedical data.
Kyle Ellrott, a UCSC software developer and heavy user of Synapse for the TCGA projects, notes, “Synapse empowers data providers and analysts to share their work while at the same time providing a common framework everyone can use. It is the YouTube of scientific data.”
Professor Josh Stuart (Biomolecular Engineering, UCSC) concludes, “Synapse was indeed the connecting data framework that held the entire project together. It represents an important milestone for collaborative science that so many groups around the country and world were able to work together on a common set of scientific problems. The beauty of it is that it will only improve as we scale to even larger projects in the near future.”