The Metastatic Breast Cancer Alliance (MBC) is an advocacy group that aims to evaluate and quantify the extent of research in metastatic breast cancer and to assess the overall adequacy of funding in this area. The landscape of MBC funding can be described through a data science approach. An initial set of 2000 grants was curated by a group of grant curators and managed in a spreadsheet. All grants were classified into six metastatic stages (arrest & extravasation, immune surveillance/escape, intravasation circulation, invasion, metabolic deregulation, metastatic colonization) and into two groups: MBC and non-MBC. This process is extremely tedious: curators share an Excel spreadsheet and fill in its columns with the correct annotations, which often takes many days to complete.
An online platform was created to provide a clean space for curators to read and annotate grants. The platform is separated into four main pages: project dashboard, grant selection, grant information, and upload data. The project dashboard shows a brief summary of the number of MBC-related grants and the distribution of the metastatic stage annotations made by MBC and by the machine learning algorithm. The grant selection page lets users filter grants by metastatic stage, query the list of grants by title, authors, or institution, and download the resulting subset of grants. Users can then click on a grant of interest to view its metadata and update its annotations.
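The filtering, querying, and download steps on the grant selection page can be sketched in a few lines of Python. This is only an illustration, not the platform's actual code: the `Grant` record, its field names, the `filter_grants` and `to_csv` helpers, and the sample rows are all hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class Grant:
    # Hypothetical record layout; the real platform's schema is not described.
    title: str
    authors: str
    institution: str
    metastatic_stage: str
    is_mbc: bool


def filter_grants(grants: List[Grant], stage: Optional[str] = None,
                  query: Optional[str] = None) -> List[Grant]:
    """Keep grants matching an optional stage and a case-insensitive text query."""
    results = []
    for grant in grants:
        if stage is not None and grant.metastatic_stage != stage:
            continue
        if query is not None:
            # Search title, authors, and institution together.
            haystack = " ".join([grant.title, grant.authors, grant.institution]).lower()
            if query.lower() not in haystack:
                continue
        results.append(grant)
    return results


def to_csv(grants: List[Grant]) -> str:
    """Serialize a subset of grants to CSV text, as a download might."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Grant)])
    writer.writeheader()
    for grant in grants:
        writer.writerow(asdict(grant))
    return buffer.getvalue()


# Made-up sample rows, for illustration only.
grants = [
    Grant("Tumor cell invasion in bone", "A. Smith", "Example University", "invasion", True),
    Grant("Immune escape mechanisms", "B. Jones", "Sample Institute", "immune surveillance/escape", True),
    Grant("Early detection screening", "C. Lee", "Example University", "invasion", False),
]

subset = filter_grants(grants, stage="invasion", query="example university")
csv_text = to_csv(subset)
```

Keeping the stage filter and the text query as independent optional arguments mirrors the way a curator might combine them on the page, or use either one alone.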
The online platform may create a friendlier environment for annotating grants than an Excel spreadsheet, but the task of annotating is still extremely time-consuming. This process can be automated using text mining and machine learning algorithms. Keywords (features) are first extracted from the abstracts. These features are then used by the machine learning algorithm to label each grant with the correct annotations. A confidence score is provided with each annotation to help curators evaluate the accuracy of the generated annotations. These auto-generated annotations are displayed in the platform so that curators can assess the validity of the algorithms and correct any incorrect annotations. The algorithms will learn and improve as annotations are corrected and more data is uploaded into the platform.
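The pipeline described above (keyword extraction, classification with a confidence score, and learning from curator corrections) might look roughly like the following sketch. The text does not say which algorithm the platform uses, so a small multinomial naive Bayes classifier stands in here as an assumption. The `extract_keywords` function, the `NaiveBayesAnnotator` class, the stopword list, and the sample abstracts are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# A tiny illustrative stopword list (assumption, not the platform's).
STOPWORDS = {"the", "of", "and", "in", "to", "a", "by", "into", "from", "on", "for", "with"}


def extract_keywords(abstract):
    """Lowercase the abstract, split it into alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", abstract.lower()) if t not in STOPWORDS]


class NaiveBayesAnnotator:
    """Multinomial naive Bayes with add-one smoothing over keyword counts."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> keyword counts
        self.label_counts = Counter()            # label -> number of abstracts
        self.vocab = set()

    def update(self, abstract, label):
        """Add one curated (abstract, label) pair; also used for corrections."""
        tokens = extract_keywords(abstract)
        self.word_counts[label].update(tokens)
        self.label_counts[label] += 1
        self.vocab.update(tokens)

    def predict(self, abstract):
        """Return (best_label, confidence), where confidence is the posterior."""
        tokens = [t for t in extract_keywords(abstract) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        log_scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            log_scores[label] = score
        # Normalize the log scores into probabilities (softmax).
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


# Made-up training abstracts, for illustration only.
annotator = NaiveBayesAnnotator()
annotator.update("Tumor cells invade surrounding tissue through the basement membrane",
                 "invasion")
annotator.update("Cancer cells evade immune surveillance by suppressing T cell response",
                 "immune surveillance/escape")

label, confidence = annotator.predict("Mechanisms of immune evasion by T cell suppression")
```

When a curator corrects an annotation, calling `annotator.update(abstract, corrected_label)` folds that correction into the counts, which is one simple way the annotations could improve as more curated data arrives.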
Sage Bionetworks’ role in this project is generously funded by a grant from the Avon Foundation for Women.