PSB Workshop Agenda
Establishing the Reliability of Algorithms in Biomedical Research
Workshop Goals and Strategy
This event has transformed into a virtual workshop. We’ve designed an agenda that we hope will provide stimulating dialogue on approaches emerging across the bioinformatics community to quantify the accuracy of algorithms and the reliability of their outputs, including discussion of potential mechanisms for establishing standards that enforce greater accountability across the community. How, you ask, are we going to make a Zoom call as stimulating as a discussion on a tropical island? We’ve got a few ideas:
- Facilitated dialogue: We’ve pre-selected four questions that we personally want to talk about and have posted them below. For each question, we’ve convinced two to three engaging researchers to come prepared to talk for three minutes each. We also ask you to think about what you want to contribute to, or learn about, these topics and come prepared to share it.
- Respect for your time: Because we are all busy, you can come and go throughout the workshop. It’s hard to stare at the screen for three hours, no matter how stimulating the conversation is. We provide two alternative ways to participate: First, we commit to staying on schedule throughout the workshop so you can come and go to hear the conversations you are most interested in. Second, we commit to recording the dialogue through Slack, and we encourage you to contribute to the conversation there or catch up on what was discussed asynchronously.
10:00-11:00 a.m. | Short talks with Q&A ~ to stimulate your ideas
- Responsible data sharing as a requirement for algorithm benchmarking – Lara Mangravite, Sage Bionetworks (10 min)
- Algorithm evaluation in clinical informatics – Sean Mooney, University of Washington (20 min)
- What can community challenges do for you? – Iddo Friedberg, Iowa State University (20 min)
11:00-11:30 a.m. | Discussion Topic 1
What are the key considerations for an algorithm’s site-specific performance that should be reported on when implementing a machine learning solution for clinical use?
Discussants: Nigam Shah, Stanford University | Anant Madabhushi, Case Western Reserve University
11:30 a.m.-12:00 p.m. | Discussion Topic 2
Performance measures such as F1, AUROC, AUPR, etc. are used in ML-based community challenges to assess how well algorithms perform. However, “once a measure becomes the target, it ceases to be a good measure” (Goodhart’s law). How do we keep community challenges viable while connecting them with real-life problems?
Discussants: Marc Robinson-Rechavi, University of Lausanne | John Moult, University of Maryland | Predrag Radivojac, Northeastern University
12:00-12:30 p.m. | Discussion Topic 3
Many grassroots efforts have sprung up across the community to develop benchmarking programs. Where and how might the broader research community seek to aggregate knowledge and methods across these efforts?
Discussants: Sandra Orchard, EBI | Constantina Bakolitsa, Berkeley
12:30-1:00 p.m. | Discussion Topic 4
If we agree that independent benchmarking of algorithms is a necessary step in biomedical informatics, how do we standardize the use of these approaches to ensure that they are expected as part of everyday practice across the field?
Discussants: Casey Greene, University of Colorado | Benjamin Haibe-Kains, Princess Margaret Cancer Centre