How can you predict whether an AI algorithm will fall short for a particular clinical use case such as detecting cancer? Researchers in Radiology took a crack at this conundrum by developing what they call an “uncertainty quantification” metric to predict when an AI algorithm might be less accurate.
AI is rapidly moving into wider clinical use, with a number of exciting studies published in just the last few months showing how AI can help radiologists interpret screening mammograms or direct which women should get supplemental breast MRI.
But AI isn’t infallible. And unlike a human radiologist, who can hedge by flagging when they’re less confident in a particular diagnosis, an AI algorithm doesn’t have a built-in hedging mechanism.
So researchers from Denmark and the Netherlands decided to build one. They took publicly available AI algorithms and tweaked their code so that each prediction came with an “uncertainty quantification” score.
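The study modified existing algorithms, and the exact mechanism isn’t detailed here, but one common, simple way to attach an uncertainty score to a classifier’s output is the entropy of its predicted class probabilities. A minimal Python sketch (function name and data are illustrative, not the authors’ code):

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each prediction's class probabilities.

    probs: array of shape (n_cases, n_classes), rows summing to 1.
    Higher entropy means the model spreads probability across classes,
    i.e. it is less certain about that case.
    """
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

# Example: a near-certain malignant call vs. a near coin-flip prediction
probs = np.array([[0.97, 0.03],   # confident
                  [0.55, 0.45]])  # uncertain
print(predictive_entropy(probs))  # the second case scores much higher
```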
They then tested how well the scores predicted AI performance in a dataset of 13k images for three common tasks covering some of the deadliest types of cancer:
1) detecting pancreatic ductal adenocarcinoma on CT
2) detecting clinically significant prostate cancer on MRI
3) predicting pulmonary nodule malignancy on low-dose CT
Researchers classified the 80% of AI predictions with the highest certainty (lowest uncertainty scores) as “certain” and the remaining 20% as “uncertain,” then compared AI’s accuracy in the two groups (see the sketch after this list), finding …
- AI was significantly more accurate on “certain” predictions than “uncertain” ones for pancreatic cancer (80% vs. 59%), prostate cancer (90% vs. 63%), and pulmonary nodule malignancy prediction (80% vs. 51%)
- AI accuracy was comparable to that of clinicians when its predictions were “certain” (80% vs. 78%, P=0.07), but much worse when “uncertain” (50% vs. 68%, P<0.001)
- Using AI to triage “uncertain” cases produced overall accuracy improvements for pancreatic and prostate cancer (+5%) and lung nodule malignancy prediction (+6%) compared to a no-triage scenario
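As a rough illustration of that analysis (not the study’s actual code), the split amounts to thresholding the uncertainty score at its 80th percentile and comparing accuracy on either side of the cutoff:

```python
import numpy as np

def split_by_uncertainty(uncertainty, preds, labels, certain_frac=0.8):
    """Label the most-certain fraction of cases 'certain' and the rest
    'uncertain', then report accuracy in each group. Inputs are numpy
    arrays of equal length; the 80/20 split mirrors the one described
    above, and all names here are illustrative."""
    cutoff = np.quantile(uncertainty, certain_frac)  # 80th percentile
    certain = uncertainty <= cutoff
    acc_certain = (preds[certain] == labels[certain]).mean()
    acc_uncertain = (preds[~certain] == labels[~certain]).mean()
    return acc_certain, acc_uncertain
```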
How would uncertainty quantification be used in clinical practice? It could play a triage role, deprioritizing radiologist review of easier cases while helping them focus on more challenging studies. It’s a concept similar to the MASAI study of mammography AI.
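In workflow terms, that triage could be as simple as a rule that accepts the AI’s call below an uncertainty threshold and escalates everything else to a radiologist. A hypothetical sketch of such a rule (not taken from the study):

```python
def triage(uncertainty_score, ai_prediction, threshold):
    """Hypothetical triage rule: trust the AI on low-uncertainty cases,
    route high-uncertainty cases to a radiologist for priority review."""
    if uncertainty_score <= threshold:
        return {"decision": ai_prediction, "route": "low-priority review"}
    return {"decision": None, "route": "radiologist worklist (priority review)"}
```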
The Takeaway
Like MASAI, the new findings present exciting possibilities for AI implementation. They also offer a framework for deploying AI more safely by alerting clinicians to cases in which AI’s analysis might fall short – and enabling humans to step in and pick up the slack.