We spend a lot of time exploring the technical aspects of imaging AI performance, but little is known about how physicians are actually influenced by the AI findings they receive. A new Scientific Reports study addresses that knowledge gap, perhaps more directly than any other research to date.
The researchers provided 233 radiologists (experts) and internal and emergency medicine physicians (non-experts) with eight chest X-ray cases each. All of the CXR cases featured correct diagnostic advice, but were manipulated to vary the advice source (AI-generated vs. generated by expert rads) and the level of explanation (advice only vs. advice with annotated visual explanations). Here's what they found…
- Explanations Improve Accuracy – When the diagnostic advice included annotated explanations, accuracy improved for both the IM/EM physicians (+5.66%) and the radiologists (+3.41%).
- Non-Rads with Explainable Advice Rival Rads – Although the IM/EM physicians performed far worse than rads when given advice without explanations, they were “on par with” radiologists when their advice included explainable annotations (see Fig 3).
- Explanations Help Radiologists with Tough Cases – Radiologists gained “limited benefit” from advice explanations with most of the X-ray cases, but the explanations significantly improved their performance with the single most difficult case.
- Presumed AI Use Improves Accuracy – When advice was labeled as AI-generated (vs. rad-generated), accuracy improved for both the IM/EM physicians (+4.22%) and the radiologists (+3.15%).
- Presumed AI Use Improves Expert Confidence – When advice was labeled as AI-generated (vs. rad-generated), radiologists were more confident in their diagnosis.
This study provides solid evidence supporting the use of visual explanations, and bolsters the increasingly popular theory that AI can have its greatest impact on non-experts. It also revealed that physicians trust AI more than some might have expected, to the point where physicians who believed they were using AI made more accurate diagnoses than when they were told the same advice came from a human expert.
However, more than anything else, this study seems to highlight the underappreciated impact of product design on AI’s clinical performance.