What impact do incorrect AI results have on radiologist performance? That question was the focus of a new study in European Radiology in which radiologists who received incorrect AI results were more likely to make wrong decisions on patient follow-up – even though they would have been correct without AI’s help.
The accuracy of AI has become a major concern as large language models like ChatGPT become more powerful and come closer to routine use. There’s even a term – “hallucination” – for when AI models veer off script to produce text that sounds plausible but is in fact incorrect.
While AI hallucinations may not be an issue in healthcare – yet – there is still concern about the impact that AI algorithms are having on clinicians, both in terms of diagnostic performance and workflow.
To see what happens when AI goes wrong, researchers from Brown University sent 90 chest radiographs with “sham” AI results to six radiologists, with 50% of the studies positive for lung cancer. They employed different strategies for AI use, ranging from keeping the AI recommendations in the patient’s record to deleting them after the interpretation was made. Findings included:
- When AI falsely called a true-pathology case “normal,” radiologists’ false-negative rates rose compared to when they didn’t use AI (20.7-33.0% depending on AI use strategy vs. 2.7%)
- AI calling a negative case “abnormal” boosted radiologists’ false-positive rates compared to without AI (80.5-86.0% vs. 51.4%)
- Not surprisingly, when AI calls were correct, radiologists were more accurate with AI than without, with increases in both true-positive rates (94.7-97.8% vs. 88.3%) and true-negative rates (89.7-90.7% vs. 77.3%)
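The rates in the bullets above are standard diagnostic metrics derived from a confusion matrix. As a minimal sketch (the counts below are hypothetical, not the study’s actual data), here is how they are computed:

```python
# Sketch: deriving the diagnostic rates quoted above from a
# confusion matrix. All counts here are hypothetical examples,
# not figures from the study.

def diagnostic_rates(tp, fn, tn, fp):
    """Return (TPR, FNR, TNR, FPR) as percentages.

    TPR = sensitivity; TNR = specificity.
    FNR and FPR are their complements.
    """
    tpr = 100 * tp / (tp + fn)  # true-positive rate
    fnr = 100 * fn / (tp + fn)  # false-negative rate
    tnr = 100 * tn / (tn + fp)  # true-negative rate
    fpr = 100 * fp / (tn + fp)  # false-positive rate
    return tpr, fnr, tnr, fpr

# Hypothetical reads of 45 cancer-positive and 45 normal studies:
tpr, fnr, tnr, fpr = diagnostic_rates(tp=40, fn=5, tn=35, fp=10)
print(f"TPR {tpr:.1f}%  FNR {fnr:.1f}%  TNR {tnr:.1f}%  FPR {fpr:.1f}%")
```

Note that TPR and FNR always sum to 100%, as do TNR and FPR, which is why the study can report the same effect either as a drop in true positives or a rise in false negatives.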
Fortunately, the researchers offered suggestions on how to mitigate the impact of incorrect AI. Radiologists had fewer false negatives when AI provided a box around the region of suspicion, a phenomenon the researchers said could be related to AI helping radiologists focus.
Also, radiologists’ false positives were higher when AI results were retained in the patient record versus when they were deleted. Researchers said this suggests radiologists were less likely to disagree with AI when a record of the disagreement would be preserved.
The Takeaway
As AI becomes more widespread clinically, studies like this will become increasingly important in shaping how the technology is used in the real world, and add to previous research on AI’s impact. Awareness that AI is imperfect – and strategies that take that awareness into account – will become key to any AI implementation.