A new Nature study suggests that imaging AI models might underdiagnose patient populations who are also underdiagnosed in the real world, revealing new ethical and clinical challenges for AI development, regulation, and adoption.
The Study – The researchers trained four AI models to predict whether chest X-ray images would have positive diagnostic findings, using three large, diverse public CXR datasets (one model per dataset, plus one trained on the combined 707k-image dataset). They then analyzed model performance across various patient populations.
The Underdiagnosed – The AI models were most likely to underdiagnose patients who are female, young (0-20 yrs), Hispanic or Black, and covered by Medicaid (low-income). AI underdiagnosis rates were even more extreme among patients who belonged to multiple underserved groups, such as Hispanic females or younger Black patients.
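The underdiagnosis rate the study measures is essentially the false-negative rate: the share of truly positive cases the model labels "no finding," computed separately for each patient subgroup. A minimal sketch of that per-group comparison (with hypothetical field names and toy data, not the study's) looks like this:

```python
from collections import defaultdict

def fnr_by_group(records):
    """Compute the false-negative rate (underdiagnosis rate) per subgroup.

    records: iterable of (group, y_true, y_pred) tuples, where 1 means
    a positive diagnostic finding and 0 means "no finding".
    """
    positives = defaultdict(int)  # truly positive cases seen per group
    missed = defaultdict(int)     # positives the model predicted negative
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            if y_pred == 0:
                missed[group] += 1
    return {g: missed[g] / positives[g] for g in positives}

# Toy data: the gap between groups, not the absolute rate, is the signal.
toy = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 0), ("B", 1, 0), ("B", 1, 1), ("B", 0, 0),
]
rates = fnr_by_group(toy)
# Here group B's positive cases are missed twice as often as group A's.
```

A fairness audit along the study's lines would run this kind of comparison for every demographic attribute (and their intersections) rather than reporting a single aggregate accuracy.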
The Overdiagnosed – As you might expect, healthy patients who were incorrectly flagged by the AI models as unhealthy were usually male, older, White, and higher income.
The Clinical Impact – In clinical use, a model like this would result in traditionally underserved patients experiencing more missed diagnoses and delayed treatments, while traditionally advantaged patients might undergo more unnecessary tests and treatments. And we know from previous research that AI can independently detect patient race in scans (even if we don’t know why).
The Takeaway – AI developers have been working to reduce racial and social bias in their models by using diverse datasets, but it appears that they could be introducing new systemic biases in the process (or even amplifying existing ones). These biases certainly aren’t AI developers’ fault, but they still add to the list of data source problems that developers will have to solve.