A new AJR study shared evidence of how X-ray image labels influence deep learning decision making, and revealed one way developers can address the issue.
Confounding History – Although already well known to AI insiders, label and laterality-based AI shortcuts made headlines last year when they were blamed for many COVID algorithms’ poor real-world performance.
The Study – Using 40k images from Stanford’s MURA dataset, the researchers trained three CNNs to detect abnormalities in upper extremity X-rays. They then tested the models for detection accuracy and used a heatmap tool to identify the parts of the images that the CNNs emphasized (see the attribution sketch after this list). As you might expect, labels played a major role in both accuracy and decision making.
- The model trained on complete images (bones & labels) achieved a 0.844 AUC, but based 89% of its decisions on the radiographs’ laterality/labels.
- The model trained without labels or laterality (only bones) detected abnormalities with a higher 0.857 AUC and attributed 91% of its decisions to bone features.
- The model trained with only laterality and labels (no bones) still achieved a 0.638 AUC, showing that the models learned to interpret certain labels as signs of abnormality.
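For readers who want to run this kind of audit themselves, here is a minimal Grad-CAM sketch in PyTorch. The study’s exact heatmap tool and architectures aren’t specified here, so the ResNet-18 backbone and two-class abnormality head below are hypothetical stand-ins; summing heatmap mass inside versus outside the label regions is one plausible way to produce attribution percentages like those above.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hypothetical stand-in for the study's CNNs: ResNet-18 with a
# two-class (normal vs. abnormal) head.
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.eval()

activations, gradients = {}, {}

def save_activation(module, inputs, output):
    activations["value"] = output.detach()

def save_gradient(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

# Hook the final convolutional block (layer4 in ResNet).
model.layer4.register_forward_hook(save_activation)
model.layer4.register_full_backward_hook(save_gradient)

def grad_cam(x: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return an [H, W] heatmap of regions that drove the prediction.

    x: a [1, 3, H, W] preprocessed radiograph.
    """
    logits = model(x)
    model.zero_grad()
    logits[0, class_idx].backward()

    acts = activations["value"]     # [1, C, h, w] feature maps
    grads = gradients["value"]      # [1, C, h, w] gradients
    weights = grads.mean(dim=(2, 3), keepdim=True)   # channel importance
    cam = F.relu((weights * acts).sum(dim=1))        # [1, h, w]
    cam = F.interpolate(cam.unsqueeze(1), size=x.shape[2:],
                        mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam[0, 0]
```

Given bounding boxes for the label/laterality markers, the fraction of heatmap mass falling inside those boxes would serve as a per-image estimate of how much the model leaned on labels rather than bones.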
The Takeaway – Labels are nearly as ubiquitous on X-rays as the anatomy itself, and they can have an even greater influence on AI decision making. Because of that, the authors urged AI developers to address confounding image features during the curation process (potentially by covering labels, as sketched below) and encouraged AI users to screen CNNs for these issues before clinical deployment.
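One straightforward curation fix is to black out label regions before training. A minimal NumPy sketch follows, assuming the label bounding boxes come from an upstream text detector or manual annotation (the `boxes` format here is hypothetical, not the authors’ method):

```python
import numpy as np

def mask_labels(image: np.ndarray,
                boxes: list[tuple[int, int, int, int]]) -> np.ndarray:
    """Black out label/laterality markers before training.

    image: a 2D grayscale radiograph array.
    boxes: (x0, y0, x1, y1) pixel coordinates of label regions,
           assumed to come from a text detector or annotation.
    """
    out = image.copy()
    for x0, y0, x1, y1 in boxes:
        out[y0:y1, x0:x1] = 0  # fill label region with black
    return out
```

One caveat worth noting: zero-filled rectangles can themselves become a shortcut if their size or placement correlates with class, so applying masks uniformly across all images (or using random-erasing-style masking) may be the safer design choice.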