Artificial Intelligence

The Mammography AI Generalizability Gap

The “radiologists with AI beat radiologists without AI” trend might have achieved mainstream status in Spring 2020, when the DM DREAM Challenge developed an ensemble of mammography AI solutions that allowed radiologists to outperform rads who weren’t using AI.

The DM DREAM Challenge had plenty of credibility. It was produced by a team of respected experts, combined eight top-performing AI models, and used massive training and validation datasets (144k & 166k exams) from geographically distant regions (Washington state, USA & Stockholm, Sweden).

However, a new external validation study highlighted one problem that many weren’t thinking about back then. Ethnic diversity can have a major impact on AI performance, and the majority of women in the two datasets were White.

The new study used an ensemble of 11 mammography AI models from the DREAM study (the Challenge Ensemble Model; CEM) to analyze 37k mammography exams from UCLA’s diverse screening program, finding that:

  • The CEM model’s UCLA performance declined from the previous Washington and Sweden validations (AUROCs: 0.85 vs. 0.90 & 0.92)
  • The CEM model improved when combined with UCLA radiologist assessments, but still fell short of the Sweden AI+rads validation (AUROCs: 0.935 vs. 0.942)
  • The CEM + radiologists model also achieved slightly lower sensitivity (0.813 vs. 0.826) and specificity (0.925 vs. 0.930) than UCLA rads without AI 
  • The CEM + radiologists method performed particularly poorly with Hispanic women and women with a history of breast cancer
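The subgroup result is the kind of gap that a local validation is meant to catch before deployment. Below is a minimal sketch of a stratified check in plain Python. It computes AUROC, sensitivity, and specificity separately for each patient subgroup at one fixed operating threshold. The function names, the 0.5 threshold, and the toy records are hypothetical illustrations. They are not the UCLA study's code, data, or operating point.

```python
from collections import defaultdict

def sensitivity_specificity(labels, preds):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) for binary 0/1 inputs."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # AUROC is undefined without both classes
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def evaluate_by_subgroup(records, threshold=0.5):
    """records: iterable of (subgroup, label, score). Returns {subgroup: (auroc, sens, spec, n)}."""
    groups = defaultdict(lambda: ([], []))  # subgroup -> (labels, scores)
    for group, label, score in records:
        groups[group][0].append(label)
        groups[group][1].append(score)
    results = {}
    for group, (labels, scores) in groups.items():
        # Hypothetical fixed operating point: scores at or above threshold are recalls
        preds = [1 if s >= threshold else 0 for s in scores]
        sens, spec = sensitivity_specificity(labels, preds)
        results[group] = (auroc(labels, scores), sens, spec, len(labels))
    return results

# Toy, made-up records: (subgroup, cancer label, model score)
toy = [
    ("group_a", 1, 0.9), ("group_a", 1, 0.7), ("group_a", 0, 0.2), ("group_a", 0, 0.6),
    ("group_b", 1, 0.4), ("group_b", 1, 0.8), ("group_b", 0, 0.5), ("group_b", 0, 0.3),
]
for group, (auc, sens, spec, n) in sorted(evaluate_by_subgroup(toy).items()):
    print(f"{group}: AUROC={auc:.2f} sens={sens:.2f} spec={spec:.2f} n={n}")
# group_a: AUROC=1.00 sens=1.00 spec=0.50 n=4
# group_b: AUROC=0.75 sens=0.50 spec=0.50 n=4
```

A pooled AUROC can hide a weak subgroup. Reporting each group separately, along with its sample size, makes this kind of gap visible. In practice you would also want confidence intervals, because small subgroups produce noisy estimates.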

The Takeaway

Although generalization challenges and the importance of data diversity are everyday AI topics in late 2022, this follow-up study shows how large a challenge they can be (regardless of training-set size, ensemble approach, or validation track record). It also reinforces the need for local validation and fine-tuning before clinical adoption.

The study also underscores how much we’ve learned since 2020. Neither the DREAM study’s limitations statement nor the critical editorials that followed it listed data diversity among the study’s potential challenges.

Get every issue of The Imaging Wire, delivered right to your inbox.
