The Mammography AI Generalizability Gap

The “radiologists with AI beat radiologists without AI” trend might have achieved mainstream status in Spring 2020, when the DM DREAM Challenge showed that an ensemble of mammography AI models allowed radiologists to outperform colleagues who weren’t using AI.

The DM DREAM Challenge had plenty of credibility. It was produced by a team of respected experts, combined eight top-performing AI models, and used massive training and validation datasets (144k & 166k exams) from geographically distant regions (Washington state, USA & Stockholm, Sweden).

However, a new external validation study highlighted one problem that many weren’t thinking about back then. Ethnic diversity can have a major impact on AI performance, and the majority of women in the two datasets were White.

The new study used an ensemble of 11 mammography AI models from the DREAM study (the Challenge Ensemble Model; CEM) to analyze 37k mammography exams from UCLA’s diverse screening program, finding that:

  • The CEM model’s UCLA performance declined from the previous Washington and Sweden validations (AUROCs: 0.85 vs. 0.90 & 0.92)
  • The CEM model improved when combined with UCLA radiologist assessments, but still fell short of the Sweden AI+rads validation (AUROCs: 0.935 vs. 0.942)
  • The CEM + radiologists model also achieved slightly lower sensitivity (0.813 vs. 0.826) and specificity (0.925 vs. 0.930) than UCLA rads without AI 
  • The CEM + radiologists method performed particularly poorly with Hispanic women and women with a history of breast cancer
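For readers curious about the mechanics behind numbers like these, here’s a minimal sketch of how an ensemble like the CEM might pool member-model scores, and how the AUROC figures used in these comparisons are computed. The unweighted averaging and all data below are illustrative assumptions, not the study’s actual method:

```python
# Illustrative sketch only: the real CEM's weighting and its scheme for
# combining scores with radiologist assessments are not described here.

def auroc(scores, labels):
    """Rank-based AUROC: probability a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ensemble_score(member_scores):
    """Simple unweighted mean over the member models' suspicion scores."""
    return sum(member_scores) / len(member_scores)

# Toy cohort: 3 member models scoring 4 exams (the last two are cancers)
per_exam_scores = [[0.1, 0.2, 0.1], [0.3, 0.2, 0.4],
                   [0.7, 0.8, 0.6], [0.9, 0.8, 0.9]]
labels = [0, 0, 1, 1]
combined = [ensemble_score(s) for s in per_exam_scores]
print(auroc(combined, labels))  # 1.0 on this cleanly separated toy set
```

The takeaway from the external validation is precisely that an AUROC computed this way on one population (Washington, Sweden) doesn’t carry over to a demographically different one (UCLA).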

The Takeaway

Although generalization challenges and the importance of data diversity are everyday AI topics in late 2022, this follow-up study highlights how big of a challenge they can be (regardless of training size, ensemble approach, or validation track record), and underscores the need for local validation and fine-tuning before clinical adoption. 

It also underscores how much we’ve learned in the last three years, as neither the 2020 DREAM study’s limitations statement nor critical follow-up editorials mentioned data diversity among the study’s potential challenges.

Google Launches Cloud Medical Imaging Suite

Google announced what might be its biggest, or at least most public, push into medical imaging AI with the launch of its new Google Cloud Medical Imaging Suite.

The Suite directly targets organizations who are developing imaging AI models and performing advanced image-based analytics, while also looking to improve Google’s positioning in the healthcare cloud race.

The Medical Imaging Suite is (logically) centered around Google Cloud’s image storage and Healthcare API, which combine with its DICOMweb-based data exchange and automated DICOM de-identification tech to create a cloud-based AI development environment. Meanwhile, its “Suite” title is earned through integrations with an array of Google and partner solutions:

  • NVIDIA’s annotation tools (including its MONAI toolkit) to help automate image labeling
  • Google’s BigQuery and Looker solutions to search and analyze imaging data, and create training datasets
  • Google’s Vertex AI environment to accelerate AI pipeline development
  • NetApp’s hybrid cloud services to support on-premise-to-cloud data management
  • Google’s Anthos solution for centralized policy management and enforcement
  • Change Healthcare’s cloud-native enterprise imaging PACS for clinical use

It’s possible that many of these solutions were already available to Google Cloud users, and AWS and Azure appear to have similar lists of imaging capabilities/partners, so this announcement might prove more technologically significant if Google Cloud turns it into a differentiated and/or seamlessly integrated suite going forward.

However, the announcement’s marketing impact was immediate, as press articles and social media conversations largely celebrated Google Cloud’s new role in solving imaging’s interoperability and AI development problems. It’s been a while since we’ve seen AWS or Azure gain imaging headlines or public praise like that, and they’re the healthcare cloud market share leaders.

The Takeaway

Although some might debate whether the Medical Imaging Suite’s features are all that new, last week’s launch certainly reaffirms Google Cloud’s commitment to medical imaging (with an AI development angle), and suggests that we might see more imaging-targeted efforts from them going forward.

Arterys and Tempus’ Precision Merger

Arterys was just acquired by precision medicine AI powerhouse Tempus Labs, marking perhaps the biggest acquisition in the history of imaging AI, and highlighting the segment’s continued shift beyond traditional radiology use cases. 

Arterys has become one of imaging’s AI platform and cardiac MRI 4D flow leaders, leveraging its 12 years of work and $70M in funding to build out a large team of imaging/AI experts, a solid customer base, and an attractive intellectual property portfolio (AI models, cloud viewer, and a unique multi-vendor platform).

Tempus Labs might not be a household name among Imaging Wire readers, but they’ve become a giant in the precision medicine AI space, using $1.1B in VC funding and the “largest library of clinical & molecular data” to develop a range of precision medicine and treatment discovery / development / personalization capabilities.

It appears that Arterys will continue to operate its core radiology AI business (with far more financial support), while supporting the imaging side of Tempus’s products and strategy.

This acquisition might not be as unprecedented as some think. We’ve seen imaging AI assume a central role within a number of next-generation drug discovery/development companies, including Owkin and nference (who recently acquired imaging AI startup Predible), while imaging AI companies like Quibim are targeting both clinical use and pharma/life sciences applications.

Of course, many will point out how this acquisition continues 2022’s AI shakeup, which brought at least five other AI acquisitions (Aidence & Quantib by RadNet; Nines by Sirona, MedoAI by Exo, Predible by nference) and two strategic pivots (MaxQ AI & Kheiron). Although these acquisitions weren’t positive signs for the AI segment, they revealed that imaging AI startups are attractive to a far more diverse range of companies than many could have imagined back in 2021 (including pharma and life sciences).

The Takeaway

Arterys just transitioned from being an independently-held leader of the (promising but challenged) diagnostic imaging AI segment to being a key part of one of the hottest companies in healthcare AI, all while managing to keep its radiology business intact. That might not be the exit that Arterys’ founders envisioned, but in many ways it’s an ideal second chapter.

Plaque AI’s First Reimbursement

The small list of cardiac imaging AI solutions to earn Medicare reimbursements just got bigger, following CMS’ move to add an OPPS code for AI-based coronary plaque assessments. That represents a major milestone for Cleerly, who filed for this code and leads the plaque AI segment, and it marks another sign of progress for the business of imaging AI.

With CMS’ October 1st OPPS update, Cleerly and other approved plaque AI solutions now qualify for $900 to $1,000 reimbursements when used with Medicare patients scanned in hospital outpatient settings. 

  • That achievement sets the stage for plaque AI’s next major reimbursement hurdle: gaining coverage from local Medicare Administrative Contractors (MACs) and major commercial payers.

Cleerly and its qualifying plaque AI competitors join a growing list of Medicare-reimbursed imaging AI solutions, headlined by HeartFlow’s FFRCT solution ($930-$950) and Perspectum’s LiverMultiScan MRI software ($850-$1,150), both of which have since expanded their reimbursements across MAC regions and major commercial payers. 

  • The last few years also brought temporary NTAP reimbursements for Viz.ai (LVO detection / coordination), Caption Health (echo AI guidance), and Optellum (lung cancer risk assessments), plus a growing number of imaging AI CPT III codes that might lead to future reimbursements.

The new reimbursement should also drive advancements within the CCTA plaque AI segment, giving providers more incentive to adopt this technology, and providing emerging plaque AI vendors (e.g. Elucid, Artrya) a clearer path towards commercialization and VC funding.

The Takeaway

CMS’ new plaque AI OPPS code marks a major milestone for Cleerly’s commercial and clinical expansion, and a solid step for the plaque AI segment. 

The reimbursement also adds momentum for the overall imaging AI industry, which finally seems to be gaining support from CMS. That’s good news for AI vendors, since reimbursements are a proven driver of AI adoption and are often necessary to show ROI.

Imaging AI Funding Still Solid in 2022

Despite plenty of challenges, imaging AI startups appear to be on pace for another solid funding year, helped by a handful of huge raises and a diverse mix of early-to-mid stage rounds.

So far in 2022 we’ve covered 18 AI funding events that totaled $615M, putting imaging AI startups roughly on pace for 2021’s record-high funding levels ($815M based on Signify’s analysis). Those funding rounds revealed a number of interesting trends:

  • The Big Getting Bigger – $442M of this year’s funding (72% of total) came from just four later-stage rounds: Aidoc ($110M), Viz.ai ($100M), Cleerly ($192M), and Qure.ai ($40M), as VCs increasingly bet on AI’s biggest players. 
  • Rounding Up the Rest – The remaining 14 companies raised a combined $173M (28% of total), with an even mix of Seed/Pre-Seed (4 rounds, $10.5M), Series A (5, $74M), and Series B (5, $89M) rounds. 
  • VCs Heart Cardiovascular AI – Cardiovascular AI startups captured a disproportionate share of VC funding, as Cleerly ($192M) was joined by Elucid ($27M) and Us2.ai ($15M). Considering that Circle CVI was recently acquired for $213M and HeartFlow has raised over $577M, cardiac AI startups seem to have become imaging AI’s valuation leaders (at least alongside diversified and care coordination AI vendors).
  • No H2 Drop-Off (yet) – The funding breakdown between Q1 (6 rounds, $63.5M), Q2 (7, $289M), and Q3 (5, $263M) doesn’t suggest that we’re in the middle of a second-half slowdown… even though we probably are. 
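The split above is easy to sanity-check. A quick tally using the figures from this article confirms the big-four share:

```python
# Funding figures ($M) as reported above; a simple check of the 72%/28% split.
big_four = {"Aidoc": 110, "Viz.ai": 100, "Cleerly": 192, "Qure.ai": 40}
total_2022 = 615  # $M across the 18 rounds covered so far

big_four_sum = sum(big_four.values())
remainder = total_2022 - big_four_sum
print(big_four_sum, round(100 * big_four_sum / total_2022))  # 442 72
print(remainder)  # 173 ($M across the remaining 14 companies)
```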

The Takeaway

Despite widespread AI consolidation chatter in Q1 and the emergence of economic headwinds by Q2, imaging AI startups are on pace for yet another massive funding year. These numbers don’t reveal how many otherwise-solid AI startups are struggling to secure their next funding round, and they don’t guarantee that funding will also be strong in 2023, but they do suggest that 2022’s AI funding won’t be nearly as bleak as some naysayers warned.

AI Crosses the Chasm

Despite plenty of challenges, Signify Research forecasts that the global imaging AI market will nearly quadruple by 2026, as AI “crosses the chasm” towards widespread adoption. Here’s how Signify sees that transition happening:

Market Growth – After generating global revenues of around $375M in 2020 and $400M in 2021, Signify expects the imaging AI market to maintain a massive 27.6% CAGR through 2026, when it reaches nearly $1.4B. 
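Signify’s headline numbers hang together: compounding the 2021 base at that CAGR over five years lands right at the forecast. A quick back-of-the-envelope check:

```python
# Compounding Signify's 2021 revenue estimate at the forecast CAGR
# over five years (2021 -> 2026).
base_2021 = 400  # $M
cagr = 0.276
projection_2026 = base_2021 * (1 + cagr) ** 5
print(round(projection_2026))  # ~1353 ($M), i.e. "nearly $1.4B"
```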

Product-Led Growth – This growth will be partially driven by the availability of new and more-effective AI products, following:

  • An influx of new regulatory-approved solutions
  • Continued improvements to current products (e.g. adding triage to detection tools)
  • AI leaders expanding into new clinical segments
  • AI’s evolution from point solutions to comprehensive solutions/workflows
  • The continued adoption of AI platforms/marketplaces

The Big Four – Imaging AI’s top four clinical segments (breast, cardiology, neurology, pulmonology) represented 87% of the AI market in 2021, and those segments will continue to dominate through 2026. 

VC Support – After investing $3.47B in AI startups between 2015 and 2021, Signify expects that VCs will remain a market growth driver, while their funding continues to shift toward later stage rounds. 

Remaining Barriers – AI still faces plenty of barriers, including limited reimbursements, insufficient economic/ROI evidence, stricter regulatory standards (especially in EU), and uncertain future prioritization from healthcare providers and imaging IT vendors. 

The Takeaway

2022 has been a tumultuous year for AI, bringing a number of notable achievements (increased adoption, improving products, new reimbursements, more clinical evidence, big funding rounds) that sometimes seemed to be overshadowed by AI’s challenges (difficult funding climate, market consolidation, slower adoption than previously hoped).  

However, Signify’s latest research suggests that 2022’s ups-and-downs might prove to be part of AI’s path towards mainstream adoption. And based on the steeper growth Signify forecasts for 2025-2026 (see chart above), the imaging AI market’s growth rate and overall value should become far greater after it finally “crosses the chasm.”

RevealDx & contextflow’s Lung CT Alliance

RevealDx and contextflow announced a new alliance that should advance the companies’ product and distribution strategies, and appears to highlight an interesting trend towards more comprehensive AI solutions.

The companies will integrate RevealDx’s RevealAI-Lung solution (lung nodule characterization) with contextflow’s SEARCH Lung CT software (lung nodule detection and quantification), creating a uniquely comprehensive lung cancer screening offering. 

contextflow will also become RevealDx’s exclusive distributor in Europe, adding to RevealDx’s global channel that includes a distribution alliance with Volpara (exclusive in Australia/NZ, non-exclusive in US) and a platform integration deal with Sirona.

The alliance highlights contextflow’s new partner-driven strategy to expand SEARCH Lung CT beyond its image-based retrieval roots, coming just a few weeks after announcing an integration with Oxipit’s ChestEye Quality AI solution to identify missed lung nodules.

In fact, contextflow’s AI expansion efforts appear to be part of an emerging trend, as AI vendors work to support multiple steps within a given clinical activity (e.g. lung cancer assessments) or spot a wider range of pathologies in a given exam (e.g. CXRs):

  • Volpara has amassed a range of complementary breast cancer screening solutions, and has started to build out a similar suite of lung cancer screening solutions (including RevealDx & Riverain).
  • A growing field of chest X-ray AI vendors (Annalise.ai, Lunit, Qure.ai, Oxipit, Vuno) lead with their ability to detect multiple findings from a single CXR scan and AI workflow. 
  • Siemens Healthineers’ AI-RAD Companion Chest CT solution combines these two approaches, automating multiple diagnostic tasks (analysis, quantification, visualization, results generation) across a range of different chest CT exams and organs.

The Takeaway

contextflow and RevealDx’s European alliance seems to make a lot of sense, allowing contextflow to enhance its lung nodule detection/quantification findings with characterization details, while giving RevealDx the channel and lung nodule detection starting points that it likely needs.

The partnership also appears to represent another step towards more comprehensive and potentially more clinically valuable AI solutions, and away from the narrow applications that have dominated AI portfolios (and AI critiques) until now.

AI Experiences & Expectations

The European Society of Radiology just published new insights into how imaging AI is being used across Europe and how the region’s radiologists view this emerging technology.

The Survey – The ESR reached out to 27,700 European radiologists in January 2022 with a survey regarding their experiences and perspectives on imaging AI, receiving responses from just 690 rads.

Early Adopters – 276 of the 690 respondents (40%) had clinical experience using imaging AI, with the majority of these AI users:

  • Working at academic and regional hospitals (52% & 37% – only 11% at practices)
  • Leveraging AI for interpretation support, case prioritization, and post-processing (51.5%, 40%, 28.6%)

AI Experiences – The radiologists who do use AI revealed a mix of positive and negative experiences:

  • Most found diagnostic AI’s output reliable (75.7%)
  • Few experienced technical difficulties integrating AI into their workflow (17.8%)
  • The majority found AI prioritization tools to be “very helpful” or “moderately helpful” for reducing staff workload (23.4% & 62.2%)
  • However, far fewer reported that diagnostic AI tools reduced staff workload (22.7% Yes, 69.8% No)

Adoption Barriers – Most coverage of this study will likely focus on the fact that only 92 of the surveyed rads (13.3%) plan to acquire AI in the future, while 363 don’t intend to acquire AI (52.6%). The radiologists who don’t plan to adopt AI (including those who’ve never used AI) based their opinions on:

  • AI’s lack of added value (44.4%)
  • AI not performing as well as advertised (26.4%)
  • AI adding too much work (22.9%)
  • And “no reason” (6.3%)

US Context – These results are in the same ballpark as the ACR’s 2020 US-based survey (33.5% using AI, only 20% of non-users planned to adopt within 5 years), although 2020 feels like a long time ago.

The Takeaway

Even if this ESR survey might leave you asking more questions (What about AI’s impact on patient care? How often is AI actually being used? How do opinions differ between AI users and non-users?), more than anything it confirms what many of us already know… We’re still very early in AI’s evolution, and there are still plenty of performance and perception barriers that AI has to overcome.

Imaging AI’s Unseen Potential

Amid the dozens of imaging AI papers and presentations that came out over the last few weeks were three compelling new studies highlighting how much “unseen” information AI can extract from medical images, and the massive impact this information could have. 

Imaging-Led Population Health – An excellent presentation from Ayis Pyrros, MD placed radiology at the center of healthcare’s transition to value-based care and population health, highlighting the AI training opportunities that will come with more value-based care HCC codes and imaging AI’s untapped potential for early disease detection and management. Dr. Pyrros specifically emphasized chest X-ray’s potential given the exam’s ubiquity (26M Medicare CXRs in 2021), CXR AI’s ability to predict outcomes (e.g. mortality, comorbidities, hospital stays), and how opportunistic AI screening can/should support proactive care that benefits both patients and health systems.

  • Healthcare’s value-based overhaul has traditionally been seen as a threat to radiology’s fee-for-service foundations. Even if that might still be true from a business model perspective, Dr. Pyrros makes it quite clear that the shift to value-based care could make radiology even more important — and importance is always good for business.

AI Race Detection – The final peer-reviewed version of the landmark study showing that AI models can accurately predict patient race was officially published, further confirming that AI can detect patients’ self-reported race by analyzing medical image features. The new paper showed that AI very accurately detects patient race across modalities and anatomical regions (AUCs: CXRs 0.91 – 0.99, chest CT 0.89 – 0.96, mammography 0.81), without relying on proxies or imaging-related confounding features (BMI, disease distribution, and breast density all had ≤0.61 AUCs).

  • If imaging AI models intended for clinical tasks can identify patients’ races, they could be applying the same racial biomarkers to diagnosis, thus reproducing or exacerbating healthcare’s existing racial disparities. That’s an important takeaway whether you’re developing or adopting AI.

CXR Cost Predictions – The smart folks at the UCSF Center for Intelligent Imaging developed a series of CXR-based deep learning models that can predict patients’ future healthcare costs. Developed with 21,872 frontal CXRs from 19,524 patients, the best performing models were able to relatively accurately identify which patients would have a top-50% personal healthcare cost after one, three, and five years (AUCs: 0.806, 0.771, 0.729). 

  • Although predicting which patients will have higher costs could be useful on its own, these findings also suggest that similar CXR-based DL models could be used to flag patients who may deteriorate, initiate proactive care, or support healthcare cost analysis and policies.

The Case for Algorithmic Audits

A new Lancet Digital Health study could have become one of the many “AI rivals radiologists” papers that we see each week, but it instead served as an important lesson that traditional performance tests might not prove that AI models are actually safe for clinical use.

The Model – The team developed their proximal femoral fracture detection DL model using 45.7k frontal X-rays performed at Australia’s Royal Adelaide Hospital (w/ 4,861 fractures).

The Validation – They then tested it against a 4,577-exam internal set (w/ 640 fractures), 400 of which were also interpreted by five radiologists (w/ 200 fractures), and against an 81-image external validation set from Stanford.

The Results – All three tests produced results that a typical study might have viewed as evidence of high-performance: 

  • The model outperformed the five radiologists (0.994 vs. 0.969 AUCs)
  • It beat the best performing radiologist’s sensitivity (95.5% vs. 94.5%) and specificity (99.5% vs 97.5%)
  • It generalized well with the external Stanford data (0.980 AUC)

The Audit – Despite the strong results, a follow-up audit revealed that the model might make some predictions for the wrong reasons, suggesting that it is unsafe for clinical deployment:

  • One false negative X-ray included an extremely displaced fracture that human radiologists would catch
  • X-rays featuring abnormal bones or joints had a 50% false negative rate, far higher than the reader set’s overall false negative rate (2.5%)
  • Saliency maps showed that AI decisions were almost never based on the outer region of the femoral neck, even on images where that region was clinically relevant (though it still often made the right diagnosis)
  • The model scored a high AUC with the Stanford data, but showed a substantial model operating point shift
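One piece of an audit like this — checking whether the false negative rate spikes within a clinically meaningful subgroup — is straightforward to sketch. The field names and data below are illustrative assumptions, not the authors’ actual audit code:

```python
# Hedged sketch of a subgroup false-negative-rate check, the kind of
# analysis that surfaced the 50% FNR on abnormal bone/joint X-rays above.

def false_negative_rate(preds, labels):
    """Share of true positives (label == 1) that the model called negative."""
    positives = [(p, y) for p, y in zip(preds, labels) if y == 1]
    misses = sum(1 for p, _ in positives if p == 0)
    return misses / len(positives)

def audit_by_subgroup(records):
    """records: iterable of (prediction, label, subgroup) -> FNR per subgroup."""
    groups = {}
    for pred, label, group in records:
        groups.setdefault(group, ([], []))
        groups[group][0].append(pred)
        groups[group][1].append(label)
    # Skip subgroups with no positives (FNR would be undefined)
    return {g: false_negative_rate(p, y) for g, (p, y) in groups.items() if 1 in y}

# Toy cohort: the "abnormal" subgroup misses half of its fractures
records = [
    (1, 1, "typical"), (1, 1, "typical"), (0, 0, "typical"),
    (1, 1, "abnormal"), (0, 1, "abnormal"),
]
print(audit_by_subgroup(records))  # {'typical': 0.0, 'abnormal': 0.5}
```

The point of the audit paper is that an aggregate AUROC can look excellent while a check like this reveals a clinically unacceptable failure mode.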

The Case for Auditing – Although the study might have not started with this goal, it ended up becoming an argument for more sophisticated preclinical auditing. It even led to a separate paper outlining their algorithmic auditing process, which among other things suggested that AI users and developers should co-own audits.

The Takeaway

Auditing generally isn’t the most exciting topic in any field, but this study shows that it’s exceptionally important for imaging AI. It also suggests that audits might be necessary for achieving the most exciting parts of AI, like improving outcomes and efficiency, earning clinician trust, and increasing adoption.
