More Work Ahead for Chest X-Ray AI?

In another blow to radiology AI, the UK’s national technology assessment agency issued an equivocal report on AI for chest X-ray, stating that more research is needed before the technology can enter routine clinical use.

The report came from the National Institute for Health and Care Excellence (NICE), which assesses new health technologies that have the potential to address unmet NHS needs. 

The NHS sees AI as a potential way to meet rising demand for imaging services, a dynamic that’s leading to long wait times for exams.

But at least some corners of the UK health establishment have concerns about whether AI for chest X-ray is ready for prime time. 

  • The NICE report states that – despite the unmet need for quicker chest X-ray reporting – there is insufficient evidence to support the technology, so it is not possible to assess its clinical and cost benefits. It also said there is “no evidence” on the accuracy of AI-assisted clinician review compared with clinicians working alone.

As such, the use of AI for chest X-ray in the NHS should be limited to research, with the following additional recommendations …

  • Centers already using AI software to review chest X-rays may continue to do so, but only as part of an evaluation framework and alongside clinician review
  • Purchase of chest X-ray AI software should be made through corporate, research, or non-core NHS funding
  • More research is needed on AI’s impact on a number of outcomes, such as CT referrals, healthcare costs and resource use, review and reporting time, and diagnostic accuracy when used alongside clinician review

The NICE report listed 14 commercially available chest X-ray algorithms that need more research, and it recommended prospective studies to address gaps in evidence. AI developers will be responsible for performing these studies.

The Takeaway

Taken with last week’s disappointing news on AI for radiology, the NICE report is a wake-up call for what had been one of the most promising clinical use cases for AI. The NHS had been seen as a leader in the clinical adoption of AI; for chest X-ray, clinicians in the UK may have to wait a bit longer.

AI Hits Speed Bumps

There’s no question AI is the future of radiology. But AI’s drive to widespread clinical use is going to hit some speed bumps along the way.

This week is a case in point. Two studies were published showing AI’s limitations and underscoring the challenges faced in making AI an everyday clinical reality. 

In the first study, published in Radiology, researchers found that radiologists outperformed four commercially available AI algorithms for analyzing chest X-rays (from Annalise.ai, Milvue, Oxipit, and Siemens Healthineers) in a group of 2k patients.

Researchers from Denmark found the AI tools had moderate to high sensitivity for three detection tasks: 

  1. airspace disease (72%-91%)
  2. pneumothorax (63%-90%)
  3. pleural effusion (62%-95%)

But the algorithms also had higher false-positive rates than the radiologists, and their performance dropped in cases with smaller pathology and multiple findings. The findings are disappointing, especially given how much play they got in the mainstream media.
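For readers less familiar with the metrics: sensitivity is the share of truly positive cases an algorithm flags, and the false-positive rate is the share of truly negative cases it flags by mistake. The short Python sketch below computes both from confusion-matrix counts; the function names and counts are hypothetical, made up for illustration, and are not taken from the Danish study.

```python
def sensitivity(tp, fn):
    """Share of truly positive cases that the algorithm flags (TP / (TP + FN))."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Share of truly negative cases that the algorithm wrongly flags (FP / (FP + TN))."""
    return fp / (fp + tn)


# Hypothetical counts for a single finding; NOT results from the study.
tp, fn, fp, tn = 90, 10, 30, 870
print(f"sensitivity = {sensitivity(tp, fn):.0%}")
print(f"false-positive rate = {false_positive_rate(fp, tn):.1%}")
```

A tool can post a high sensitivity while still generating many false alarms, which is why the two numbers need to be read together.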

This week’s second study, published in Radiology: Artificial Intelligence, also brought worrisome news, this time about foundation models, an AI training method that many hope holds the key to better algorithms.

Foundation models are designed to address the challenge of finding enough high-quality data for AI training. Most algorithms are trained on de-identified clinical data that have been labeled against a ground-truth reference; foundation models, by contrast, are neural networks pre-trained on broad, unlabeled data and then fine-tuned on smaller volumes of more detailed data to perform specific tasks.
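To make the pre-train-then-fine-tune idea concrete, here is a deliberately tiny Python sketch. It is a toy analogy, not how the chest X-ray foundation model in the study was built: a principal-component projection learned from plentiful unlabeled points stands in for large-scale pre-training, and a one-variable logistic regression fit on four labeled points stands in for fine-tuning. Strictly, the second step only trains a new head on frozen features (often called linear probing), a simplified form of fine-tuning. All function names and data are hypothetical.

```python
import math
import random


def pretrain_projection(unlabeled, iters=50):
    """Toy stand-in for pre-training: find the top principal direction of
    unlabeled vectors by power iteration. No labels are used here."""
    n, dim = len(unlabeled), len(unlabeled[0])
    mean = [sum(x[i] for x in unlabeled) / n for i in range(dim)]
    centered = [[x[i] - mean[i] for i in range(dim)] for x in unlabeled]
    v = [1.0] * dim
    for _ in range(iters):
        # w = X^T (X v): multiply v by the unnormalized covariance matrix
        proj = [sum(c[i] * v[i] for i in range(dim)) for c in centered]
        w = [sum(p * c[i] for p, c in zip(proj, centered)) for i in range(dim)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return mean, v


def embed(x, mean, v):
    """Frozen 'pre-trained' feature: projection onto the learned direction."""
    return sum((xi - mi) * vi for xi, mi, vi in zip(x, mean, v))


def sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fine_tune(labeled, mean, v, lr=0.1, epochs=200):
    """Toy stand-in for fine-tuning: fit a small logistic-regression head
    on top of the frozen embedding using only a few labeled examples."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in labeled:
            z = embed(x, mean, v)
            err = sigmoid(w * z + b) - y  # gradient of log-loss w.r.t. the logit
            w -= lr * err * z
            b -= lr * err
    return w, b


def predict(x, mean, v, w, b):
    return int(w * embed(x, mean, v) + b > 0)


random.seed(0)
# Plentiful unlabeled data: two clusters, no labels attached.
unlabeled = [[random.gauss(cx, 0.5), random.gauss(0.0, 0.5)]
             for cx in (-3.0, 3.0) for _ in range(100)]
# Scarce labeled data for the specific task.
labeled = [([-3.0, 0.1], 0), ([-2.5, -0.2], 0), ([3.0, 0.0], 1), ([2.8, 0.3], 1)]

mean, v = pretrain_projection(unlabeled)
w, b = fine_tune(labeled, mean, v)
print(predict([-2.7, 0.4], mean, v, w, b), predict([3.2, -0.1], mean, v, w, b))
```

The appeal, and the risk, is visible even at this scale: the labeled step depends entirely on what the unlabeled step happened to learn, so any blind spot in the pre-training data is inherited downstream.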

Researchers in the new study found that a chest X-ray algorithm built on a foundation model pre-trained with 800k images performed worse than a reference model trained on CheXpert data in a group of 42.9k patients. The researchers compared performance for four possible results – no finding, pleural effusion, cardiomegaly, and pneumothorax – and found the foundation model lagged as follows…

  • Lower by 6.8-7.7% in females for the “no finding” result
  • Lower by 10.7-11.6% in Black patients for detecting pleural effusion
  • Lower performance across all groups for classifying cardiomegaly

The declines among female and Black patients are particularly concerning given recent studies on AI bias and lack of generalizability.
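Subgroup comparisons like these come down to computing the same metric separately for each patient group and then comparing models group by group. The Python sketch below does this for sensitivity; the choice of metric, the function names, and the toy records are assumptions for illustration, since the study’s exact metric is not described here.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """records: iterable of (group, true_label, predicted_label) with 0/1 labels.
    Returns {group: sensitivity}, computed over truly positive cases only."""
    tp = defaultdict(int)
    pos = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            pos[group] += 1
            tp[group] += int(y_pred == 1)
    return {g: tp[g] / pos[g] for g in pos}


def subgroup_gap(baseline, candidate):
    """Percentage-point change of the candidate model vs. the baseline, per group."""
    return {g: round(100 * (candidate[g] - baseline[g]), 1)
            for g in baseline if g in candidate}


# Hypothetical toy data: 10 positive cases per group, two models.
base = [("A", 1, 1)] * 9 + [("A", 1, 0)] + [("B", 1, 1)] * 9 + [("B", 1, 0)]
cand = [("A", 1, 1)] * 9 + [("A", 1, 0)] + [("B", 1, 1)] * 8 + [("B", 1, 0)] * 2
gaps = subgroup_gap(sensitivity_by_group(base), sensitivity_by_group(cand))
print(gaps)
```

An aggregate score can hide a gap like the one in group B, which is why subgroup breakdowns such as those in the study matter.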

The Takeaway

This week’s studies show that there’s not always going to be a clear road ahead for AI in its drive to routine clinical use. The study on foundation models in particular could have ramifications for AI developers looking for a shortcut to faster algorithm development. They may want to slow their roll. 

Get every issue of The Imaging Wire, delivered right to your inbox.
