AI Tug of War Continues

The ongoing tug of war over AI’s value to radiology continues. This time the rope has moved in AI’s favor with publication of a new study in JAMA Network Open that shows the potential of a new type of AI language model for creating radiology reports.

  • Headlines about AI have ping-ponged in recent weeks, from positive studies like MASAI and PERFORMS to more equivocal trials like a chest X-ray study in Radiology and news from the UK that healthcare authorities may not be ready for chest X-ray AI’s full clinical roll-out. 

In the new paper, Northwestern University researchers tested a chest X-ray AI algorithm they developed with a transformer technique, a type of generative AI language model that can both analyze images and generate radiology text as output. 

  • Transformer language models show promise due to their ability to combine both image and non-image data, as researchers showed in a paper last week.

The Northwestern researchers tested their transformer model in 500 chest radiographs of patients evaluated overnight in the emergency department from January 2022 to January 2023. 

Reports generated by AI were then compared to reports from a teleradiologist as well as the final report by an in-house radiologist, which was set as the gold standard. The researchers found that AI-generated reports …

  • Had sensitivity a bit lower than teleradiology reports (85% vs. 92%)
  • Had specificity a bit higher (99% vs. 97%)
  • In some cases improved on the in-house radiology report by detecting subtle abnormalities missed by the radiologist
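For readers less familiar with these metrics, here is a minimal sketch of how sensitivity and specificity are computed from confusion-matrix counts (the counts below are hypothetical illustrations, not the study's data):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity: fraction of true positives caught.
    Specificity: fraction of true negatives correctly cleared."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts chosen only to mirror the reported percentages:
sens, spec = sensitivity_specificity(tp=85, fn=15, tn=99, fp=1)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```

The trade-off shown in the study (AI slightly less sensitive but slightly more specific than teleradiology) falls directly out of these two ratios.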

Generative AI language models like the Northwestern algorithm could perform better than algorithms that rely on a classification approach to predicting the presence of pathology. Such models limit medical diagnoses to yes/no predictions that may omit context that’s relevant to clinical care, the researchers believe. 

In real-world clinical use, the Northwestern team thinks their model could assist emergency physicians in circumstances where in-house radiologists or teleradiologists aren’t immediately available, helping triage emergent cases.

The Takeaway

After the negative headlines of the last few weeks, it’s good to see positive news about AI again. Although the current study is relatively small and much larger trials are needed, the Northwestern research has promising implications for the future of transformer-based AI language models in radiology.

More Work Ahead for Chest X-Ray AI?

In another blow to radiology AI, the UK’s national technology assessment agency issued an equivocal report on AI for chest X-ray, stating that more research is needed before the technology can enter routine clinical use.

The report came from the National Institute for Health and Care Excellence (NICE), which assesses new health technologies that have the potential to address unmet NHS needs. 

The NHS sees AI as a potential solution to its challenge of meeting rising demand for imaging services, a dynamic that’s leading to long wait times for exams.

But at least some corners of the UK health establishment have concerns about whether AI for chest X-ray is ready for prime time. 

  • The NICE report states that – despite the unmet need for quicker chest X-ray reporting – there is insufficient evidence to support the technology, and as such it’s not possible to assess its clinical and cost benefits. And it said there is “no evidence” on the accuracy of AI-assisted clinician review compared to clinicians working alone.

As such, the use of AI for chest X-ray in the NHS should be limited to research, with the following additional recommendations …

  • Centers already using AI software to review chest X-rays may continue to do so, but only as part of an evaluation framework and alongside clinician review
  • Purchase of chest X-ray AI software should be made through corporate, research, or non-core NHS funding
  • More research is needed on AI’s impact on a number of outcomes, such as CT referrals, healthcare costs and resource use, review and reporting time, and diagnostic accuracy when used alongside clinician review

The NICE report listed 14 commercially available chest X-ray algorithms that need more research, and it recommended prospective studies to address gaps in evidence. AI developers will be responsible for performing these studies.

The Takeaway

Taken with last week’s disappointing news on AI for radiology, the NICE report is a wakeup call for what had been one of the most promising clinical use cases for AI. The NHS had been seen as a leader in spearheading clinical adoption of AI; for chest X-ray, clinicians in the UK may have to wait just a bit longer.

AI Hits Speed Bumps

There’s no question AI is the future of radiology. But AI’s drive to widespread clinical use is going to hit some speed bumps along the way.

This week is a case in point. Two studies were published showing AI’s limitations and underscoring the challenges faced in making AI an everyday clinical reality. 

In the first study, published in Radiology, researchers found that radiologists outperformed four commercially available AI algorithms for analyzing chest X-rays (Annalise.ai, Milvue, Oxipit, and Siemens Healthineers) in a study of 2k patients.

Researchers from Denmark found the AI tools had moderate to high sensitivity for three detection tasks: 

  1. airspace disease (72%-91%)
  2. pneumothorax (63%-90%)
  3. pleural effusion (62%-95%)

But the algorithms also had higher false-positive rates, and their performance dropped in cases with smaller pathology and multiple findings. The results are disappointing, especially given the widespread play these algorithms have received in the mainstream media.

But this week’s second study also brought worrisome news, this time in Radiology: Artificial Intelligence about an AI training method called foundation models that many hope holds the key to better algorithms. 

Foundation models are designed to address the challenge of finding enough high-quality data for AI training. Most algorithms are trained with actual de-identified clinical data that have been labeled and referenced to ground truth; foundation models are AI neural networks pre-trained with broad, unlabeled data and then fine-tuned with smaller volumes of more detailed data to perform specific tasks.

Researchers in the new study found that a chest X-ray algorithm trained on a foundation model with 800k images had lower performance than an algorithm trained with the CheXpert reference model in a group of 42.9k patients. The foundation model’s performance lagged for four possible results – no finding, pleural effusion, cardiomegaly, and pneumothorax – as follows…

  • Lower by 6.8-7.7% in females for the “no finding” result
  • Down by 10.7-11.6% in Black patients in detecting pleural effusion
  • Lower performance across all groups for classifying cardiomegaly

The decline in female and Black patients is particularly concerning given recent studies on bias and lack of generalizability for AI.  

The Takeaway

This week’s studies show that there’s not always going to be a clear road ahead for AI in its drive to routine clinical use. The study on foundation models in particular could have ramifications for AI developers looking for a shortcut to faster algorithm development. They may want to slow their roll. 

Predicting AI Performance

How can you predict whether an AI algorithm will fall short for a particular clinical use case such as detecting cancer? Researchers in Radiology took a crack at this conundrum by developing what they call an “uncertainty quantification” metric to predict when an AI algorithm might be less accurate. 

AI is rapidly moving into wider clinical use, with a number of exciting studies published in just the last few months showing how AI can help radiologists interpret screening mammograms or direct which women should get supplemental breast MRI.

But AI isn’t infallible. And unlike a human radiologist who might be less confident in a particular diagnosis, an AI algorithm doesn’t have a built-in hedging mechanism.

So researchers from Denmark and the Netherlands decided to build one. They took publicly available AI algorithms and tweaked their code so they produced “uncertainty quantification” scores with their predictions. 

They then tested how well the scores predicted AI performance in a dataset of 13k images for three common tasks covering some of the deadliest types of cancer:

1) detecting pancreatic ductal adenocarcinoma on CT
2) detecting clinically significant prostate cancer on MRI
3) predicting pulmonary nodule malignancy on low-dose CT 

Researchers classified the highest 80% of the AI predictions as “certain,” and the remaining 20% as “uncertain,” and compared AI’s accuracy in both groups, finding … 

  • AI led to significant accuracy improvements in the “certain” group for pancreatic cancer (80% vs. 59%), prostate cancer (90% vs. 63%), and pulmonary nodule malignancy prediction (80% vs. 51%)
  • AI accuracy was comparable to clinicians when its predictions were “certain” (80% vs. 78%, P=0.07), but much worse when “uncertain” (50% vs. 68%, P<0.001)
  • Using AI to triage “uncertain” cases produced overall accuracy improvements for pancreatic and prostate cancer (+5%) and lung nodule malignancy prediction (+6%) compared to a no-triage scenario
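The triage logic described above can be sketched schematically: rank predictions by their uncertainty score, treat the lowest-uncertainty 80% as "certain," and route the rest to clinician review. The names and threshold handling below are illustrative assumptions, not the researchers' code:

```python
def triage_by_uncertainty(predictions, certain_fraction=0.8):
    """Split predictions into 'certain' and 'uncertain' groups.

    predictions: list of (case_id, uncertainty_score) pairs.
    The 'uncertain' tail would be deprioritized for AI and
    escalated to clinician review instead.
    """
    ranked = sorted(predictions, key=lambda p: p[1])  # lowest uncertainty first
    cutoff = int(len(ranked) * certain_fraction)
    return ranked[:cutoff], ranked[cutoff:]

# Toy example with five cases and made-up uncertainty scores:
cases = [("a", 0.05), ("b", 0.40), ("c", 0.10), ("d", 0.90), ("e", 0.20)]
certain, uncertain = triage_by_uncertainty(cases, certain_fraction=0.8)
```

In this toy example the highest-uncertainty case ("d") lands in the clinician-review group while the other four stay with AI, mirroring the study's 80/20 split.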

How would uncertainty quantification be used in clinical practice? It could play a triage role, deprioritizing radiologist review of easier cases while helping them focus on more challenging studies. It’s a concept similar to the MASAI study of mammography AI.

The Takeaway

Like MASAI, the new findings present exciting new possibilities for AI implementation. They also present a framework within which AI can be implemented more safely by alerting clinicians to cases in which AI’s analysis might fall short – and enabling humans to step in and pick up the slack.  

Can AI Direct Breast MRI?

A deep learning algorithm trained to analyze mammography images did a better job than traditional risk models in predicting breast cancer risk. The study shows the AI model could direct the use of supplemental screening breast MRI for women who need it most. 

Breast MRI has emerged (along with ultrasound) as one of the most effective imaging modalities to supplement conventional X-ray-based mammography. Breast MRI performs well regardless of breast tissue density, and can even be used for screening younger high-risk women for whom radiation is a concern. 

But there are also disadvantages to breast MRI. It’s expensive and time-consuming, and clinicians aren’t always sure which women should get it. As a result, breast MRI is used too often in women at average risk and not often enough in those at high risk. 

In the current study in Radiology, researchers from MGH compared the Mirai deep learning algorithm to conventional risk-prediction models. Mirai was developed at MIT to predict five-year breast cancer risk, and the first papers on the model emerged in 2019; previous studies have already demonstrated the algorithm’s prowess for risk prediction.

Mirai was used to analyze mammograms and develop risk scores for 2.2k women who also received 4.2k screening breast MRI exams from 2017-2020 at four facilities. Researchers then compared the performance of the algorithm to traditional risk tools like Tyrer-Cuzick and NCI’s Breast Cancer Risk Assessment (BCRAT), finding that … 

  • In women Mirai identified as high risk, the cancer detection rate per 1k on breast MRI was far higher compared to those classified as high risk by Tyrer-Cuzick and BCRAT (20.6 vs. 6.0 & 6.8)
  • Mirai had a higher PPV for predicting abnormal findings on breast MRI screening (14.6% vs. 5.0% & 5.5%)
  • Mirai scored higher in PPV of biopsies recommended (32.4% vs. 12.7% & 11.1%) and PPV for biopsies performed (36.4% vs. 13.5% & 12.5%)
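Positive predictive value (PPV), cited throughout these comparisons, is simply the fraction of positive calls that turn out to be correct. A quick sketch with hypothetical counts (not the study's data):

```python
def ppv(true_positives, false_positives):
    """Positive predictive value: fraction of positive calls that are correct."""
    return true_positives / (true_positives + false_positives)

# Hypothetical biopsy counts chosen only to mirror the ~32.4% figure:
score = round(ppv(true_positives=12, false_positives=25), 3)  # 0.324, i.e. 32.4%
```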

The Takeaway

Breast imaging has become one of the AI use cases with the most potential, based on recent studies like PERFORMS and MASAI, and the new study shows Mirai could be useful in directing women to breast MRI screening. Like the previous studies, the current research is pointing to a near-term future in which AI and deep learning can make breast screening more accurate and cost-effective than it’s ever been before. 

Tipping Point for Breast AI?

Have we reached a tipping point when it comes to AI for breast screening? This week another study was published – this one in Radiology – demonstrating the value of AI for interpreting screening mammograms. 

Of all the medical imaging exams, breast screening probably could use the most help. Reading mammograms has been compared to looking for a needle in a haystack, with radiologists reviewing thousands of images before finding a single cancer. 

AI could help in multiple ways, either at the radiologist’s side during interpretation or by reviewing mammograms in advance, triaging the ones most likely to be normal while reserving suspicious exams for closer attention by radiologists (indeed, that was the approach used in the MASAI study in Sweden in August).

In the new study, UK researchers in the PERFORMS trial compared the performance of Lunit’s INSIGHT MMG AI algorithm to that of 552 radiologists in 240 test mammogram cases, finding that …

  • AI was comparable to radiologists for sensitivity (91% vs. 90%, P=0.26) and specificity (77% vs. 76%, P=0.85). 
  • There was no statistically significant difference in AUC (0.93 vs. 0.88, P=0.15)
  • AI and radiologists were comparable or no different with other metrics

Like the MASAI trial, the PERFORMS results show that AI could play an important role in breast screening. To that end, a new paper in European Journal of Radiology proposes a roadmap for implementing mammography AI as part of single-reader breast screening programs, offering suggestions on prospective clinical trials that should take place to prove breast AI is ready for widespread use in the NHS – and beyond. 

The Takeaway

It certainly does seem that AI for breast screening has reached a tipping point. Taken together, PERFORMS and MASAI show that mammography AI works well enough that “the days of double reading are numbered,” at least where it is practiced in Europe, as noted in an editorial by Liane Philpotts, MD.

While double-reading isn’t practiced in the US, the PERFORMS protocol could be used to supplement non-specialized radiologists who don’t see that many mammograms, Philpotts notes. Either way, AI looks poised to make a major impact in breast screening on both sides of the Atlantic.

Economic Barriers to AI

A new article in JACR highlights the economic barriers that are limiting wider adoption of AI in healthcare in the US. The study paints a picture of how the complex nature of Medicare reimbursement puts the country at risk of falling behind other nations in the quest to implement healthcare AI on a national scale. 

The success of any new medical technology in the US has always been linked to whether physicians can get reimbursed for using it. But there are a variety of paths to reimbursement in the Medicare system, each one with its own rules and idiosyncrasies. 

The establishment of the NTAP program was thought to be a milestone in paying for AI for inpatients, for example, but the JACR authors note that NTAP payments are time-limited to no more than three years. A variety of other factors are limiting AI reimbursement, including … 

  • All of the AI payments approved under the NTAP program have expired, and as such no AI algorithm is being reimbursed under NTAP 
  • Budget-neutral requirements in the Medicare Physician Fee Schedule mean that AI reimbursement is often a zero-sum game. Payments made for one service (such as AI) must be offset by reductions for something else 
  • Only one imaging AI algorithm has successfully navigated CMS to achieve Category I reimbursement in the Physician Fee Schedule, starting in 2024 for fractional flow reserve (FFR) analysis

Standing in stark contrast to the Medicare system is the NHS in the UK, where regulators see AI as an invaluable tool to address chronic workforce shortages in radiology and are taking aggressive action to promote its adoption. Not only has the NHS announced a £21M fund to fuel AI adoption, but it is mulling the implementation of a national platform to enable AI algorithms to be accessed within standard radiology workflow. 

The Takeaway

The JACR article illustrates how Medicare’s Byzantine reimbursement structure puts barriers in the path of wider AI adoption. Although there have been some reimbursement victories such as NTAP, these have been temporary, and the fact that only one radiology AI algorithm has achieved a Category I CPT code must be a sobering thought to AI proponents.

Radiation and Cancer Risk

New research on the cancer risk of low-dose ionizing radiation could have disturbing implications for those who are exposed to radiation on the job – including medical professionals. In a new study in BMJ, researchers found that nuclear workers exposed to occupational levels of radiation had a cancer mortality risk that was higher than previously estimated.

The link between low-dose radiation and cancer has long been controversial. Most studies on the radiation-cancer connection are based on Japanese atomic bomb survivors, many of whom were exposed to far higher levels of radiation than most people receive over their lifetimes – even those who work with ionizing radiation. 

The question is whether that data can be extrapolated to people exposed to much lower levels of radiation, such as nuclear workers, medical professionals, or even patients. To that end, researchers in the International Nuclear Workers Study (INWORKS) have been tracking low-dose radiation exposure and its connection to mortality in nearly 310k people in France, the UK, and the US who worked in the nuclear industry from 1944 to 2016.

INWORKS researchers previously published studies showing low-dose radiation exposure to be carcinogenic, but the new findings in BMJ offer an even stronger link. For the study, researchers tracked radiation exposure based on dosimetry badges worn by the workers and then rates of cancer mortality, and calculated rates of death from solid cancer based on their exposure levels, finding: 

  • Solid cancer mortality risk increased 52% per Gy of cumulative exposure
  • Individuals exposed at the occupational radiation limit of 20 mSv per year for five years (0.1 Gy cumulative) would see a 5.2% increase in solid cancer mortality
  • There was a linear association between low-dose radiation exposure and cancer mortality, meaning that cancer mortality risk was also found at lower levels of exposure 
  • The dose-response association seen in the study was even higher than in studies of atomic bomb survivors (52% vs. 32%)
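The 5.2% figure follows directly from the study's linear dose-response estimate: five years at the 20 mSv annual limit is 100 mSv, or roughly 0.1 Gy, and 0.1 Gy × 52% per Gy = 5.2%. A back-of-envelope check (treating mSv and mGy as interchangeable for this comparison, which is a simplification):

```python
EXCESS_RISK_PER_GY = 0.52   # excess solid cancer mortality risk per Gy (study estimate)
annual_limit_msv = 20       # occupational annual dose limit in mSv
years = 5

# Simplifying assumption: 1 mSv ~ 1 mGy, so 100 mSv ~ 0.1 Gy
cumulative_gy = annual_limit_msv * years / 1000
excess_risk = EXCESS_RISK_PER_GY * cumulative_gy
print(f"{excess_risk:.1%}")  # 5.2%
```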

The Takeaway

Even though the INWORKS study was conducted on nuclear workers rather than medical professionals, the findings could have implications for those who might be exposed to medical radiation, such as interventional radiologists and radiologic technologists. The study will undoubtedly be examined by radiation protection organizations and government regulators; the question is whether it leads to any changes in rules on occupational radiation exposure.

How Vendors Sell AI

Better patient care is the main selling point used by AI vendors when marketing neuroimaging algorithms, followed closely by time savings. Farther down the list of benefits are lower costs and increased revenue for providers. 

So says a new analysis in JACR that takes a close look at how FDA-cleared neuroimaging AI algorithms are marketed by vendors. It also includes several warning signs for both AI developers and clinicians.

AI is the most exciting technology to arrive in healthcare in decades, but questions percolate on whether AI developers are overhyping the technology. In the new analysis, researchers focused on marketing claims made for 59 AI neuroimaging algorithms cleared by the FDA from 2008 to 2022. Researchers analyzed FDA summaries and vendor websites, finding:

  • For 69% of algorithms, vendors highlighted an improvement in quality of patient care, while time savings for clinicians were touted for 44%. Only 16% of algorithms were promoted as lowering costs, while just 11% were positioned as increasing revenue
  • 50% of cleared neuroimaging algorithms were related to detection or quantification of stroke; of these, 41% were for intracranial hemorrhage, 31% for stroke brain perfusion, and 24% for detection of large vessel occlusion 
  • 41% of the algorithms were intended for use with non-contrast CT scans, 36% with MRI, 15% with CT perfusion, 14% with CT angiography, and the rest with MR perfusion and PET
  • 90% of the algorithms studied were cleared in the last five years, and 42% since last year

The researchers further noted two caveats in AI marketing: 

  • There is a lack of publicly available data to support vendor claims about the value of their algorithms. Better transparency is needed to create trust and clinician engagement.
  • The single-use-case nature of many AI algorithms raises questions about their economic viability. Many different algorithms would have to be implemented at a facility to ensure “a reasonable breadth of triage” for critical findings, and the financial burden of such integration is unclear.

The Takeaway

The new study offers intriguing insights into how AI algorithms are marketed by vendors, and how these efforts could be perceived by clinicians. The researchers note that financial pressure on AI developers may cause them to make “unintentional exaggerated claims” to recoup the cost of development; it is incumbent upon vendors to scrutinize their marketing activities to avoid overhyping AI technology.

Grading AI Report Quality

One of the most exciting new use cases for medical AI is in generating radiology reports. But how can you tell whether the quality of a report generated by an AI algorithm is comparable to that of a radiologist?

In a new study in Patterns, researchers propose a technical framework for automatically grading the output of AI-generated radiology reports, with the ultimate goal of producing AI-generated reports that are indistinguishable from those of radiologists. 

Most radiology AI applications so far have focused on developing algorithms to identify individual pathologies on imaging exams. 

  • While this is useful, helping radiologists streamline the production of their main output – the radiology report – could have a far greater impact on their productivity and efficiency. 

But existing tools for measuring the quality of AI-generated narrative reports are limited and don’t match up well with radiologists’ evaluations. 

  • To improve that situation, the researchers applied several existing automated metrics for analyzing report quality and compared them to the scores of radiologists, seeking to better understand AI’s weaknesses. 

Not surprisingly, the automated metrics fell short in several ways, including false prediction of findings, omitting findings, and incorrectly locating and predicting the severity of findings. 

  • These shortcomings point out the need for better scoring systems for gauging AI performance. 

The researchers therefore proposed a new metric for grading AI-generated report quality, called RadGraph F1, and a new methodology, RadCliQ, to predict how well an AI report would measure up to radiologist scrutiny. 

  • RadGraph F1 and RadCliQ could be used in future research on AI-generated radiology reports, and to that end the researchers have made the code for both metrics available as open source.
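RadGraph F1 scores the overlap between the clinical entities extracted from a reference report and an AI-generated one. A simplified sketch of that entity-overlap idea follows; the entity strings and scoring are illustrative, not the published RadGraph implementation:

```python
def entity_f1(reference_entities, candidate_entities):
    """F1 overlap between clinical-entity sets extracted from two reports."""
    ref, cand = set(reference_entities), set(candidate_entities)
    tp = len(ref & cand)  # entities both reports agree on
    if tp == 0:
        return 0.0
    precision = tp / len(cand)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)

# Toy entity sets (finding:status pairs), invented for illustration:
ref = {"effusion:present", "cardiomegaly:absent", "pneumothorax:absent"}
cand = {"effusion:present", "cardiomegaly:absent", "nodule:present"}
score = entity_f1(ref, cand)  # precision 2/3, recall 2/3 -> F1 = 2/3
```

Scoring structured entities rather than raw text is what lets a metric like this penalize a report that reads fluently but asserts the wrong findings.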

Ultimately, the researchers see the construction of generalist medical AI models that could perform multiple complex tasks, such as conversing with radiologists and physicians about medical images. 

  • Another use case could be applications that are able to explain imaging findings to patients in everyday language. 

The Takeaway

It’s a complex and detailed paper, but the new study is important because it outlines the metrics that can be used to teach machines how to generate better radiology reports. Given the imperative to improve radiologist productivity in the face of rising imaging volume and workforce shortages, this could be one more step on the quest for the Holy Grail of AI in radiology.
