AI Hits Speed Bumps

There’s no question AI is the future of radiology. But AI’s drive to widespread clinical use is going to hit some speed bumps along the way.

This week is a case in point. Two studies were published showing AI’s limitations and underscoring the challenges faced in making AI an everyday clinical reality. 

In the first study, published in Radiology, researchers found that radiologists outperformed four commercially available AI algorithms for analyzing chest X-rays (Annalise.ai, Milvue, Oxipit, and Siemens Healthineers) in a study of 2k patients.

Researchers from Denmark found the AI tools had moderate to high sensitivity for three detection tasks: 

  1. airspace disease (72%-91%)
  2. pneumothorax (63%-90%)
  3. pleural effusion (62%-95%). 

But the algorithms also had higher false-positive rates, and performance dropped in cases with smaller pathology and multiple findings. The findings are disappointing, especially given the widespread play they got in the mainstream media.

But this week’s second study also brought worrisome news, this time in Radiology: Artificial Intelligence about an AI training method called foundation models that many hope holds the key to better algorithms. 

Foundation models are designed to address the challenge of finding enough high-quality data for AI training. Most algorithms are trained with actual de-identified clinical data that have been labeled and referenced to ground truth; foundation models are AI neural networks pre-trained with broad, unlabeled data and then fine-tuned with smaller volumes of more detailed data to perform specific tasks.

Researchers in the new study found that a chest X-ray algorithm trained on a foundation model with 800k images had lower performance than an algorithm trained with the CheXpert reference model in a group of 42.9k patients. The foundation model’s performance lagged for four possible results – no finding, pleural effusion, cardiomegaly, and pneumothorax – as follows…

  • Lower by 6.8-7.7% in females for the “no finding” result
  • Down by 10.7-11.6% in Black patients in detecting pleural effusion
  • Lower performance across all groups for classifying cardiomegaly

The performance decline for female and Black patients is particularly concerning given recent studies on bias and lack of generalizability in AI.

The Takeaway

This week’s studies show that there’s not always going to be a clear road ahead for AI in its drive to routine clinical use. The study on foundation models in particular could have ramifications for AI developers looking for a shortcut to faster algorithm development. They may want to slow their roll. 

Predicting AI Performance

How can you predict whether an AI algorithm will fall short for a particular clinical use case such as detecting cancer? Researchers in Radiology took a crack at this conundrum by developing what they call an “uncertainty quantification” metric to predict when an AI algorithm might be less accurate. 

AI is rapidly moving into wider clinical use, with a number of exciting studies published in just the last few months showing how AI can help radiologists interpret screening mammograms or direct which women should get supplemental breast MRI.

But AI isn’t infallible. And unlike a human radiologist who might be less confident in a particular diagnosis, an AI algorithm doesn’t have a built-in hedging mechanism.

So researchers from Denmark and the Netherlands decided to build one. They took publicly available AI algorithms and tweaked their code so they produced “uncertainty quantification” scores with their predictions. 

They then tested how well the scores predicted AI performance in a dataset of 13k images for three common tasks covering some of the deadliest types of cancer:

1) detecting pancreatic ductal adenocarcinoma on CT
2) detecting clinically significant prostate cancer on MRI
3) predicting pulmonary nodule malignancy on low-dose CT 

Researchers classified the 80% of AI predictions with the highest certainty as “certain” and the remaining 20% as “uncertain,” then compared AI’s accuracy in both groups, finding … 

  • AI led to significant accuracy improvements in the “certain” group for pancreatic cancer (80% vs. 59%), prostate cancer (90% vs. 63%), and pulmonary nodule malignancy prediction (80% vs. 51%)
  • AI accuracy was comparable to clinicians when its predictions were “certain” (80% vs. 78%, P=0.07), but much worse when “uncertain” (50% vs. 68%, P<0.001)
  • Using AI to triage “uncertain” cases produced overall accuracy improvements for pancreatic and prostate cancer (+5%) and lung nodule malignancy prediction (+6%) compared to a no-triage scenario

How would uncertainty quantification be used in clinical practice? It could play a triage role, deprioritizing radiologist review of easier cases while helping them focus on more challenging studies. It’s a concept similar to the MASAI study of mammography AI.
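
In practice, the certain/uncertain split described above amounts to ranking cases by an uncertainty score and cutting at a fixed fraction. Here's a minimal sketch of that triage logic (hypothetical names, not the study's actual code):

```python
# Illustrative sketch of uncertainty-based triage: keep the most-certain
# fraction of AI predictions for autonomous handling and route the rest
# to a radiologist. Names and data are hypothetical.

def triage_by_uncertainty(predictions, certain_fraction=0.8):
    """Split predictions into 'certain' (AI handles) and 'uncertain'
    (deferred to a radiologist) based on an uncertainty score.

    predictions: list of (case_id, uncertainty) tuples; lower = more certain.
    """
    ranked = sorted(predictions, key=lambda p: p[1])  # most certain first
    cutoff = int(len(ranked) * certain_fraction)
    return ranked[:cutoff], ranked[cutoff:]

cases = [("a", 0.05), ("b", 0.40), ("c", 0.10), ("d", 0.90), ("e", 0.20)]
certain, uncertain = triage_by_uncertainty(cases, certain_fraction=0.8)
print([c for c, _ in certain])    # → ['a', 'c', 'e', 'b'] (stay with AI)
print([c for c, _ in uncertain])  # → ['d'] (deferred to a radiologist)
```

The 80/20 split mirrors the study's design, but in a deployment the fraction would be a tunable operating point traded off against accuracy.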

The Takeaway

Like MASAI, the new findings present exciting new possibilities for AI implementation. They also present a framework within which AI can be implemented more safely by alerting clinicians to cases in which AI’s analysis might fall short – and enabling humans to step in and pick up the slack.  

POCUS Cuts DVT Stays

Using POCUS in the emergency department (ED) to scan patients with suspected deep vein thrombosis (DVT) cut their length of stay in the ED in half. 

Reducing hospital length of stay is one of the holy grails of healthcare quality improvement. 

  • It’s not only more expensive to keep patients in the hospital longer, but it can expose them to morbidities like hospital-acquired infections.

Patients admitted with suspected DVT often receive ultrasound scans performed by radiologists or sonographers to determine whether the blood clot is at risk of breaking off – a possibly fatal result. 

  • But this requires a referral to the radiology department. What if emergency physicians performed the scans themselves with POCUS?

To answer this question, researchers at this week’s European Emergency Medicine Conference presented results from a study of 93 patients at two hospitals in Finland.

  • From October 2017 to October 2019, patients presenting at the ED received POCUS scans from emergency doctors trained on the devices. 

Results were compared to 135 control patients who got usual care and were sent directly to radiology departments for ultrasound. 

  • Researchers found that POCUS reduced ED length of stay from 4.5 hours to 2.3 hours, a drop of 52%.

Researchers described the findings as “convincing,” especially as they occurred at two different facilities. The results also counter a recent study that found POCUS only affected length of stay when performed on the night shift. 

The Takeaway

Radiology departments might not be so happy to see patient referrals diverted elsewhere, but the results are yet another feather in the cap for POCUS, which continues to show that – when in the right hands – it can have a big impact on healthcare quality.

Radiology’s Enduring Popularity

Radiology is seeing a resurgence of interest from medical students picking the specialty in the National Resident Matching Program (NRMP). While radiology’s popularity is at historically high levels, the new analysis shows how vulnerable the field is to macroeconomic trends in healthcare. 

Radiology’s popularity has always ebbed and flowed. In general the field is seen as one of the more attractive medical specialties due to the perception that it combines high salaries with lifestyle advantages. But there have been times when medical students shunned radiology.

The new paper offers insights into these trends. Published in Radiology by Francis Deng, MD, and Linda Moy, MD, the paper fleshes out an earlier analysis that Deng posted as a Twitter thread after the 2023 Match, showing that diagnostic radiology saw the highest three-year growth in applicants of any medical specialty.

Deng and Moy analyze trends in the Match over almost 25 years in the new study, finding…

  • The 2023 Match in radiology was the most competitive since 2001 based on percentage of applicants matching (81.1% vs. 73.3%)
  • 5.9% of seniors in US MD training programs applied to diagnostic radiology in the 2023 Match, the highest level since 2010
  • Fewer radiology residency slots per applicant were available in 2023 compared to the historical average (0.67 vs. 0.81) 

Interest in radiology hit its lowest levels in 1996 and 2015, when the number of applicants fell short of available radiology residency positions in the Match. It’s perhaps no surprise that these lows followed two major seismic healthcare shifts that could have negatively affected job prospects for radiologists: the “Hillarycare” healthcare reform effort in the early 1990s and the emergence of AI for healthcare in the mid-2010s. 

Hillarycare never happened, and Deng and Moy noted that outreach efforts to medical students about AI helped reverse the perception that the technology would take radiologists’ jobs. Another advantage for radiology is its early adoption of teleradiology, which enables remote work and more flexible work options – a major lifestyle perk. 

The Takeaway

The new paper provides fascinating insights that support why radiology remains one of medicine’s most attractive specialties. Radiology’s appeal could even grow, given recent studies showing that work-life balance is a major priority for today’s medical students.

CT Detects Early Lung Cancer

A massive CT lung cancer screening program launched in Taiwan has been effective in detecting early lung cancer. Research presented at this week’s World Conference on Lung Cancer (WCLC) in Singapore offers more support for lung screening, which has seen the lowest uptake of the major population-based screening programs. 

Previous randomized clinical trials like the National Lung Screening Trial and the NELSON study have shown that LDCT lung cancer screening can reduce lung cancer mortality by at least 20%. But screening adherence rates remain low, ranging from the upper single digits to as high as 21% in a recent US study. 

Meanwhile, lung cancer remains the leading cause of cancer death worldwide. To reduce this burden, Taiwan in July 2022 launched the Lung Cancer Early Detection Program, which offers biennial screening nationwide to people at high risk of lung cancer.

The Taiwan program differs from screening programs in the US and South Korea by including family history of lung cancer in the eligibility criteria, rather than just focusing on people who smoke. 

Researchers at WCLC 2023 presented the first preliminary results from the program, covering almost 50k individuals screened from July 2022 to June 2023; 29k had a family history of lung cancer and 19k were people who smoked heavily. Researchers found …

  • 4.4k individuals received a positive screening result, for a positive rate of 9.2%
  • 531 people were diagnosed with lung cancer for a detection rate of 1.1%
  • 85% of cancers were diagnosed at an early stage, either stage 0 or stage 1

This last finding is perhaps the most significant, as part of the reason for lung cancer’s high mortality rate is that it’s often discovered at a late stage, when it’s far more difficult to treat. As such, lung cancer’s five-year survival rate is about 25% – far lower than breast cancer at 91%.

The Takeaway

Taiwan is setting an example to other countries for how to conduct a nationwide LDCT lung cancer screening program, even as some critics take aim at population-based screening. Taiwan’s approach is broader and more proactive than that of the US, for example, which has erected screening barriers like shared decision-making.

Although it’s still early days for the Taiwan program, future results will be examined closely to determine screening’s impact on lung cancer mortality – and respond to screening’s critics.

Tipping Point for Breast AI?

Have we reached a tipping point when it comes to AI for breast screening? This week another study was published – this one in Radiology – demonstrating the value of AI for interpreting screening mammograms. 

Of all the medical imaging exams, breast screening probably could use the most help. Reading mammograms has been compared to looking for a needle in a haystack, with radiologists reviewing thousands of images before finding a single cancer. 

AI could help in multiple ways, either at the radiologist’s side during interpretation or by reviewing mammograms in advance, triaging the ones most likely to be normal while reserving suspicious exams for closer attention by radiologists (indeed, that was the approach used in the MASAI study in Sweden in August).

In the new study, UK researchers in the PERFORMS trial compared the performance of Lunit’s INSIGHT MMG AI algorithm to that of 552 radiologists in 240 test mammogram cases, finding that …

  • AI was comparable to radiologists for sensitivity (91% vs. 90%, P=0.26) and specificity (77% vs. 76%, P=0.85)
  • There was no statistically significant difference in AUC (0.93 vs. 0.88, P=0.15)
  • AI and radiologists were comparable or no different with other metrics

Like the MASAI trial, the PERFORMS results show that AI could play an important role in breast screening. To that end, a new paper in European Journal of Radiology proposes a roadmap for implementing mammography AI as part of single-reader breast screening programs, offering suggestions on prospective clinical trials that should take place to prove breast AI is ready for widespread use in the NHS – and beyond. 

The Takeaway

It certainly does seem that AI for breast screening has reached a tipping point. Taken together, PERFORMS and MASAI show that mammography AI works well enough that “the days of double reading are numbered,” at least where it is practiced in Europe, as noted in an editorial by Liane Philpotts, MD.

While double-reading isn’t practiced in the US, the PERFORMS protocol could be used to supplement non-specialized radiologists who don’t see that many mammograms, Philpotts notes. Either way, AI looks poised to make a major impact in breast screening on both sides of the Atlantic.

Screening Foes Strike Back

Opponents of population-based cancer screening aren’t going away anytime soon. Just weeks after publication of a landmark study claiming that cancer screening has saved $7T over 25 years, screening foes published a counterattack in JAMA Internal Medicine casting doubt on whether screening has any value at all. 

Population-based cancer screening has been controversial since the first programs were launched decades ago. 

  • A vocal minority of skeptics continues to raise concerns about screening, despite the fact that mortality rates have dropped and survival rates have increased for the four cancers targeted by population screening.

This week’s JAMA Internal Medicine featured a series of articles that cast doubt on screening. In the main study, researchers performed a meta-analysis of 18 randomized clinical trials (RCTs) covering 2.1M people for six major screening tests, including mammography, CT lung cancer screening, and colon and PSA tests. 

  • The authors, led by Norwegian gastroenterologist Michael Bretthauer, MD, PhD, concluded that only flexible sigmoidoscopy for colon cancer produced a gain in life expectancy. They also concluded that RCTs to date haven’t included enough patients, followed over enough years, to show screening has an effect on all-cause mortality.

But a deeper dive into the study produces interesting revelations. For CT lung cancer screening, Bretthauer et al didn’t include the landmark National Lung Screening Trial, an RCT that showed a 20% mortality reduction from screening.

  • With respect to breast imaging, the researchers only included three studies, even though there have been eight major mammography RCTs performed. And one of the three included was the controversial Canadian National Breast Screening Study, originally conducted in the 1980s.

When it comes to colon screening, Bretthauer included his own controversial 2022 NordICC study in his meta-analysis. 

  • The NordICC study found that if a person is invited to colon screening but doesn’t follow through, they don’t experience a mortality benefit. But those who actually got colon screening saw a 50% mortality reduction.  

Other articles in this week’s JAMA Internal Medicine series were penned by researchers well known for their opposition to population-based screening, including Gilbert Welch, MD, and Rita Redberg, MD.

The Takeaway

There’s an old saying in statistics: “If you torture the data long enough, it will confess to anything.” Among major academic journals, JAMA Internal Medicine – which Redberg guided for 14 years as editor until she stepped down in June – has consistently been the most hostile toward screening and new medical technology.

In the end, the arguments being made by screening’s foes would carry more weight if they were coming from researchers and journals that haven’t already demonstrated a longstanding, ingrained bias against population-based cancer screening.

Economic Barriers to AI

A new article in JACR highlights the economic barriers that are limiting wider adoption of AI in healthcare in the US. The study paints a picture of how the complex nature of Medicare reimbursement puts the country at risk of falling behind other nations in the quest to implement healthcare AI on a national scale. 

The success of any new medical technology in the US has always been linked to whether physicians can get reimbursed for using it. But there are a variety of paths to reimbursement in the Medicare system, each one with its own rules and idiosyncrasies. 

The establishment of the NTAP program was thought to be a milestone in paying for AI for inpatients, for example, but the JACR authors note that NTAP payments are time-limited for no more than three years. A variety of other factors are limiting AI reimbursement, including … 

  • All of the AI payments approved under the NTAP program have expired, and as such no AI algorithm is being reimbursed under NTAP 
  • Budget-neutral requirements in the Medicare Physician Fee Schedule mean that AI reimbursement is often a zero-sum game. Payments made for one service (such as AI) must be offset by reductions for something else 
  • Only one imaging AI algorithm has successfully navigated CMS to achieve Category I reimbursement in the Physician Fee Schedule, starting in 2024 for fractional flow reserve (FFR) analysis

Standing in stark contrast to the Medicare system is the NHS in the UK, where regulators see AI as an invaluable tool to address chronic workforce shortages in radiology and are taking aggressive action to promote its adoption. Not only has the NHS announced a £21M fund to fuel AI adoption, but it is mulling the implementation of a national platform to enable AI algorithms to be accessed within standard radiology workflow. 

The Takeaway

The JACR article illustrates how Medicare’s Byzantine reimbursement structure puts barriers in the path of wider AI adoption. Although there have been some reimbursement victories such as NTAP, these have been temporary, and the fact that only one radiology AI algorithm has achieved a Category I CPT code must be a sobering thought to AI proponents.

Fine-Tuning Cardiac CT

CT has established itself as an excellent cardiac imaging modality. But there can still be some fine-tuning in terms of exactly how and when to use it, especially for assessing people presenting with chest pain. 

Two studies in JAMA Cardiology tackle this head-on, presenting new evidence that supports a more conservative – and precise – approach to determining which patients get follow-up testing. The studies also address concerns that using coronary CT angiography (CCTA) as an initial test before invasive catheterization could lead to unnecessary testing.

In the PRECISE study, researchers analyzed 2.1k patients from 2018 to 2021 who had stable symptoms of suspected coronary artery disease (CAD). Patients were randomized to a usual testing strategy (such as cardiac SPECT or stress echo), or a precision strategy that employed CCTA with selected fractional flow reserve CT (FFR-CT). 

The precision strategy group was further subdivided into a subgroup of those at minimal risk of cardiac events (20%) for whom testing was deferred to see if utilization could be reduced even further. In the precision strategy group….

  • Rates of invasive catheterization without coronary obstruction were lower (4% vs. 11%)
  • Testing was lower versus the usual testing group (84% vs. 94%)
  • Positive tests were more common (18% vs. 13%)
  • 64% of the deferred-testing subgroup got no testing at all
  • Adverse events were higher, but the difference was not statistically significant

To expand on the analysis, JAMA Cardiology published a related study that further investigated the safety of the deferred-testing strategy at one-year follow-up. Researchers compared adverse events in the deferred testing group to those who got the usual testing strategy, finding that the deferred testing group had…

  • A lower incidence rate of adverse events (0.9 vs. 5.9)
  • A lower rate of invasive cardiac cath without obstructive CAD per 100 patient years (1.0 vs. 6.5)

The results from both studies show that a strategy of deferring testing for low-risk CAD patients while sending higher-risk patients to CCTA and FFR-CT is clinically effective with no adverse impact on patient safety.

The Takeaway

The new findings don’t take any of the luster off cardiac CT; they simply add to the body of knowledge demonstrating when to use – and not to use – this incredibly powerful tool for directing patient care. And in the emerging era of precision medicine, that’s what it’s all about.

Radiation and Cancer Risk

New research on the cancer risk of low-dose ionizing radiation could have disturbing implications for those who are exposed to radiation on the job – including medical professionals. In a new study in BMJ, researchers found that nuclear workers exposed to occupational levels of radiation had a cancer mortality risk that was higher than previously estimated.

The link between low-dose radiation and cancer has long been controversial. Most studies on the radiation-cancer connection are based on Japanese atomic bomb survivors, many of whom were exposed to far higher levels of radiation than most people receive over their lifetimes – even those who work with ionizing radiation. 

The question is whether that data can be extrapolated to people exposed to much lower levels of radiation, such as nuclear workers, medical professionals, or even patients. To that end, researchers in the International Nuclear Workers Study (INWORKS) have been tracking low-dose radiation exposure and its connection to mortality in nearly 310k people in France, the UK, and the US who worked in the nuclear industry from 1944 to 2016.

INWORKS researchers previously published studies showing low-dose radiation exposure to be carcinogenic, but the new findings in BMJ offer an even stronger link. For the study, researchers tracked radiation exposure via dosimetry badges worn by the workers and calculated rates of death from solid cancer based on exposure levels, finding: 

  • Solid cancer mortality risk rose by 52% per 1 Gy of cumulative exposure
  • Individuals who received the occupational radiation limit of 20 mSv per year would have a 5.2% increased solid cancer mortality rate over five years
  • There was a linear association between low-dose radiation exposure and cancer mortality, meaning that cancer mortality risk was also found at lower levels of exposure 
  • The dose-response association seen in the study was even stronger than in studies of atomic bomb survivors (52% vs. 32%)
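
The second bullet follows from the first by simple linear scaling, treating 1 Sv of occupational dose as equivalent to 1 Gy for this kind of exposure. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the figures above, assuming the study's
# linear dose-response (52% excess solid cancer mortality per Gy) and
# treating 1 Sv of occupational dose as equivalent to 1 Gy.

excess_risk_per_gy = 0.52   # 52% excess mortality per 1 Gy, per the study
annual_limit_gy = 0.020     # 20 mSv occupational limit, taken as 0.020 Gy
years = 5

cumulative_dose_gy = annual_limit_gy * years                # 0.1 Gy
excess_mortality = excess_risk_per_gy * cumulative_dose_gy
print(f"{excess_mortality:.1%}")  # → 5.2%, matching the study's estimate
```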

The Takeaway

Even though the INWORKS study was conducted on nuclear workers rather than medical professionals, the findings could have implications for those who might be exposed to medical radiation, such as interventional radiologists and radiologic technologists. The study will undoubtedly be examined by radiation protection organizations and government regulators; the question is whether it leads to any changes in rules on occupational radiation exposure.

Get every issue of The Imaging Wire, delivered right to your inbox.
