When AI Goes Wrong

What impact do incorrect AI results have on radiologist performance? That question was the focus of a new study in European Radiology in which radiologists who received incorrect AI results were more likely to make wrong decisions on patient follow-up – even though they would have been correct without AI’s help.

The accuracy of AI has become a major concern as deep learning models like ChatGPT become more powerful and come closer to routine use. There’s even a term – the “hallucination effect” – for when AI models veer off script to produce text that sounds plausible but in fact is incorrect.

While AI hallucinations may not be an issue in healthcare – yet – there is still concern about the impact that AI algorithms are having on clinicians, both in terms of diagnostic performance and workflow. 

To see what happens when AI goes wrong, researchers from Brown University sent 90 chest radiographs with “sham” AI results to six radiologists, with 50% of the studies positive for lung cancer. They employed different strategies for AI use, ranging from keeping the AI recommendations in the patient’s record to deleting them after the interpretation was made. Findings included:

  • When AI falsely called a true-pathology case “normal,” radiologists’ false-negative rates rose compared to when they didn’t use AI (20.7-33.0% depending on AI use strategy vs. 2.7%)
  • AI calling a negative case “abnormal” boosted radiologists’ false-positive rates compared to without AI (80.5-86.0% vs. 51.4%)
  • Not surprisingly, when AI calls were correct, radiologists were more accurate with AI than without, with increases in both true-positive rates (94.7-97.8% vs. 88.3%) and true-negative rates (89.7-90.7% vs. 77.3%)
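For readers less familiar with the four rates quoted above, here is a minimal Python sketch of how they are defined from a reader's calls. The function name and the counts are hypothetical placeholders for illustration, not data or code from the study.

```python
def reader_rates(tp, fn, tn, fp):
    """Return the four reader rates (as fractions) from confusion-matrix counts."""
    positives = tp + fn  # cases that truly contain the pathology
    negatives = tn + fp  # cases that are truly normal
    return {
        "true_positive_rate": tp / positives,
        "false_negative_rate": fn / positives,
        "true_negative_rate": tn / negatives,
        "false_positive_rate": fp / negatives,
    }

# Hypothetical counts, NOT taken from the study
rates = reader_rates(tp=40, fn=5, tn=35, fp=10)
```

Because each pair of rates shares a denominator, the true-positive and false-negative rates always sum to 1, as do the true-negative and false-positive rates.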

Fortunately, the researchers offered suggestions on how to mitigate the impact of incorrect AI. Radiologists had fewer false negatives when AI provided a box around the region of suspicion, a phenomenon the researchers said could be related to AI helping radiologists focus. 

Also, radiologists’ false positives were higher when AI results were retained in the patient record versus when they were deleted. Researchers said this was evidence that radiologists were less likely to disagree with AI if there was a record of the disagreement occurring. 

The Takeaway

As AI becomes more widespread clinically, studies like this one, which adds to previous research on AI’s impact, will become increasingly important in shaping how the technology is used in the real world. Awareness that AI is imperfect – and strategies that take that awareness into account – will become key to any AI implementation.

Radiology Puts ChatGPT to Work

ChatGPT has taken the world by storm since the AI technology was first introduced in November 2022. In medicine, radiology is taking the lead in putting ChatGPT to work to address the specialty’s many efficiency and workflow challenges. 

Both ChatGPT and its newest iteration, GPT-4, are forms of AI known as large language models – essentially neural networks that are trained on massive volumes of unlabeled text and are able to learn on their own how to predict the structure and syntax of human language. 

A flood of papers has appeared in just the last week or so investigating ChatGPT’s potential:

  • ChatGPT could be used to improve patient engagement with radiology providers, such as by creating layperson reports that are more understandable, or by answering patient questions in a chatbot function, says an American Journal of Roentgenology article.
  • ChatGPT offered up accurate information about breast cancer prevention and screening to patients in a study in Radiology. But ChatGPT also gave some inappropriate and inconsistent recommendations – perhaps no surprise given that many experts themselves often disagree on breast screening guidelines.
  • ChatGPT was able to produce a report on a PET/CT scan of a patient – including technical terms like SUVmax and TNM stage – without special training, found researchers writing in Journal of Nuclear Medicine.
  • GPT-4 translated free-text radiology reports into structured reports that better lend themselves to standardization and data extraction for research in another paper published in Radiology. Best of all, the service cost 10 cents a report.

Where is all this headed? A review article on AI in medicine in New England Journal of Medicine gave the opinion – often stated in radiology – that AI has the potential to take over mundane tasks and give health professionals more time for human-to-human interactions. 

The authors compared the arrival of ChatGPT to the onset of digital imaging in radiology in the 1990s, and offered a tantalizing future in which chatbots like ChatGPT and GPT-4 replace outdated technologies like x-ray file rooms and lost images – remember those?

The Takeaway

Radiology’s embrace of ChatGPT and GPT-4 is heartening given the specialty’s initial skeptical response to AI in years past. As the most technologically advanced medical specialty, it’s only fitting that radiology takes the lead in putting this transformative technology to work – as it did with digital imaging.

ECR 2023 Bounces Back As AI Tops Clinical Program

The European Congress of Radiology is back. European radiologists returned to Vienna in force last week for ECR 2023, surprising many naysayers with crowded presentation rooms and exhibit booths.

Due to the COVID-19 pandemic, it was the first ECR meeting since 2019 to be held in the conference’s traditional timeframe of early March. And after a lightly attended ECR 2022, held during Europe’s July vacation season, many were watching with bated breath to see if the conference could mount a comeback. 

Fortunately, ECR 2023 didn’t disappoint. While attendance didn’t hit the high-water mark set prior to the pandemic, it was strong enough to satisfy most that the show was indeed healthy, with chatter on-site placing attendance at around 17,000.

As with RSNA 2022, interest in AI was strong. AI-based content permeated the scientific sessions as well as the exhibit floor, and the show’s AI Theatre was packed for nearly every presentation. 

In his opening address, ECR 2023 President Dr. Adrian Brady of Ireland addressed concerns about AI’s impact on radiology in the years to come, characterizing it as one of the “winds of change” that should be embraced rather than shunned. 

Other major trends at ECR 2023 included: 

Patient Safety – Many sessions discussed how to reduce risk when scanning patients, ranging from lowering radiation dose to limiting the amount of contrast media to MRI scanning of patients with metallic implants.

Sustainability – Energy challenges have gripped the European continent since the Russian invasion of Ukraine in 2022, and imaging energy conservation was a key focus across several sessions. 

Workhorse Modalities – Unlike RSNA, where new product launches were focused on high-end premium systems, scanner introductions at ECR 2023 concentrated on workhorse offerings like mid-range CT and 1.5-tesla MRI.

The Takeaway

ECR is indeed back. It may not yet be a mandatory show for most U.S. radiologists, but it has regained its importance for anyone interested in a more global look at medical imaging. And given the European emphasis on research, it’s a great place to learn about new technologies before they appear in North America.

Acute Chest Pain CXR AI

Patients who arrive at the ED with acute chest pain (ACP) syndrome end up receiving a series of often-negative tests, but a new MGB-led study suggests that CXR AI might make ACP triage more accurate and efficient.

The researchers trained three ACP triage models using data from 23k MGH patients to predict acute coronary syndrome, pulmonary embolism, aortic dissection, and all-cause mortality within 30 days. 

  • Model 1: Patient age and sex
  • Model 2: Patient age, sex, and troponin or D-dimer positivity
  • Model 3: CXR AI predictions plus Model 2

In internal testing with 5.7k MGH patients, Model 3 predicted which patients would experience any of the ACP outcomes far more accurately than Models 2 and 1 (AUCs: 0.85 vs. 0.76 vs. 0.62), while maintaining performance across patient demographic groups.

  • At a 99% sensitivity threshold, Model 3 would have allowed 14% of the patients to skip additional cardiovascular or pulmonary testing (vs. Model 2’s 2%).

In external validation with 22.8k Brigham and Women’s patients, poor AI generalizability caused Model 3’s performance to drop dramatically, while Models 2 and 1 maintained their performance (AUCs: 0.77 vs. 0.76 vs. 0.64). However, fine-tuning with BWH’s own images significantly improved the performance of the CXR AI model (from 0.67 to 0.74 AUCs) and Model 3 (from 0.77 to 0.81 AUCs).

  • At a 99% sensitivity threshold, the fine-tuned Model 3 would have allowed 8% of BWH patients to skip additional cardiovascular or pulmonary testing (vs. Model 2’s 2%).
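To make the "99% sensitivity threshold" idea concrete, here is a minimal Python sketch of one way such a rule-out fraction could be computed from model risk scores. The study's actual code isn't described here, so the function name, the thresholding approach, and the toy data are all our own assumptions.

```python
import math

def rule_out_fraction(scores, labels, target_sensitivity=0.99):
    """Pick the highest score threshold that still flags at least
    `target_sensitivity` of patients with an outcome (label 1), then
    return (threshold, fraction of all patients scoring below it).
    Assumes at least one patient has label 1."""
    positive_scores = sorted(s for s, y in zip(scores, labels) if y == 1)
    # Number of true-outcome patients we can afford to miss
    allowed_misses = math.floor(len(positive_scores) * (1 - target_sensitivity))
    # Flagging scores >= threshold misses at most `allowed_misses` positives
    threshold = positive_scores[allowed_misses]
    below = sum(1 for s in scores if s < threshold)
    return threshold, below / len(scores)

# Toy data, NOT from the study: six patients, three with an outcome
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
toy_labels = [0, 0, 1, 0, 1, 1]
threshold, fraction = rule_out_fraction(toy_scores, toy_labels)
```

In this framing, patients scoring below the threshold are the ones who could potentially skip additional testing, which is the quantity behind the 14%, 8%, and 2% figures above.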

The Takeaway

Acute chest pain is among the most common reasons for ED visits, but it’s also a major driver of wasted ED time and resources. Considering that most ACP patients undergo CXR exams early in the triage process, this proof-of-concept study suggests that adding CXR AI could improve ACP diagnosis and significantly reduce downstream testing.

Federated Learning’s Glioblastoma Milestone

AI insiders celebrated a massive new study highlighting a federated learning AI model’s ability to delineate glioblastoma brain tumors with high accuracy and generalizability, while demonstrating FL’s potential value for rare diseases and underrepresented populations.

The UPenn-led research team went big, as the study’s 71 sites on 6 continents made it the largest FL project to date, its 6,314 patients’ mpMRIs created the biggest glioblastoma (GBM) dataset ever, and its nearly 280 authors were the most we’ve seen in a published study.

The researchers tested their final GBM FL consensus model twice – first using 20% of the “local” mpMRIs from each site that weren’t used in FL training, and second using 590 “out-of-sample” exams from 6 sites that didn’t participate in FL development.

These FL models achieved significant improvements compared to an AI model trained with public data for delineating the three main GBM tumor sub-compartments that are most relevant for treatment planning.

  • Surgically targetable tumor core: +33% w/ local, +27% w/ out-of-sample
  • Enhancing tumor: +27% w/ local, +15% w/ out-of-sample
  • Whole tumor: +16% w/ local, +16% w/ out-of-sample data

The Takeaway

Federated learning’s ability to improve AI’s performance in new settings/populations while maintaining patient data privacy has become well established in the last few years. However, this study takes FL’s resume to the next level given its unprecedented scope and the significant complexity associated with mpMRI glioblastoma exams, suggesting that FL will bring a “paradigm shift for multi-site collaborations.”

iCAD and Solis CVD Alliance

iCAD and major breast imaging center company Solis Mammography announced plans to develop and commercialize AI that quantifies breast arterial calcifications (BACs) in mammograms to identify women with high cardiovascular disease (CVD) risks.

Through the multi-year alliance, iCAD and Solis will expand upon iCAD’s flagship ProFound AI solution’s ability to detect and quantify BACs, with the goal of helping radiologists identify women with high CVD risks and guide them into care.

iCAD and Solis’ expansion into cardiovascular disease screening wasn’t exactly expected, but recent trends certainly suggest that commercial AI-based BAC detection could be on the way: 

  • There’s mounting academic and commercial momentum behind using AI to “opportunistically” screen for incidental findings in scans that were performed for other reasons (e.g. analyzing CTs for CAC scores, osteoporosis, or lung nodules).
  • Although heart disease is the leading cause of death in the US, it appears that we’re a long way from formal heart disease screening programs, making the already-established mammography screening pathway an unlikely alternative.
  • Volpara and Microsoft are also working on a mammography AI product that detects and quantifies BACs. In other words, three of the biggest companies in breast imaging (at least) and one of the biggest tech companies in the world are all currently developing AI-based BAC screening solutions.

The Takeaway

Widespread adoption of mammography AI-based cardiovascular disease screening might seem like a longshot to many readers who often view incidentals as a burden and have grown weary of early-stage AI announcements… and they might be right. That said, there’s plenty of evidence suggesting that a solution like this would help detect more early-stage heart disease using scans that are already being performed.

Prioritizing Length of Stay

A new study out of Cedars Sinai provided what might be the strongest evidence yet that imaging AI triage and prioritization tools can shorten inpatient hospitalizations, potentially bolstering AI’s economic and patient care value propositions outside of the radiology department.

The researchers analyzed patient length of stay (LOS) before and after Cedars Sinai adopted Aidoc’s triage AI solutions for intracranial hemorrhage (Nov 2017) and pulmonary embolism (Dec 2018), using 2016-2019 data from all inpatients who received noncontrast head CTs or chest CTAs.

  • ICH Results – Among Cedars Sinai’s 1,718 ICH patients (795 after ICH AI adoption), average LOS dropped by 11.9% from 10.92 to 9.62 days (vs. -5% for other head CT patients).
  • PE Results – Among Cedars Sinai’s 400 patients diagnosed with PE (170 after PE AI adoption), average LOS dropped by a massive 26.3% from 7.91 to 5.83 days (vs. +5.2% for other chest CTA patients).
  • Control Results – Control group patients with hip fractures saw smaller LOS decreases during the respective post-AI periods (-3% & -8.3%), while hospital-wide LOS seemed to trend upward (-2.5% & +10%).
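The headline percentages follow directly from the before-and-after averages quoted above. A quick Python check (the helper name is our own) reproduces them:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100

ich_change = percent_change(10.92, 9.62)  # ICH average LOS, in days
pe_change = percent_change(7.91, 5.83)    # PE average LOS, in days

print(round(ich_change, 1), round(pe_change, 1))  # -11.9 -26.3
```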

The Takeaway

These results were strong enough for the authors to conclude that Cedars Sinai’s LOS improvements were likely “due to the triage software implementation.” 

Perhaps more importantly, some could also interpret these LOS reductions as evidence that Cedars Sinai’s triage AI adoption improved its overall patient care and inpatient operating costs. That reading rests on how these LOS reductions were likely achieved (faster diagnosis & treatment), the typical associations between long hospital stays and negative outcomes, and the fact that inpatient stays have a significant impact on hospital costs.

Prostate MR AI’s Experience Boost

A new European Radiology study showed that Siemens Healthineers’ AI-RAD Companion Prostate MR solution can improve radiologists’ lesion assessment accuracy (especially less-experienced rads), while reducing reading times and lesion grading variability. 

The researchers had four radiologists (two experienced, two inexperienced) assess lesions in 172 prostate MRI exams, with and without AI support, finding that AI-RAD Companion Prostate MR improved:

  • The less-experienced radiologists’ performance, significantly (AUCs: 0.66 to 0.80 & 0.68 to 0.80)
  • The experienced rads’ performance, modestly (AUCs: 0.81 to 0.86 & 0.81 to 0.84)
  • Overall PI-RADS category and Gleason score correlations (r = 0.45 to 0.57)
  • Median reading times (157 to 150 seconds)

The study also highlights Siemens Healthineers’ emergence as an AI research leader, leveraging its relationship / funding advantages over AI-only vendors and its (potentially) greater focus on AI research than its OEM peers to become one of imaging AI’s most-published vendors.

The Takeaway

Given the role that experience plays in radiologists’ prostate MRI accuracy, and noting prostate MRI’s historical challenges with variability, this study makes a solid case for AI-RAD Companion Prostate MR’s ability to improve rads’ diagnostic performance (without slowing them down). It’s also a reminder that Siemens Healthineers is serious about supporting its homegrown AI portfolio through academic research.

RevealDx & contextflow’s Lung CT Alliance

RevealDx and contextflow announced a new alliance that should advance the companies’ product and distribution strategies, and appears to highlight an interesting trend towards more comprehensive AI solutions.

The companies will integrate RevealDx’s RevealAI-Lung solution (lung nodule characterization) with contextflow’s SEARCH Lung CT software (lung nodule detection and quantification), creating a uniquely comprehensive lung cancer screening offering. 

contextflow will also become RevealDx’s exclusive distributor in Europe, adding to RevealDx’s global channel that includes a distribution alliance with Volpara (exclusive in Australia/NZ, non-exclusive in US) and a platform integration deal with Sirona.

The alliance highlights contextflow’s new partner-driven strategy to expand SEARCH Lung CT beyond its image-based retrieval roots, coming just a few weeks after announcing an integration with Oxipit’s ChestEye Quality AI solution to identify missed lung nodules.

In fact, contextflow’s AI expansion efforts appear to be part of an emerging trend, as AI vendors work to support multiple steps within a given clinical activity (e.g. lung cancer assessments) or spot a wider range of pathologies in a given exam (e.g. CXRs):

  • Volpara has amassed a range of complementary breast cancer screening solutions, and has started to build out a similar suite of lung cancer screening solutions (including RevealDx & Riverain).
  • A growing field of chest X-ray AI vendors (Annalise.ai, Lunit, Qure.ai, Oxipit, Vuno) leads with the ability to detect multiple findings from a single CXR scan and AI workflow.
  • Siemens Healthineers’ AI-RAD Companion Chest CT solution combines these two approaches, automating multiple diagnostic tasks (analysis, quantification, visualization, results generation) across a range of different chest CT exams and organs.

The Takeaway

contextflow and RevealDx’s European alliance seems to make a lot of sense, allowing contextflow to enhance its lung nodule detection/quantification findings with characterization details, while giving RevealDx the channel and lung nodule detection starting points that it likely needs.

The partnership also appears to represent another step towards more comprehensive and potentially more clinically valuable AI solutions, and away from the narrow applications that have dominated AI portfolios (and AI critiques) before now.

Cathay’s AI Underwriting

Cathay Life Insurance will use Lunit’s INSIGHT CXR AI solution to identify abnormalities in its applicants’ chest X-rays, potentially modernizing a manual underwriting process and uncovering a new non-clinical market for AI vendors.

Lunit INSIGHT CXR will be integrated into Cathay’s underwriting workflow, with the goals of enhancing its radiologists’ accuracy and efficiency, while improving Cathay’s underwriting decisions. 

Lunit and Cathay have reason to be optimistic about this endeavor, given that their initial proof of concept study found that INSIGHT CXR:

  • Improved Cathay’s radiologists’ reading accuracy by 20%
  • Reduced the radiologists’ overall reading time by up to 90%

Those improvements could have a significant labor impact, considering that Cathay’s rads review 30,000 CXRs every year. They might have an even greater business impact, noting the important role that underwriting accuracy has on policy profitability.

Lunit’s part of the announcement largely focused on its expansion beyond clinical settings, revealing plans to “become the driving force of digital innovation in the global insurance market” and to further expand its business into “various sectors outside the hospital setting.”

The Takeaway

Even if life insurers only require CXRs for a small percentage of their applicants (older people, higher value policies), they still review hundreds of thousands of CXRs each year. That makes insurers an intriguing new market segment for AI vendors, and makes you wonder what other non-clinical AI use cases might exist. However, it might also concern radiologists who are still skeptical about AI.
