Radiology Puts ChatGPT to Work

ChatGPT has taken the world by storm since the AI technology was first introduced in November 2022. In medicine, radiology is taking the lead in putting ChatGPT to work to address the specialty’s many efficiency and workflow challenges. 

Both ChatGPT and GPT-4, the newer model that now underpins it, are forms of AI known as large language models – essentially neural networks that are trained on massive volumes of unlabeled text and learn on their own to predict the structure and syntax of human language. 
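For a concrete sense of what "predicting language" means, here's a minimal next-word sketch using the open-source Hugging Face transformers library with the small, public GPT-2 model (our stand-in for illustration – not the far larger models behind ChatGPT):

```python
from transformers import pipeline

# GPT-2 is a small, public predecessor of ChatGPT-class models, used here
# purely to illustrate the core mechanic: predicting the text that follows.
generator = pipeline("text-generation", model="gpt2")

prompt = "The chest X-ray demonstrates a small opacity in the"
result = generator(prompt, max_new_tokens=12, num_return_sequences=1)
print(result[0]["generated_text"])  # prompt plus the model's continuation
```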

A flood of papers has appeared in just the last week or so investigating ChatGPT’s potential:

  • ChatGPT could be used to improve patient engagement with radiology providers, such as by creating layperson reports that are more understandable, or by answering patient questions in a chatbot function, says an American Journal of Roentgenology article.
  • ChatGPT offered up accurate information about breast cancer prevention and screening to patients in a study in Radiology. But ChatGPT also gave some inappropriate and inconsistent recommendations – perhaps no surprise given that many experts themselves often disagree on breast screening guidelines.
  • ChatGPT was able to produce a report on a patient’s PET/CT scan – including technical terms like SUVmax and TNM stage – without special training, found researchers writing in the Journal of Nuclear Medicine.
  • GPT-4 translated free-text radiology reports into structured reports that lend themselves better to standardization and data extraction for research, in another paper published in Radiology (see the sketch after this list). Best of all, the service cost 10 cents a report.
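As a rough illustration of how that kind of report structuring can work, here’s a minimal sketch assuming an OpenAI-style chat API; the sample report, prompt, and JSON fields are our own inventions, not the paper’s template:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

free_text_report = (
    "Chest CT shows a 9 mm spiculated nodule in the right upper lobe. "
    "No pleural effusion. Mediastinal lymph nodes are not enlarged."
)

# Ask the model to map the narrative report onto a fixed template;
# these field names are illustrative, not from the Radiology paper.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": (
            "Convert free-text radiology reports into structured JSON "
            "with keys: findings, measurements, impression."
        )},
        {"role": "user", "content": free_text_report},
    ],
)
print(response.choices[0].message.content)
```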

Where is all this headed? A review article on AI in medicine in the New England Journal of Medicine offered the opinion – often stated in radiology – that AI has the potential to take over mundane tasks and give health professionals more time for human-to-human interactions. 

The authors compared the arrival of ChatGPT to the onset of digital imaging in radiology in the 1990s, and offered a tantalizing future in which chatbots like ChatGPT and GPT-4 replace outdated technologies like X-ray file rooms and lost images – remember those?

The Takeaway

Radiology’s embrace of ChatGPT and GPT-4 is heartening given the specialty’s initially skeptical response to AI in years past. As the most technologically advanced medical specialty, radiology is a fitting leader in putting this transformative technology to work – as it did with digital imaging.

Understanding AI’s Physician Influence

We spend a lot of time exploring the technical aspects of imaging AI performance, but little is known about how physicians are actually influenced by the AI findings they receive. A new Scientific Reports study addresses that knowledge gap, perhaps more directly than any other research to date. 

The researchers provided 233 physicians – radiologists (experts) and internal and emergency medicine physicians (non-experts) – with eight chest X-ray cases each. The CXR cases all featured correct diagnostic advice, but were manipulated to show different advice sources (generated by AI vs. by expert rads) and different levels of explanation (advice only vs. advice with annotated visual explanations). Here’s what they found…

  • Explanations Improve Accuracy – When the diagnostic advice included annotated explanations, both the IM/EM physicians’ and the radiologists’ accuracy improved (+5.66% & +3.41%).
  • Non-Rads with Explainable Advice Rival Rads – Although the IM/EM physicians performed far worse than rads when given advice without explanations, they were “on par with” radiologists when their advice included explainable annotations (see Fig 3).
  • Explanations Help Radiologists with Tough Cases – Radiologists gained “limited benefit” from advice explanations with most of the X-ray cases, but the explanations significantly improved their performance with the single most difficult case.
  • Presumed AI Use Improves Accuracy – When advice was labeled as AI-generated (vs. rad-generated), accuracy improved for both the IM/EM physicians and radiologists (+4.22% & +3.15%).
  • Presumed AI Use Improves Expert Confidence – When advice was labeled as AI-generated (vs. rad-generated), radiologists were more confident in their diagnosis.

The Takeaway

This study provides solid evidence supporting the use of visual explanations, and bolsters the increasingly popular theory that AI can have the greatest impact on non-experts. It also revealed that physicians trust AI more than some might have expected, to the point where physicians who believed they were using AI made more accurate diagnoses than when they were told the same advice came from a human expert.

However, more than anything else, this study seems to highlight the underappreciated impact of product design on AI’s clinical performance.

CXR AI’s Screening Generalizability Gap

A new European Radiology study detailed a commercial CXR AI tool’s challenges when used for screening patients with low disease prevalence, bringing more attention to the mismatch between how some AI tools are trained and how they’re applied in the real world.

The researchers used an unnamed commercial AI tool to detect abnormalities in 3k screening CXRs sourced from two healthcare centers (2.2% w/ clinically significant lesions), and had four radiology residents read the same CXRs with and without AI assistance, finding that the AI:

  • Produced a far lower AUROC than in its other studies (0.648 vs. 0.77–0.99)
  • Achieved 94.2% specificity, but just 35.3% sensitivity
  • Detected 12 of 41 pneumonia cases, 3 of 5 tuberculosis cases, and 9 of 22 tumors 
  • Only “modestly” improved the residents’ AUROCs (0.571–0.688 vs. 0.534–0.676)
  • Added 2.96 to 10.27 seconds to the residents’ average CXR reading times

The researchers attributed the AI tool’s “poorer than expected” performance to differences between the data used in its initial training and validation (high disease prevalence) and the study’s clinical setting (high-volume, low-prevalence screening) – a mismatch whose impact the quick calculation below illustrates.

  • More notably, the authors pointed to these results as evidence that many commercial AI products “may not directly translate to real-world practice,” urging providers facing this kind of training mismatch to retrain their AI or change their thresholds, and calling for more rigorous AI testing and trials.
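To make the prevalence effect concrete, here’s a back-of-the-envelope check (ours, not the study’s) that plugs the tool’s reported 35.3% sensitivity and 94.2% specificity into the standard PPV formula, first at the study’s 2.2% prevalence and then at a hypothetical 50% prevalence resembling an enriched training set:

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Study setting: low-prevalence screening (2.2% clinically significant lesions)
print(f"{ppv(0.353, 0.942, 0.022):.1%}")  # ~12% of AI flags are true positives

# Hypothetical enriched dataset (50% prevalence, for illustration only)
print(f"{ppv(0.353, 0.942, 0.50):.1%}")   # ~86% of AI flags are true positives
```

In other words, the same sensitivity and specificity that look workable on an enriched dataset yield a tool whose flags are wrong nearly nine times out of ten in a screening population.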

These results also inspired lively online discussions. Some commenters cited the study as proof of the problems caused by training AI with augmented datasets, while others contended that the AI tool’s AUROC still rivaled the residents’ and that its “decent” specificity was promising for screening use.

The Takeaway

We cover plenty of studies about AI generalizability, but most have explored bias due to patient geography and demographics, rather than disease prevalence mismatches. Even if AI vendors and researchers are already aware of this issue, AI users and study authors might not be, placing more emphasis on how vendors position their AI products for different use cases (or how they train them).

Guerbet’s Big AI Investment

Guerbet took a big step towards advancing its AI strategy, acquiring a 39% stake in French imaging software company Intrasense, and revealing ambitious future plans for their combined technologies.

Through Intrasense, Guerbet gains access to a visualization and AI platform and a team of AI integration experts to help bring its algorithms into clinical use. The tie-up could also create future platform and algorithm development opportunities, and expand their technologies across Guerbet’s global installed base.

The €8.8M investment (€0.44/share, a 34% premium) could turn into a €22.5M acquisition, as Guerbet plans to file a voluntary tender offer for all remaining shares.
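The math behind those figures checks out (a quick sanity check, assuming the tender would happen at the same €0.44 per share):

```python
stake_cost = 8.8e6      # Guerbet's reported investment, in EUR
share_price = 0.44      # EUR per share
stake_fraction = 0.39   # 39% of Intrasense

shares_bought = stake_cost / share_price        # ~20M shares
total_shares = shares_bought / stake_fraction   # ~51.3M shares implied
full_buyout = total_shares * share_price        # ~EUR 22.6M

print(f"Full buyout at €0.44/share ≈ €{full_buyout / 1e6:.1f}M")
# in line with the €22.5M figure above
```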

Even though Guerbet is a €700M company and Intrasense is relatively small (~€3.8M 2022 revenue, 67 employees on LinkedIn), this seems like a significant move given Guerbet’s increasing emphasis on AI:

What Guerbet was lacking before now (especially since ending its Merative/IBM alliance) was a future AI platform – and Intrasense should help fill that void. 

If Guerbet acquires Intrasense it would continue the recent AI consolidation wave, while adding contrast manufacturers to the growing list of previously unexpected AI startup acquirers (joining imaging center networks, precision medicine analytics companies, and EHR analytics firms). 

In fact, contrast manufacturers could play a much larger role in imaging AI going forward, considering the high priority that Bayer is placing on its Calantic AI platform.

The Takeaway

Guerbet has been promoting its AI ambitions for several years, and this week’s Intrasense investment suggests that the French contrast giant is ready to transition from developing algorithms to broadly deploying them. That would take a lot more work, but Guerbet’s scale and imaging expertise make it worth keeping an eye on if you’re in the AI space.

Prioritizing Length of Stay

A new study out of Cedars-Sinai provided what might be the strongest evidence yet that imaging AI triage and prioritization tools can shorten inpatient hospitalizations, potentially bolstering AI’s economic and patient care value propositions outside of the radiology department.

The researchers analyzed patient length of stay (LOS) before and after Cedars-Sinai adopted Aidoc’s triage AI solutions for intracranial hemorrhage (Nov 2017) and pulmonary embolism (Dec 2018), using 2016-2019 data from all inpatients who received noncontrast head CTs or chest CTAs. Here’s what they found (with the simple arithmetic behind the percentages shown after the list):

  • ICH Results – Among Cedars-Sinai’s 1,718 ICH patients (795 after ICH AI adoption), average LOS dropped by 11.9% from 10.92 to 9.62 days (vs. -5% for other head CT patients).
  • PE Results – Among Cedars-Sinai’s 400 patients diagnosed with PE (170 after PE AI adoption), average LOS dropped by a massive 26.3% from 7.91 to 5.83 days (vs. +5.2% for other chest CTA patients). 
  • Control Results – Control group patients with hip fractures saw smaller LOS decreases during the respective post-AI periods (-3% & -8.3%), while hospital-wide LOS trends were mixed (-2.5% & +10%).
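For reference, the headline percentages follow directly from the before/after averages:

```python
def pct_change(before: float, after: float) -> float:
    """Percent change in average length of stay."""
    return (after - before) / before * 100

print(f"ICH: {pct_change(10.92, 9.62):+.1f}%")  # -11.9%
print(f"PE:  {pct_change(7.91, 5.83):+.1f}%")   # -26.3%
```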

The Takeaway

These results were strong enough for the authors to conclude that Cedars-Sinai’s LOS improvements were likely “due to the triage software implementation.” 

Perhaps more importantly, these LOS reductions could also be interpreted as evidence that Cedars-Sinai’s triage AI adoption improved its overall patient care and inpatient operating costs, given how the reductions were likely achieved (faster diagnosis and treatment), the well-documented association between long hospital stays and negative outcomes, and the significant impact of inpatient stays on hospital costs.

Prostate MR AI’s Experience Boost

A new European Radiology study showed that Siemens Healthineers’ AI-RAD Companion Prostate MR solution can improve radiologists’ lesion assessment accuracy (especially less-experienced rads), while reducing reading times and lesion grading variability. 

The researchers had four radiologists (two experienced, two inexperienced) assess lesions in 172 prostate MRI exams, with and without AI support, finding that AI-RAD Companion Prostate MR improved:

  • The less-experienced radiologists’ performance, significantly (AUCs: 0.66 to 0.80 & 0.68 to 0.80)
  • The experienced rads’ performance, modestly (AUCs: 0.81 to 0.86 & 0.81 to 0.84)
  • Overall PI-RADS category and Gleason score correlations (r = 0.45 to 0.57)
  • Median reading times (157 to 150 seconds)

The study also highlights Siemens Healthineers’ emergence as an AI research leader. The company has leveraged its relationship and funding advantages over AI-only vendors – and its (potentially) greater focus on AI research than its OEM peers – to become one of imaging AI’s most-published vendors (here are some of its other recent studies).

The Takeaway

Given the role that experience plays in radiologists’ prostate MRI accuracy, and noting prostate MRI’s historical challenges with variability, this study makes a solid case for AI-RAD Companion Prostate MR’s ability to improve rads’ diagnostic performance (without slowing them down). It’s also a reminder that Siemens Healthineers is serious about supporting its homegrown AI portfolio through academic research.

RevealDx & contextflow’s Lung CT Alliance

RevealDx and contextflow announced a new alliance that should advance the companies’ product and distribution strategies, and appears to highlight an interesting trend towards more comprehensive AI solutions.

The companies will integrate RevealDx’s RevealAI-Lung solution (lung nodule characterization) with contextflow’s SEARCH Lung CT software (lung nodule detection and quantification), creating a uniquely comprehensive lung cancer screening offering. 

contextflow will also become RevealDx’s exclusive distributor in Europe, adding to RevealDx’s global channel that includes a distribution alliance with Volpara (exclusive in Australia/NZ, non-exclusive in US) and a platform integration deal with Sirona.

The alliance highlights contextflow’s new partner-driven strategy to expand SEARCH Lung CT beyond its image-based retrieval roots, coming just a few weeks after announcing an integration with Oxipit’s ChestEye Quality AI solution to identify missed lung nodules.

In fact, contextflow’s AI expansion efforts appear to be part of an emerging trend, as AI vendors work to support multiple steps within a given clinical activity (e.g. lung cancer assessments) or spot a wider range of pathologies in a given exam (e.g. CXRs):

  • Volpara has amassed a range of complementary breast cancer screening solutions, and has started to build out a similar suite of lung cancer screening solutions (including RevealDx & Riverain).
  • A growing field of chest X-ray AI vendors (Annalise.ai, Lunit, Qure.ai, Oxipit, Vuno) leads with the ability to detect multiple findings from a single CXR scan and AI workflow. 
  • Siemens Healthineers’ AI-RAD Companion Chest CT solution combines these two approaches, automating multiple diagnostic tasks (analysis, quantification, visualization, results generation) across a range of different chest CT exams and organs.

The Takeaway

contextflow and RevealDx’s European alliance seems to make a lot of sense, allowing contextflow to enhance its lung nodule detection/quantification findings with characterization details, while giving RevealDx the channel and lung nodule detection starting points that it likely needs.

The partnership also appears to represent another step towards more comprehensive and potentially more clinically valuable AI solutions, and away from the narrow applications that have dominated AI portfolios (and AI critiques) before now.

Cathay’s AI Underwriting

Cathay Life Insurance will use Lunit’s INSIGHT CXR AI solution to identify abnormalities in its applicants’ chest X-rays, potentially modernizing a manual underwriting process and uncovering a new non-clinical market for AI vendors.

Lunit INSIGHT CXR will be integrated into Cathay’s underwriting workflow, with the goals of enhancing its radiologists’ accuracy and efficiency, while improving Cathay’s underwriting decisions. 

Lunit and Cathay have reason to be optimistic about this endeavor, given that their initial proof of concept study found that INSIGHT CXR:

  • Improved Cathay’s radiologists’ reading accuracy by 20%
  • Reduced the radiologists’ overall reading time by up to 90%

Those improvements could have a significant labor impact, considering that Cathay’s rads review 30,000 CXRs every year. They might have an even greater business impact, given the important role that underwriting accuracy plays in policy profitability.

Lunit’s part of the announcement largely focused on its expansion beyond clinical settings, revealing plans to “become the driving force of digital innovation in the global insurance market” and to further expand its business into “various sectors outside the hospital setting.”

The Takeaway

Even if life insurers only require CXRs for a small percentage of their applicants (older people, higher value policies), they still review hundreds of thousands of CXRs each year. That makes insurers an intriguing new market segment for AI vendors, and makes you wonder what other non-clinical AI use cases might exist. It might also concern radiologists who remain skeptical about AI.

AI Experiences & Expectations

The European Society of Radiology just published new insights into how imaging AI is being used across Europe and how the region’s radiologists view this emerging technology.

The Survey – The ESR reached out to 27,700 European radiologists in January 2022 with a survey regarding their experiences and perspectives on imaging AI, receiving responses from just 690 rads.

Early Adopters – 276 of the 690 respondents (40%) had clinical experience using imaging AI, with the majority of these AI users:

  • Working at academic and regional hospitals (52% & 37% – only 11% at practices)
  • Leveraging AI for interpretation support, case prioritization, and post-processing (51.5%, 40%, 28.6%)

AI Experiences – The radiologists who do use AI revealed a mix of positive and negative experiences:

  • Most found diagnostic AI’s output reliable (75.7%)
  • Few experienced technical difficulties integrating AI into their workflow (17.8%)
  • The majority found AI prioritization tools to be “very helpful” or “moderately helpful” for reducing staff workload (23.4% & 62.2%)
  • However, far fewer reported that diagnostic AI tools reduced staff workload (22.7% Yes, 69.8% No)

Adoption Barriers – Most coverage of this study will likely focus on the fact that only 92 of the surveyed rads (13.3%) plan to acquire AI in the future, while 363 (52.6%) don’t intend to. The radiologists who don’t plan to adopt AI (including those who’ve never used AI) based their opinions on:

  • AI’s lack of added value (44.4%)
  • AI not performing as well as advertised (26.4%)
  • AI adding too much work (22.9%)
  • And “no reason” (6.3%)

US Context – These results are in the same ballpark as the ACR’s 2020 US-based survey (33.5% using AI, only 20% of non-users planned to adopt within 5 years), although 2020 feels like a long time ago.

The Takeaway

Even if this ESR survey might leave you asking more questions (What about AI’s impact on patient care? How often is AI actually being used? How do opinions differ between AI users and non-users?), more than anything it confirms what many of us already know… We’re still very early in AI’s evolution, and there are still plenty of performance and perception barriers that AI has to overcome.

Burdenless Incidental AI

A team of IBM Watson Health researchers developed an interesting image and text-based AI system that could significantly improve incidental lung nodule detection, without being “overly burdensome” for radiologists. That seems like a clinical and workflow win-win for any incidental AI system, and makes this study worth a deeper look.

Watson Health’s R&D-stage AI system automatically detects potential lung nodules in chest and abdominal CTs, and then analyzes the text in corresponding radiology reports to confirm whether they mention lung nodules. In clinical practice, the system would flag exams with potentially missed nodules for radiologist review.

The researchers used the AI system to analyze 32k CTs sourced from three health systems in the US and UK. They then had radiologists review the 415 studies that the AI system flagged for potentially missed pulmonary nodules, finding that it:

  • Caught 100 exams containing at least one missed nodule
  • Flagged 315 exams that didn’t feature nodules (false positives)
  • Achieved a 24% overall positive predictive value
  • Produced just a 1% false positive rate
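Those last two figures follow directly from the flag counts (a quick check; we’re assuming the denominator for the false positive rate is roughly the 32k exams minus the 100 true positives):

```python
flagged = 415          # exams the AI flagged for radiologist review
true_positives = 100   # flagged exams with at least one missed nodule
false_positives = 315  # flagged exams without missed nodules
total_exams = 32_000   # CTs analyzed (approximate, per the study)

ppv = true_positives / flagged                          # ~24%
fpr = false_positives / (total_exams - true_positives)  # ~1%
print(f"PPV: {ppv:.0%}, false positive rate: {fpr:.1%}")
```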

The AI system’s combined ability to detect missed pulmonary nodules while “minimizing” radiologists’ re-reading labor was enough to make the authors optimistic about this type of AI. They specifically suggested that it could be a valuable addition to quality assurance programs, improving patient care while avoiding the healthcare and litigation costs that can come from missed findings.

The Takeaway

Watson Health’s new AI system adds to incidental AI’s growing momentum, joining a number of research and clinical-stage solutions that emerged in the last two years. However, this system’s ability to cross-reference radiology report text and its apparent ability to minimize false positives set it apart from most of them. 

Even if most incidental AI tools aren’t ready for everyday clinical use, and their potential to increase re-read labor might be alarming to some rads, these solutions’ ability to catch earlier stage diseases and minimize the impact of diagnostic “misses” could earn the attention of a wide range of healthcare stakeholders going forward.
