Simpler Radiology Reports from LLMs

Can large language models (LLMs) write simpler radiology reports for patients than clinicians can? A study published in European Radiology found that LLM-produced reports were more readable, but there are areas of concern that will require fine-tuning.

Patients are taking greater interest in managing their own healthcare, requesting direct access to medical information like images and reports.

  • That’s a good thing, but it creates challenges for healthcare professionals who are more accustomed to communicating with other providers.

Taking the time to draft a report just for patients is a non-starter for many radiology professionals in a time of workforce shortages.

  • But this could be an excellent use case for AI, especially the LLMs that have sprung up over the past few years. 

So researchers from Germany tested three LLMs to draft patient-friendly versions of 60 radiology reports from X-ray, CT, MRI, and ultrasound modalities. 

  • The LLMs included the ubiquitous ChatGPT-4o, as well as two open-source LLMs (Llama-3-70B and Mixtral-8x22B) that had been deployed on-premises within their hospitals.

The authors wanted to know not only how well the LLMs performed in drafting patient reports, but also whether there were differences between the black-box ChatGPT-4o and the two open-source LLMs.

  • The LLMs were instructed to generate layperson summaries at an eighth-grade reading level while preserving key clinical information.

In comparing original radiology reports to LLM-produced summaries, researchers found…

  • Original reports had much lower ease-of-reading scores on the Flesch readability scale (17 vs. 44-46).
  • Original reports were judged much less understandable on a five-point scale (1.5 vs. 4.1-4.4). 
  • The two open-source LLMs had higher rates of critical errors that could lead to patient harm (8.3%-10%), while ChatGPT-4o had no critical errors.
  • Original reports had shorter total reading time versus LLM versions (15 vs. 64-73 seconds).
  • There was no difference in understandability based on modality.
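For readers unfamiliar with the Flesch scale cited above: it scores text from roughly 0 (very hard) to 100 (very easy) based on average sentence length and syllables per word, so the original reports' score of 17 indicates graduate-level difficulty while 44-46 is closer to plain prose. Below is a minimal sketch of the standard Flesch Reading Ease formula; the syllable counter is a rough vowel-group heuristic of our own (real readability tools use dictionaries), and the sample sentences are illustrative, not from the study.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups; treat a trailing 'e' as silent."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words).
    Higher scores mean easier reading."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Illustrative (hypothetical) report language, dense vs. patient-friendly:
dense = ("Pulmonary parenchymal consolidation demonstrates interval "
         "progression with associated pleural effusion.")
simple = "Your lungs look clear. There is no sign of infection."
```

Long words and long sentences drive the score down sharply, which is why jargon-heavy original reports land near 17 while layperson rewrites score far higher.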

The findings on critical errors are particularly concerning. 

  • Clinicians may see on-premises open-source LLMs as offering patient-privacy advantages over cloud-based ChatGPT-4o, but such models may require more clinical oversight to avoid patient harm.

The Takeaway

The new study on LLM-generated patient radiology summaries is encouraging, pointing to a future in which a cumbersome task could be offloaded to generative AI algorithms. But much work remains to ensure patient safety and privacy before this can happen.
