Simpler Radiology Reports from LLMs

Can large language models (LLMs) write simpler radiology reports for patients than clinicians can? A study published in European Radiology found that LLM-produced reports were more readable, but areas of concern remain that will require refinement.

Patients are taking greater interest in managing their own healthcare, requesting direct access to medical information like images and reports.

  • That’s a good thing, but it creates challenges for healthcare professionals more used to communicating with other providers.

Taking the time to draft a report just for patients is a non-starter for many radiology professionals in a time of workforce shortages.

  • But this could be an excellent use case for AI, especially the LLMs that have sprung up over the past few years. 

So researchers from Germany tested three LLMs to draft patient-friendly versions of 60 radiology reports from X-ray, CT, MRI, and ultrasound modalities. 

  • The LLMs included the ubiquitous ChatGPT-4o, as well as two open-source LLMs (Llama-3-70B and Mixtral-8x22B) that had been deployed on-premises within their hospitals.

The authors wanted to know not only how well the LLMs performed in drafting patient reports, but also whether there were differences between the black-box ChatGPT-4o and the two open-source LLMs.

  • The LLMs were instructed to generate layperson summaries at the eighth-grade reading level, preserving key clinical information. 
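The paper's exact prompt isn't reproduced here, but the instruction described above can be sketched as a simple prompt template. The wording and function name below are illustrative assumptions, not the researchers' actual prompt:

```python
def build_summary_prompt(report_text: str, grade_level: int = 8) -> str:
    """Illustrative prompt template for a layperson radiology summary.

    The phrasing is a guess at the kind of instruction the study
    describes (reading level + preserve clinical content), not the
    authors' verbatim prompt.
    """
    return (
        f"Rewrite the following radiology report as a summary that a "
        f"patient with a grade-{grade_level} reading level can understand. "
        f"Keep all clinically important findings and avoid medical jargon.\n\n"
        f"Report:\n{report_text}"
    )
```

The same template would be sent to each of the three models, which is what makes a head-to-head comparison like this one possible.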

In comparing original radiology reports to LLM-produced summaries, researchers found…

  • Original reports scored much lower for ease of reading on the Flesch Reading Ease scale (17 vs. 44-46).
  • Original reports were judged much less understandable on a five-point scale (1.5 vs. 4.1-4.4). 
  • The two open-source LLMs had higher rates of critical errors that could lead to patient harm (8.3%-10%), while ChatGPT-4o had no critical errors. 
  • Original reports had shorter total reading time versus LLM versions (15 vs. 64-73 seconds).
  • There was no difference in understandability based on modality.
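For context on those readability numbers: the Flesch Reading Ease score runs from roughly 0 (very difficult) to 100 (very easy) and is computed from average sentence length and syllables per word. A minimal sketch, using a naive vowel-group syllable counter (production tools use dictionaries or better heuristics):

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher is easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

On this scale, the original reports' score of 17 sits in the "very difficult" band typically associated with specialist text, while the LLM summaries' 44-46 approaches plain prose.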

The findings on critical errors are particularly concerning. 

  • Clinicians may see on-premises open-source LLMs as having patient privacy advantages over cloud-based ChatGPT-4o, but such models may require more clinical oversight to avoid patient harm. 

The Takeaway

The new study on LLM-generated patient radiology summaries is encouraging, pointing to a future in which a cumbersome task could be offloaded to generative AI algorithms. But much work remains to ensure patient safety and privacy before this can happen.

RadGPT Simplifies Radiology Reports for Patients

When it comes to informing patients of their imaging results, radiologists are caught between a rock and a hard place. A new study in JACR shows how generative AI can help by drafting patient-friendly reports that are simple but accurate.

Under a 2021 final rule implementing the 21st Century Cures Act's ban on medical information blocking, patients must be given immediate access to their medical results. 

  • And while the technology exists to do that through tools like email and electronic patient portals, rapid notification can create confusion because the language physicians use to communicate with each other isn’t easily understood by anyone outside medicine.

Sure, radiology reports could be rewritten manually for patients, who typically read at about the eighth-grade level.

  • But given today’s workforce shortages, who’s going to do that?

Generative AI and large language models offer a solution. In the new JACR paper, researchers from Stanford University led by senior author Curtis Langlotz, MD, PhD, described their development of RadGPT, an LLM designed to improve patient communication.

  • To develop RadGPT, researchers started with OpenAI’s GPT-4 model and the RadGraph concept extraction tool to create an LLM that analyzes patient radiology reports and generates concept explanations and question-and-answer pairs.

How well did RadGPT work? The researchers tested it on 30 radiology reports generated at Stanford from 2012 to 2020, including different modalities and clinical applications. 

  • The LLM was asked to generate reports at a fifth-grade reading level (the level recommended by the Joint Commission for patient-facing healthcare materials).

Five radiology-trained physicians then rated the quality of RadGPT's responses, finding…

  • The average rating of RadGPT-generated concept explanations was 4.8 out of 5.
  • 95% of concept explanations had an average rating of 4 or higher.
  • 50% of concept explanations were rated 5, the highest possible rating.
  • Questions and answers generated by RadGPT were also rated highly, with an average rating of 3.0 on a three-point scale.

The Stanford researchers told The Imaging Wire that their goal is to make RadGPT more widely available as part of a prospective evaluation with real patients.

  • They are also developing a user-friendly interface in which patients can receive hyperlinked radiology reports.

The Takeaway

RadGPT and solutions like it fill a desperate need for tools that can save time for radiologists while helping patients better understand their reports and get more engaged in their care. The next step is to get technology like this into the hands of practicing radiologists.
