AI Hits Speed Bumps

There’s no question AI is the future of radiology. But AI’s drive to widespread clinical use is going to hit some speed bumps along the way.

This week is a case in point. Two studies were published showing AI’s limitations and underscoring the challenges faced in making AI an everyday clinical reality. 

In the first study, published in Radiology, researchers found that radiologists outperformed four commercially available AI algorithms for analyzing chest X-rays (from Annalise.ai, Milvue, Oxipit, and Siemens Healthineers) in a group of 2k patients.

Researchers from Denmark found the AI tools had moderate to high sensitivity for three detection tasks: 

  1. airspace disease (72%-91%)
  2. pneumothorax (63%-90%)
  3. pleural effusion (62%-95%). 

But the algorithms also had higher false-positive rates, and performance dropped in cases with smaller pathology and multiple findings. The findings are disappointing, especially since these algorithms have gotten such widespread play in the mainstream media.

But this week’s second study, in Radiology: Artificial Intelligence, also brought worrisome news – this time about an AI training method called foundation models that many hope holds the key to better algorithms.

Foundation models are designed to address the challenge of finding enough high-quality data for AI training. Most algorithms are trained with actual de-identified clinical data that have been labeled and referenced to ground truth; foundation models are AI neural networks pre-trained with broad, unlabeled data and then fine-tuned with smaller volumes of more detailed data to perform specific tasks.
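For readers who want a concrete picture of that workflow, here’s a minimal sketch of the pre-train/fine-tune pattern in PyTorch. Everything in it – the ChestXrayEncoder class, the four-label head, the frozen backbone – is an illustrative assumption rather than a detail from the study; a real foundation model would be far larger and pre-trained with a self-supervised objective over unlabeled images.

```python
# Minimal sketch of the pre-train / fine-tune pattern behind foundation models.
# The ChestXrayEncoder class, the four-label head, and the frozen backbone are
# all illustrative assumptions, not details from the study.
import torch
import torch.nn as nn

class ChestXrayEncoder(nn.Module):
    """Stand-in backbone that maps an image to a feature vector."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

# Stage 1: pre-train the encoder on broad, unlabeled data (objective omitted here).
encoder = ChestXrayEncoder()

# Stage 2: fine-tune on a smaller labeled set for specific findings.
head = nn.Linear(256, 4)  # e.g. no finding, pleural effusion, cardiomegaly, pneumothorax
model = nn.Sequential(encoder, head)

# Freeze the pre-trained backbone so only the task-specific head adapts.
for p in encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One illustrative fine-tuning step on dummy labeled data.
images = torch.randn(8, 1, 224, 224)   # batch of 8 grayscale chest X-rays
labels = torch.randint(0, 4, (8,))
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```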

Researchers in the new study found that a chest X-ray algorithm trained on a foundation model with 800k images had lower performance than an algorithm trained with the CheXpert reference model in a group of 42.9k patients. The foundation model’s performance lagged for four possible results – no finding, pleural effusion, cardiomegaly, and pneumothorax – as follows…

  • Lower by 6.8-7.7% in females for the “no finding” result
  • Down by 10.7-11.6% in Black patients in detecting pleural effusion
  • Lower performance across all groups for classifying cardiomegaly

The performance decline for female and Black patients is particularly concerning given recent studies on bias and lack of generalizability in AI.

The Takeaway

This week’s studies show that there’s not always going to be a clear road ahead for AI in its drive to routine clinical use. The study on foundation models in particular could have ramifications for AI developers looking for a shortcut to faster algorithm development. They may want to slow their roll. 

Predicting AI Performance

How can you predict whether an AI algorithm will fall short for a particular clinical use case such as detecting cancer? Researchers in Radiology took a crack at this conundrum by developing what they call an “uncertainty quantification” metric to predict when an AI algorithm might be less accurate. 

AI is rapidly moving into wider clinical use, with a number of exciting studies published in just the last few months showing how AI can help radiologists interpret screening mammograms or direct which women should get supplemental breast MRI.

But AI isn’t infallible. And unlike a human radiologist who might be less confident in a particular diagnosis, an AI algorithm doesn’t have a built-in hedging mechanism.

So researchers from Denmark and the Netherlands decided to build one. They took publicly available AI algorithms and tweaked their code so they produced “uncertainty quantification” scores with their predictions. 

They then tested how well the scores predicted AI performance in a dataset of 13k images for three common tasks covering some of the deadliest types of cancer:

1) detecting pancreatic ductal adenocarcinoma on CT
2) detecting clinically significant prostate cancer on MRI
3) predicting pulmonary nodule malignancy on low-dose CT 

Researchers classified the highest 80% of the AI predictions as “certain,” and the remaining 20% as “uncertain,” and compared AI’s accuracy in both groups, finding … 

  • AI led to significant accuracy improvements in the “certain” group for pancreatic cancer (80% vs. 59%), prostate cancer (90% vs. 63%), and pulmonary nodule malignancy prediction (80% vs. 51%)
  • AI accuracy was comparable to clinicians when its predictions were “certain” (80% vs. 78%, P=0.07), but much worse when “uncertain” (50% vs. 68%, P<0.001)
  • Using AI to triage “uncertain” cases produced overall accuracy improvements for pancreatic and prostate cancer (+5%) and lung nodule malignancy prediction (+6%) compared to a no-triage scenario

How would uncertainty quantification be used in clinical practice? It could play a triage role, deprioritizing radiologist review of easier cases while helping them focus on more challenging studies. It’s a concept similar to the MASAI study of mammography AI.
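To make the triage idea concrete, here is a minimal sketch assuming each case comes with both a prediction and an uncertainty score. The 80/20 split mirrors the study’s design, but the simulated scores and the 0.5 decision cutoff are illustrative assumptions, not the authors’ method.

```python
# Minimal sketch of uncertainty-based triage. Assumes each case comes with a
# model probability and an uncertainty score; the 80/20 split mirrors the
# study's design, but the scores here are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_cases = 1000
cancer_prob = rng.uniform(0, 1, n_cases)   # AI's per-case malignancy probability
uncertainty = rng.uniform(0, 1, n_cases)   # AI's per-case uncertainty score

# Treat the 80% most certain predictions as reliable; flag the rest for review.
threshold = np.quantile(uncertainty, 0.80)
certain = uncertainty <= threshold

ai_calls = cancer_prob[certain] >= 0.5       # AI's reads on "certain" cases
review_queue = np.flatnonzero(~certain)      # "uncertain" cases for radiologists

print(f"{certain.sum()} cases handled as certain, "
      f"{review_queue.size} flagged for radiologist review")
```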

The Takeaway

Like MASAI, the new findings present exciting new possibilities for AI implementation. They also present a framework within which AI can be implemented more safely by alerting clinicians to cases in which AI’s analysis might fall short – and enabling humans to step in and pick up the slack.  

Are Doctors Overpaid?

A new study on physician salaries is raising pointed questions about pay for US physicians and whether it contributes to rising healthcare costs – that is, if you believe the numbers are accurate. 

The study was released in July by the National Bureau of Economic Research (NBER), which produces in-depth reports on a variety of topics. 

The current paper is highly technical and might have languished in obscurity were it not for an August 4 article in The Washington Post that examined the findings with the claim that “doctors make more than anyone thought.”

It is indeed true that the NBER’s estimate of physician salaries seems high. The study claims US physicians made an average of $350k in 2017, the year that the researchers focused on by analyzing federal tax records. 

  • The NBER estimate is far higher than $294k in Medscape’s 2017 report on physician compensation – a 19% difference. 

The variation is even greater for diagnostic radiologists. The NBER data claim radiologists had a median annual salary in 2017 of $546k – 38% higher than the $396k average salary listed in Medscape’s 2017 report. 

  • The NBER numbers from six years ago are even higher than 2022/2023 numbers for radiologist salaries in several recent reports, by Medscape ($483k), Doximity ($504k), and Radiology Business ($482k). 

But the NBER researchers claim that by analyzing tax data rather than relying on self-reported earnings, their data are more accurate than previous studies, which they believe underestimate physician salaries by as much as 25%. 

  • They also estimate that physician salaries make up about 9% of total US healthcare costs.

What difference does it make how much physicians earn? The WaPo story sparked a debate with 6.1k comments so far, with many readers accusing doctors of contributing to runaway healthcare costs in the US.

  • Meanwhile, a thread in the AuntMinnie forums argued whether the NBER numbers were accurate, with some posters warning that the figures could lead to additional cuts in Medicare payments for radiologists. 

The Takeaway

Lost in the debate over the NBER report is its finding that physician pay makes up only 9% of US healthcare costs. In a medical system that’s rife with overutilization, administrative costs, and duplicated effort across fragmented healthcare networks, physician salaries should be the last target for those who actually want to cut healthcare spending. 

Grading AI Report Quality

One of the most exciting new use cases for medical AI is in generating radiology reports. But how can you tell whether the quality of a report generated by an AI algorithm is comparable to that of a radiologist?

In a new study in Patterns, researchers propose a technical framework for automatically grading the output of AI-generated radiology reports, with the ultimate goal of producing AI-generated reports that are indistinguishable from those of radiologists. 

Most radiology AI applications so far have focused on developing algorithms to identify individual pathologies on imaging exams. 

  • While this is useful, helping radiologists streamline the production of their main output – the radiology report – could have a far greater impact on their productivity and efficiency. 

But existing tools for measuring the quality of AI-generated narrative reports are limited and don’t match up well with radiologists’ evaluations. 

  • To improve that situation, the researchers applied several existing automated metrics for analyzing report quality and compared them to the scores of radiologists, seeking to better understand AI’s weaknesses. 

Not surprisingly, the automated metrics fell short in several ways, including false prediction of findings, omission of findings, and incorrect localization and severity grading of findings. 

  • These shortcomings point out the need for better scoring systems for gauging AI performance. 

The researchers therefore proposed a new metric for grading AI-generated report quality, called RadGraph F1, and a new methodology, RadCliQ, to predict how well an AI report would measure up to radiologist scrutiny. 

  • RadGraph F1 and RadCliQ could be used in future research on AI-generated radiology reports, and to that end the researchers have made the code for both metrics available as open source.
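To give a feel for what an entity-based report metric measures, below is a highly simplified sketch of an overlap F1 in the spirit of RadGraph F1. The real metric relies on the RadGraph model to extract clinical entities and relations from free text; here the entities are assumed to be already extracted, so this is an illustration, not the authors’ open-source implementation.

```python
# Simplified illustration of an entity-overlap F1 in the spirit of RadGraph F1.
# The actual metric extracts entities and relations with the RadGraph model;
# here we assume entities have already been extracted as sets of strings.
def entity_f1(pred_entities: set[str], ref_entities: set[str]) -> float:
    """F1 overlap between AI-report entities and reference-report entities."""
    if not pred_entities and not ref_entities:
        return 1.0
    tp = len(pred_entities & ref_entities)
    precision = tp / len(pred_entities) if pred_entities else 0.0
    recall = tp / len(ref_entities) if ref_entities else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: AI-generated report vs. radiologist reference
ai_report = {"pleural effusion", "cardiomegaly"}
reference = {"pleural effusion", "cardiomegaly", "pneumothorax"}
print(entity_f1(ai_report, reference))  # 0.8
```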

Ultimately, the researchers envision generalist medical AI models that could perform multiple complex tasks, such as conversing with radiologists and physicians about medical images. 

  • Another use case could be applications that are able to explain imaging findings to patients in everyday language. 

The Takeaway

It’s a complex and detailed paper, but the new study is important because it outlines the metrics that can be used to teach machines how to generate better radiology reports. Given the imperative to improve radiologist productivity in the face of rising imaging volume and workforce shortages, this could be one more step on the quest for the Holy Grail of AI in radiology.

How COVID Crashed CT Scanners in China

In the early days of the COVID-19 pandemic in China, hospitals were performing so many lung scans of infected patients that CT scanners were crashing. That’s according to an article based on an interview with a Wuhan radiologist that provides a chilling first-hand account of radiology’s role in what’s become the biggest public health crisis of the 21st century.

The interview was originally published in 2022 by the Chinese-language investigative website Caixin and was translated and published this month by U.S. Right to Know, a public health advocacy organization. 

In a sign of the information’s sensitivity, the original publication on Caixin’s website has been deleted, but U.S. Right to Know obtained the document from the US State Department under the Freedom of Information Act. 

Radiologists at a Wuhan hospital noticed that COVID cases were doubling every 3-4 days in early January 2020, the article states, with many patients showing ground-glass opacities on CT lung scans – a telltale sign of COVID infection. But Chinese authorities suppressed news about the rapid spread of the virus, and by January 11 the official estimate was that there were only 41 COVID cases in the entire country.

In reality, COVID cases were growing rapidly. CT machines began crashing in the fourth week of January due to overheating, said the radiologist, who estimated the number of cases in Wuhan at 10,000 by January 21. Hospitals were forced to turn infected patients away, and many people were so sick they were unable to climb onto X-ray tables for exams. Other details included: 

  • Chinese regulatory authorities denied that human-to-human transmission of the SARS-CoV-2 virus was occurring even as healthcare workers began falling ill
  • Many workers at Chinese hospitals were discouraged from wearing masks in the pandemic’s early days to maintain the charade that human-to-human contact was not possible – and many ended up contracting the virus
  • Radiologists and other physicians lived in fear of retaliation if they spoke up about the virus’ rapid spread

The Takeaway

The article provides a stunning behind-the-scenes look at the early days of a pandemic that would go on to reshape the world in 2020. What’s more, it demonstrates the vital role of radiology as a front-line service that’s key to the early identification and treatment of disease – even in the face of bureaucratic barriers to delivering quality care.

Undermining the Argument for NPPs

If you think you’ve been seeing more non-physician practitioners (NPPs) reading medical imaging exams, you’re not alone. A new study in Current Problems in Diagnostic Radiology found that the rate of NPP interpretations went up almost 27% over four years. 

US radiologists have zealously guarded their position as the primary readers of imaging exams, even as allied health professionals like nurses and physician assistants clamor to extend their scope of practice (SOP) into image interpretation. The struggle often plays out in state legislatures, with each side pushing laws benefiting their positions.

How has this dynamic affected NPP interpretation rates? In the current study, researchers looked at NPP interpretations of 110 million imaging claims from 2016 to 2020. They also examined how NPP rates changed by geographic location, and whether state laws on NPP practice authority affected rates. Findings included:

  • The rate of NPP interpretation for imaging studies went from 2.6% to 3.3% in the study period – growth of 26.9%.
  • Metropolitan areas saw the highest growth rate in NPP interpretation, with growth of 31.3%, compared to micropolitan areas (18.8%), while rates in rural areas did not grow at a statistically significant rate.
  • Rates of NPP interpretation tended to grow more in states with less restrictive versus more restrictive practice-authority laws (45% vs. 16.6%).
  • NPP interpretation was focused on radiography/fluoroscopy (53%), ultrasound (24%), and CT and MRI (21%). 

The findings are particularly interesting because they run counter to one of the main arguments made by NPPs for expanding their scope of practice into imaging: to alleviate workforce shortages in rural areas. Instead, NPPs (like physicians themselves) tend to gravitate to urban areas – where their services may not be as needed. 

The study also raises questions about whether the training that NPPs receive is adequate for a highly subspecialized area like medical imaging, particularly given the study’s findings that advanced imaging like CT and MRI make up one in five exams being read by NPPs. 

The Takeaway    

The findings undermine one of the main arguments in favor of using non-physician practitioners – to address access-to-care issues. The question is whether the study has an impact on the ongoing turf battle between radiologists and NPPs over image interpretation playing out in state legislatures. 

The Perils of Worklist Cherry-Picking

If you’re a radiologist, chances are at some point in your career you’ve cherry-picked the worklist. But picking easy, high-RVU imaging studies to read before your colleagues isn’t just rude – it’s bad for patients and bad for healthcare.

That’s according to a new study in Journal of Operations Management that analyzes radiology cherry-picking in the context of operational workflow and efficiency. 

Based on previous research, researchers hypothesized that radiologists who are free to pick from an open worklist would choose the easier studies with the highest compensation – the classic definition of cherry-picking.

To test their theory, they analyzed a dataset of 2.2M studies acquired at 62 hospitals from 2014 to 2017 that were read by 115 different radiologists. They developed a statistical metric called “bang for the buck,” or BFB, to classify the value of an imaging study in terms of interpretation time relative to RVU level. 
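The paper’s exact formula isn’t reproduced in this summary, but conceptually BFB relates the RVU credit for a study to how long it takes to read. A hedged sketch of that idea follows; the Study fields, RVU values, and read times are illustrative assumptions.

```python
# Hedged sketch of a "bang for the buck" (BFB)-style score: RVUs earned per
# expected minute of interpretation. The exact formula in the paper may differ;
# the studies, RVU values, and read times below are illustrative only.
from dataclasses import dataclass

@dataclass
class Study:
    modality: str
    rvu: float                 # work RVUs credited for the read
    expected_read_min: float   # predicted interpretation time in minutes

def bfb(study: Study) -> float:
    """RVUs per expected minute of interpretation (illustrative definition)."""
    return study.rvu / study.expected_read_min

worklist = [
    Study("CT head", rvu=1.0, expected_read_min=6.0),
    Study("Chest X-ray", rvu=0.22, expected_read_min=1.0),
    Study("MRI brain", rvu=1.48, expected_read_min=20.0),
]

# Cherry-picking amounts to reading an open worklist in descending BFB order.
for s in sorted(worklist, key=bfb, reverse=True):
    print(f"{s.modality}: BFB = {bfb(s):.2f}")
```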

They then assessed the impact of BFB on turnaround time (TAT) for different types of imaging exams based on priority, classified as Stat, Expedited, and Routine. Findings included:

  • High-priority Stat studies were reported quickly regardless of BFB, indicating little cherry-picking impact
  • For Routine studies, turnaround was much shorter for those with higher BFB – a sign of cherry-picking
  • Adding one high-BFB Routine study to a radiologist’s worklist resulted in a much larger increase in TAT for Expedited exams compared to adding a low-BFB study (increase of 17.7 minutes vs. 2 minutes)
  • The above delays could result in longer patient lengths of stay that translate to $2.1M-$4.2M in extra costs across the 62 hospitals in the study. 

The findings suggest that radiologists in the study prioritized high-BFB Routine studies over Expedited exams – undermining the exam prioritization system and impacting care for priority cases.

Fortunately, the researchers offer suggestions for countering the cherry-picking effect, such as through intelligent scheduling or even hiding certain studies – like high-BFB Routine exams – from radiologists when there are Expedited studies that need to be read. 
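A minimal sketch of that “hiding” rule, assuming a simple worklist of dicts with priority labels and precomputed BFB scores; the cutoff value and data structure are illustrative, not from the paper.

```python
# Sketch of the "hide high-BFB Routine studies while Expedited work is pending"
# idea. Studies are plain dicts here; the priority labels mirror the study's
# Stat/Expedited/Routine classes, and the BFB cutoff is an assumption.
def visible_worklist(studies, bfb_cutoff=0.15):
    expedited_pending = any(s["priority"] == "Expedited" for s in studies)
    if not expedited_pending:
        return list(studies)
    # Hide tempting high-BFB Routine exams until the Expedited queue is clear.
    return [s for s in studies
            if not (s["priority"] == "Routine" and s["bfb"] > bfb_cutoff)]

worklist = [
    {"id": 1, "priority": "Expedited", "bfb": 0.08},
    {"id": 2, "priority": "Routine",   "bfb": 0.25},  # would be cherry-picked
    {"id": 3, "priority": "Routine",   "bfb": 0.05},
]
print([s["id"] for s in visible_worklist(worklist)])  # [1, 3]
```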

The Takeaway 

The study concludes that radiology’s standard workflow of an open worklist that any radiologist can access can become an “imbalanced compensation scheme” that can lead to poorer service for high-priority tasks. On the positive side, the solutions proposed by the researchers seem tailor-made for IT-based interventions, especially ones that are rooted in AI. 

Salary Data Reveal Medicine’s Golden Cage

Are you a glass-half-full or a glass-half-empty kind of person? Either way, there’s lots to unpack in the latest data on physician salaries, this time from Medscape.

Medscape’s survey of more than 10k US physicians across 29-plus medical specialties found that overall physician salaries have grown 18% over the last five years, to $352k, while specialists made an average of $382k. 

As with last year, radiologists landed in the top 10 of highest-compensated specialists, a finding that’s in line with previous salary surveys, such as from Doximity. Medscape found that radiologists had an average annual salary of $483k in 2023, compared to $437k in 2022. Radiologists had an average annual salary of $504k in the Doximity data. 

Other nuggets from the Medscape survey:

  • “Stagnant” reimbursement relative to rising practice costs has cut into physician income. 
  • The gender gap is narrowing. Male primary care doctors in 2023 earn 19% more than females, compared to about 25% previously.
  • Male specialist physicians earn 27% more than females, down from 31% last year and 33% the year before that.
  • Only 19% of radiologists are women – one of the lowest rates of female participation among medical specialties. 
  • 58% of radiologists feel they are fairly paid.
  • Radiologists report working an average of 49.6 hours a week.
  • 90% of radiologists say they would choose their specialty again, ranking #10.

The Takeaway

On the positive side, physician salaries continue to rise, and medicine is making encouraging progress in narrowing the gender gap. Radiologists seem to be well-compensated and relatively happy, but the specialty has more to do to attract women.

Underlying the raw data is a disturbing undercurrent of physician dissatisfaction, with many feeling as though medicine is a golden cage. In the free-response portion of the survey, doctors described themselves as caught between falling reimbursement and rising costs, with overwork also leading to burnout.

The Medscape survey shows that addressing physician burnout must become a priority for the US healthcare system, and it can’t be solved merely by boosting salaries. Increasing the number of residency slots is a good first step.

Radiology Bucks Doctor Salary Decline

The latest news on physician salaries is out, and it’s not pretty. A new Doximity survey found that average physician pay declined 2.4% last year, compared to an increase of 3.8% in 2021. The drop was exacerbated by high inflation rates that took a bite out of physician salaries. 

The Doximity report paints a picture of physicians beset by rising burnout, shortages, and a persistent gender pay gap. Doctors across multiple specialties report feeling more stressed even as wage growth has stalled.

To compile the 2022 data, Doximity got responses from 31,000 US physicians. There was a wide range of average annual compensation across medical specialties, with radiology landing at number 10 on the top 20 list, while nuclear medicine occupied the 20th spot:

  • Radiation oncology: $547k vs. $544k in 2021
  • Radiology: $504k vs. $495k 
  • Nuclear medicine: $392k vs. $399k

In other findings of the report:

  • Male physicians made $110,000 more than female physicians. At a gap of 26%, this is actually an improvement compared to 28% in 2021.
  • Over the course of their careers, male physicians make over $2 million more than female physicians.
  • Nuclear medicine had the smallest pay gap ($394k vs. $382k)
  • The pay gap could contribute to higher burnout rates, with 92% of women reporting overwork compared to 83% of men. 
  • Two-thirds of physicians are considering an employment change due to overwork. 

Ironically, Doximity cited results of a recent survey in which 71% of physicians said they would accept lower compensation for better work-life balance. 

The Takeaway

The news about salaries could be a gut punch to many physicians, who are already dealing with epidemic levels of burnout. Radiology salaries bucked the trend by rising 1.6%, which could explain its popularity among medical students over the last three years. 

The question remains, is the money worth it? Rising imaging volumes have been tied to burnout in radiology, and the Doximity report indicates that some physicians are willing to forgo money for better quality of life.

Moral Distress in Radiology

The rising volume of medical imaging studies isn’t just a data point. It’s causing moral distress among radiologists and is a major systemic cause of the specialty’s burnout epidemic. 

Radiology’s problem with burnout is no secret, with a recent analysis disclosing that 54% of all radiologists identify as burned out. Studies have found that a cause of burnout can be moral distress, defined within healthcare as when a clinician knows the right course of action for a patient, but is prevented from taking it due to systemic factors.

In a March 22 study in American Journal of Roentgenology, researchers describe findings from a survey of 93 radiologists on their feelings of moral distress in different clinical scenarios and the impact it had on their careers. In short:

  • 98% reported some degree of moral distress
  • 48% thought the COVID-19 pandemic influenced their moral distress
  • 28% considered leaving their jobs
  • 18% actually did leave a job

Several factors contribute to moral distress in radiology: 

  • Case volumes that are higher than can be read safely
  • Higher case volumes that prevent resident teaching
  • A lack of action and support from administrators

These latter issues lead to burnout in specific ways, the authors wrote. Institutional constraints to providing high-quality care can prompt physicians to spend more time at work. Error rates can also grow during shifts with high study volumes or that last longer than 10 hours. And orders for unnecessary imaging exams can be seen as disregard for professional expertise. 

The Takeaway

This study rips the Band-Aid off the burnout problem in radiology, pointing out that inexorably rising imaging volumes rather than bad bosses or lazy colleagues are a root cause, one that’s been exacerbated by the COVID-19 pandemic.  

A further implication is that no amount of “self-care” – often prescribed as a solution for burnout – will cure the problem in the long run as long as radiologists will have ever-growing worklists to return to after their sabbaticals and motivational staff meetings. The researchers recommended “urgent action” to address the issue.
