Autonomous AI for Medical Imaging is Here. Should We Embrace It?

What is autonomous artificial intelligence, and is radiology ready for this new technology? In this paper, we explore one of the most exciting autonomous AI applications, ChestLink from Oxipit. 

What is Autonomous AI? 

Up to now, most interpretive AI solutions have focused on assisting radiologists with analyzing medical images. In this scenario, AI provides suggestions to radiologists and alerts them to suspicious areas, but the final diagnosis is the physician’s responsibility.

Autonomous AI flips the script by having AI run independently of the radiologist, such as by analyzing a large batch of chest X-ray exams for tuberculosis to screen out those certain to be normal. This can significantly reduce workload in primary care, where healthcare providers who offer preventive health checkups may find that up to 80% of chest X-rays show no abnormalities.

Autonomous AI frees the radiologist to focus on cases with suspicious pathology – with the potential of delivering a more accurate diagnosis to patients in real need.

One of the first of this new breed of autonomous AI is ChestLink from Oxipit. The solution received the CE Mark in March 2022, and more than a year later it is still the only AI application capable of autonomous performance. 

How ChestLink Works

ChestLink produces final chest X-ray reports on healthy patients with no involvement from human radiologists. The application only reports autonomously on chest X-ray studies where it is highly confident that the image does not include abnormalities. These studies are automatically removed from the reporting workflow. 

ChestLink enables radiologists to report on studies most likely to have abnormalities. In current clinical deployments, ChestLink automates 10-30% of all chest X-ray workflow. The exact percentage depends on the type of medical institution, with primary care facilities having the most potential for automation.
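The routing idea described above can be sketched as a simple confidence threshold. This is only an illustration, not Oxipit's implementation: the `Study` class, the `normal_confidence` field, and the 0.99 cutoff are all hypothetical, since the article does not say how ChestLink scores studies or where it draws the line.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff; ChestLink's real threshold is not given in the article.
NORMAL_CONFIDENCE_THRESHOLD = 0.99


@dataclass
class Study:
    study_id: str
    # Assumed model output: confidence (0 to 1) that the image has no abnormalities.
    normal_confidence: float


def route(study: Study, threshold: float = NORMAL_CONFIDENCE_THRESHOLD) -> str:
    """Send high-confidence normal studies to autonomous reporting, the rest to a radiologist."""
    if study.normal_confidence >= threshold:
        return "autonomous_report"
    return "radiologist"


def automation_rate(studies: List[Study], threshold: float = NORMAL_CONFIDENCE_THRESHOLD) -> float:
    """Fraction of studies removed from the radiologist reporting workflow."""
    if not studies:
        return 0.0
    automated = sum(1 for s in studies if route(s, threshold) == "autonomous_report")
    return automated / len(studies)
```

Raising the threshold in a sketch like this lowers the automation rate but leaves fewer borderline studies in the autonomous pile, which is the trade-off behind the 10-30% range quoted above.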

ChestLink Clinical Validation

ChestLink was trained on a dataset with over 500k images. In clinical validation studies, ChestLink consistently performed at 99%+ sensitivity.

A recent study published in Radiology highlighted the sensitivity of the application.

“The most surprising finding was just how sensitive this AI tool was for all kinds of chest disease. In fact, we could not find a single chest X-ray in our database where the algorithm made a major mistake. Furthermore, the AI tool had a sensitivity overall better than the clinical board-certified radiologists,” said study co-author Louis Lind Plesner, MD, from the Department of Radiology at the Herlev and Gentofte Hospital in Copenhagen, Denmark.

In this study ChestLink autonomously reported on 28% of all normal studies.

In another study at the Oulu University Hospital in Finland, researchers concluded that AI could reliably remove 36.4% of normal chest X-rays from the reporting workflow with a minimal number of false negatives, leading to effectively no compromise on patient safety. 

Safe Path to AI Autonomy

Oxipit ChestLink is currently used in healthcare facilities in the Netherlands, Finland, Lithuania, and other European countries, and is in the trial phase for deployment in one of the leading hospitals in England.

ChestLink follows a three-stage framework for clinical deployment.

  • Retrospective analysis. ChestLink analyzes a couple of years’ worth (100k+) of historical chest X-ray studies at the medical institution. This analysis validates the product on real-world data and gives a realistic estimate of what fraction of the reporting workload can be automated.
  • Semi-autonomous operations. The application moves into prospective settings, analyzing images in near-real time. ChestLink produces preliminary reports for healthy patients, which may then be approved by a certified clinician.
  • Autonomous operations. The application autonomously reports on high-confidence healthy patient studies. Its performance is monitored in real time with analytical tools.

Are We There Yet?

ChestLink aims to address the shortage of clinical radiologists worldwide, which has led to a substantial decline in care quality.

In the UK, the NHS currently faces a massive 33% shortfall in its radiology workforce. Nearly 71% of clinical directors of UK radiology departments feel that they do not have a sufficient number of radiologists to deliver safe and effective patient care.

ChestLink offers a safe pathway into autonomous operations by automating a significant and somewhat mundane portion of radiologist workflow without any negative effects for patient care. 

So should we embrace autonomous AI? The real question should be, can we afford not to? 

Making Screening Better

While population-based cancer screening has demonstrated its value, there’s no question that screening could use improvement. Two new studies this week show how to improve on one of screening’s biggest challenges: getting patients to attend their follow-up exams.

In the first study in JACR, researchers from the University of Rochester wanted to see if notifying people about actionable findings shortly after screening exams had an impact on follow-up rates. Patients were notified within one to three weeks after the radiology report was completed. 

They also examined different methods for patient communication, including snail-mail letters, notifications from Epic’s MyChart electronic patient portal, and phone calls. In approximately 2.5k patients, they found that follow-up adherence rates within one month of the due date varied by outreach method as follows:

  • Phone calls – 60%
  • Letters – 57%
  • Controls – 53%
  • MyChart notifications – 36%

(The researchers noted that the COVID-19 pandemic may have disproportionately affected those in the MyChart group.) 

Fortunately, the university uses natural language processing-based software called Backstop to make sure no follow-up recommendations fall through the cracks. 

  • Backstop includes Nuance’s mPower technology to identify actionable findings from unstructured radiology reports; it triggers notifications to both primary care providers and patients about the need to complete follow-up.

Once the full round of Backstop notifications had taken place, compliance rates rose and there was no statistically significant difference among the early-notification methods: letter (89%), phone (91%), MyChart (90%), and control (88%).

In the second study, researchers in JAMA described how they used automated algorithms to analyze EHR data from 12k patients to identify those eligible for follow-up for cancer screening exams.

  • They then tested three levels of intervention to get people to their exams, ranging from EHR reminders to outreach to patient navigation to all three. 

Patients who got EHR reminders, outreach, and navigation or EHR reminders and outreach had the highest follow-up completion rates at 120 days compared to usual care (31% for both vs. 23%). Rates were similar to usual care for those who only got EHR reminders (23%).

The Takeaway

This week’s studies indicate that while health technology is great, it’s how you use it that matters. While IT tools can identify the people who need follow-up, it’s up to healthcare personnel to make sure patients get the care they need.

AI Tug of War Continues

The ongoing tug of war over AI’s value to radiology continues. This time the rope has moved in AI’s favor with publication of a new study in JAMA Network Open that shows the potential of a new type of AI language model for creating radiology reports.

  • Headlines about AI have ping-ponged in recent weeks, from positive studies like MASAI and PERFORMS to more equivocal trials like a chest X-ray study in Radiology and news from the UK that healthcare authorities may not be ready for chest X-ray AI’s full clinical roll-out. 

In the new paper, Northwestern University researchers tested a chest X-ray AI algorithm they developed with a transformer technique, a type of generative AI language model that can both analyze images and generate radiology text as output. 

  • Transformer language models show promise due to their ability to combine both image and non-image data, as researchers showed in a paper last week.

The Northwestern researchers tested their transformer model in 500 chest radiographs of patients evaluated overnight in the emergency department from January 2022 to January 2023. 

Reports generated by AI were then compared to reports from a teleradiologist as well as the final report by an in-house radiologist, which was set as the gold standard. The researchers found that AI-generated reports …

  • Had sensitivity a bit lower than teleradiology reports (85% vs. 92%)
  • Had specificity a bit higher (99% vs. 97%)
  • In some cases improved on the in-house radiology report by detecting subtle abnormalities missed by the radiologist
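For readers who want the definitions behind these figures, sensitivity and specificity can be computed from a confusion matrix as sketched below. The counts in the usage note are invented for illustration and are not taken from the Northwestern study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of truly abnormal exams that the reader flagged as abnormal."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no abnormal exams in the sample")
    return true_positives / total


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of truly normal exams that the reader called normal."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no normal exams in the sample")
    return true_negatives / total
```

With hypothetical counts of 85 true positives and 15 false negatives, `sensitivity(85, 15)` gives 0.85; a higher specificity means fewer false alarms among normal exams, which is the trade-off the two bullets above describe.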

Generative AI language models like the Northwestern algorithm could perform better than algorithms that rely on a classification approach to predicting the presence of pathology. Classification models limit medical diagnoses to yes/no predictions that may omit context that’s relevant to clinical care, the researchers believe. 

In real-world clinical use, the Northwestern team thinks their model could assist emergency physicians in circumstances where in-house radiologists or teleradiologists aren’t immediately available, helping triage emergent cases.

The Takeaway

After the negative headlines of the last few weeks, it’s good to see positive news about AI again. Although the current study is relatively small and much larger trials are needed, the Northwestern research has promising implications for the future of transformer-based AI language models in radiology.

CT Lung Screening Saves Women

October may be Breast Cancer Awareness Month, but a new study has great news for women when it comes to another life-threatening disease: lung cancer. 

Italian researchers in Lung Cancer found that CT lung cancer screening delivered survival benefits that were particularly dramatic for women – and could address cardiovascular disease as well. 

  • They found that after 12 years of follow-up, women who got CT lung screening had much higher survival rates and lower all-cause mortality than men. 

Of all the cancer screening tests, lung screening is the new kid on the block.

  • Although randomized clinical trials have shown it to deliver lung cancer mortality benefits of 20% and higher, uptake of lung screening has been relatively slow compared to other tests.

In the current study, researchers from the Fondazione IRCCS Istituto Nazionale dei Tumori in Milan analyzed data from 6.5k heavy smokers in the MILD and BioMILD trials who got low-dose CT screening from 2005 to 2016. 

In addition to cancer incidence and mortality, they also used Coreline Soft’s AVIEW software to calculate coronary artery calcium (CAC) scores acquired with the screening exams to see if they predicted lung cancer mortality. Researchers found that after 12 years of follow-up …

  • There was no statistically significant difference in lung cancer incidence between women and men (4.4% vs. 4.7%)
  • But women had lower lung cancer mortality than men (1% vs. 1.9%) as well as lower all-cause mortality (4.1% vs. 7.7%), both statistically significant
  • Women had higher lung cancer survival than men (72% vs. 52%)
  • 15% of participants had CAC scores between 101 and 400, and all-cause mortality increased with higher scores
  • Women had lower CAC scores, which could play a role in lower all-cause mortality due to less cardiovascular disease

The Takeaway

This is a fascinating study on several levels. First, it shows that among people who got CT lung cancer screening, women had statistically significantly lower all-cause mortality than men.

Second, it shows that CT lung cancer screening can also serve as a screening test for cardiovascular disease, helping direct those with high CAC scores to treatment such as statin therapy. This type of opportunistic screening could change the cost-benefit dynamic when it comes to analyzing lung screening’s value – especially for women.

More Work Ahead for Chest X-Ray AI?

In another blow to radiology AI, the UK’s national technology assessment agency issued an equivocal report on AI for chest X-ray, stating that more research is needed before the technology can enter routine clinical use.

The report came from the National Institute for Health and Care Excellence (NICE), which assesses new health technologies that have the potential to address unmet NHS needs. 

The NHS sees AI as a potential solution to its challenge of meeting rising demand for imaging services, a dynamic that’s leading to long wait times for exams.

But at least some corners of the UK health establishment have concerns about whether AI for chest X-ray is ready for prime time. 

  • The NICE report states that – despite the unmet need for quicker chest X-ray reporting – there is insufficient evidence to support the technology, and as such it’s not possible to assess its clinical and cost benefits. And it said there is “no evidence” on the accuracy of AI-assisted clinician review compared to clinicians working alone.

As such, the use of AI for chest X-ray in the NHS should be limited to research, with the following additional recommendations …

  • Centers already using AI software to review chest X-rays may continue to do so, but only as part of an evaluation framework and alongside clinician review
  • Purchase of chest X-ray AI software should be made through corporate, research, or non-core NHS funding
  • More research is needed on AI’s impact on a number of outcomes, such as CT referrals, healthcare costs and resource use, review and reporting time, and diagnostic accuracy when used alongside clinician review

The NICE report listed 14 commercially available chest X-ray algorithms that need more research, and it recommended prospective studies to address gaps in evidence. AI developers will be responsible for performing these studies.

The Takeaway

Taken with last week’s disappointing news on AI for radiology, the NICE report is a wakeup call for what had been one of the most promising clinical use cases for AI. The NHS had been seen as a leader in spearheading clinical adoption of AI; for chest X-ray, clinicians in the UK may have to wait just a bit longer.

AI Hits Speed Bumps

There’s no question AI is the future of radiology. But AI’s drive to widespread clinical use is going to hit some speed bumps along the way.

This week is a case in point. Two studies were published showing AI’s limitations and underscoring the challenges faced in making AI an everyday clinical reality. 

In the first study, published in Radiology, researchers found that radiologists outperformed four commercially available AI algorithms for analyzing chest X-rays (Annalise.ai, Milvue, Oxipit, and Siemens Healthineers) in 2k patients.

Researchers from Denmark found the AI tools had moderate to high sensitivity for three detection tasks: 

  1. airspace disease (72%-91%)
  2. pneumothorax (63%-90%)
  3. pleural effusion (62%-95%)

But the algorithms also had higher false-positive rates, and performance dropped in cases with smaller pathology and multiple findings. The findings are disappointing, especially since they got such widespread play in the mainstream media.

This week’s second study, published in Radiology: Artificial Intelligence, also brought worrisome news, this time about foundation models, an AI training method that many hope holds the key to better algorithms. 

Foundation models are designed to address the challenge of finding enough high-quality data for AI training. Most algorithms are trained with actual de-identified clinical data that have been labeled and referenced to ground truth; foundation models are AI neural networks pre-trained with broad, unlabeled data and then fine-tuned with smaller volumes of more detailed data to perform specific tasks.

Researchers in the new study found that a chest X-ray algorithm trained on a foundation model with 800k images had lower performance than an algorithm trained with the CheXpert reference model in a group of 42.9k patients. The foundation model’s performance lagged for four possible results – no finding, pleural effusion, cardiomegaly, and pneumothorax – as follows…

  • Lower by 6.8-7.7% in females for the “no finding” result
  • Down by 10.7-11.6% in Black patients in detecting pleural effusion
  • Lower performance across all groups for classifying cardiomegaly

The decline in female and Black patients is particularly concerning given recent studies on bias and lack of generalizability for AI.  

The Takeaway

This week’s studies show that there’s not always going to be a clear road ahead for AI in its drive to routine clinical use. The study on foundation models in particular could have ramifications for AI developers looking for a shortcut to faster algorithm development. They may want to slow their roll. 

Predicting AI Performance

How can you predict whether an AI algorithm will fall short for a particular clinical use case such as detecting cancer? Researchers in Radiology took a crack at this conundrum by developing what they call an “uncertainty quantification” metric to predict when an AI algorithm might be less accurate. 

AI is rapidly moving into wider clinical use, with a number of exciting studies published in just the last few months showing how AI can help radiologists interpret screening mammograms or direct which women should get supplemental breast MRI.

But AI isn’t infallible. And unlike a human radiologist who might be less confident in a particular diagnosis, an AI algorithm doesn’t have a built-in hedging mechanism.

So researchers from Denmark and the Netherlands decided to build one. They took publicly available AI algorithms and tweaked their code so they produced “uncertainty quantification” scores with their predictions. 

They then tested how well the scores predicted AI performance in a dataset of 13k images for three common tasks covering some of the deadliest types of cancer:

  1. detecting pancreatic ductal adenocarcinoma on CT
  2. detecting clinically significant prostate cancer on MRI
  3. predicting pulmonary nodule malignancy on low-dose CT

Researchers classified the 80% of AI predictions with the highest certainty scores as “certain” and the remaining 20% as “uncertain,” then compared AI’s accuracy in both groups, finding … 

  • AI was significantly more accurate in the “certain” group than in the “uncertain” group for pancreatic cancer (80% vs. 59%), prostate cancer (90% vs. 63%), and pulmonary nodule malignancy prediction (80% vs. 51%)
  • AI accuracy was comparable to clinicians when its predictions were “certain” (80% vs. 78%, P=0.07), but much worse when “uncertain” (50% vs. 68%, P<0.001)
  • Using AI to triage “uncertain” cases produced overall accuracy improvements for pancreatic and prostate cancer (+5%) and lung nodule malignancy prediction (+6%) compared to a no-triage scenario
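The certain/uncertain split can be sketched in a few lines. This is a simplified illustration rather than the researchers’ code: it assumes each case carries a single certainty score (higher means more certain) plus a flag for whether the AI’s prediction was correct, and the function names are hypothetical.

```python
from typing import List, Tuple

# Each case is (certainty_score, ai_was_correct); both fields are assumed for illustration.
Case = Tuple[float, bool]


def split_by_certainty(cases: List[Case], certain_fraction: float = 0.8) -> Tuple[List[Case], List[Case]]:
    """Rank cases by certainty and label the top fraction 'certain', the rest 'uncertain'."""
    ranked = sorted(cases, key=lambda case: case[0], reverse=True)
    cut = round(len(ranked) * certain_fraction)
    return ranked[:cut], ranked[cut:]


def accuracy(cases: List[Case]) -> float:
    """Fraction of cases where the AI prediction was correct."""
    if not cases:
        raise ValueError("cannot compute accuracy on an empty group")
    return sum(1 for _, correct in cases if correct) / len(cases)
```

In a triage setup along these lines, the “uncertain” group returned by `split_by_certainty` would be the cases handed to clinicians, while AI predictions would stand only for the “certain” group.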

How would uncertainty quantification be used in clinical practice? It could play a triage role, deprioritizing radiologist review of easier cases while helping them focus on more challenging studies. It’s a concept similar to the MASAI study of mammography AI.

The Takeaway

Like MASAI, the new findings present exciting new possibilities for AI implementation. They also present a framework within which AI can be implemented more safely by alerting clinicians to cases in which AI’s analysis might fall short – and enabling humans to step in and pick up the slack.  

Can AI Direct Breast MRI?

A deep learning algorithm trained to analyze mammography images did a better job than traditional risk models in predicting breast cancer risk. The study shows the AI model could direct the use of supplemental screening breast MRI for women who need it most. 

Breast MRI has emerged (along with ultrasound) as one of the most effective imaging modalities to supplement conventional X-ray-based mammography. Breast MRI performs well regardless of breast tissue density, and can even be used for screening younger high-risk women for whom radiation is a concern. 

But there are also disadvantages to breast MRI. It’s expensive and time-consuming, and clinicians aren’t always sure which women should get it. As a result, breast MRI is used too often in women at average risk and not often enough in those at high risk. 

In the current study in Radiology, researchers from MGH compared the Mirai deep learning algorithm to conventional risk-prediction models. Mirai was developed at MIT to predict five-year breast cancer risk, and the first papers on the model emerged in 2019; previous studies have already demonstrated the algorithm’s prowess for risk prediction.

Mirai was used to analyze mammograms and develop risk scores for 2.2k women who also received 4.2k screening breast MRI exams from 2017-2020 at four facilities. Researchers then compared the performance of the algorithm to traditional risk tools like Tyrer-Cuzick and NCI’s Breast Cancer Risk Assessment (BCRAT), finding that … 

  • In women Mirai identified as high risk, the cancer detection rate per 1k on breast MRI was far higher compared to those classified as high risk by Tyrer-Cuzick and BCRAT (20.6 vs. 6.0 & 6.8)
  • Mirai had a higher PPV for predicting abnormal findings on breast MRI screening (14.6% vs. 5.0% & 5.5%)
  • Mirai scored higher in PPV of biopsies recommended (32.4% vs. 12.7% & 11.1%) and PPV for biopsies performed (36.4% vs. 13.5% & 12.5%)
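The two metrics in these bullets are simple ratios, sketched below. The function names are hypothetical and the counts in the usage note are made up for illustration; they are not the study’s data.

```python
def cancer_detection_rate_per_1k(cancers_detected: int, exams: int) -> float:
    """Cancers found per 1,000 screening exams."""
    if exams <= 0:
        raise ValueError("exams must be positive")
    return 1000 * cancers_detected / exams


def positive_predictive_value(true_positives: int, positive_calls: int) -> float:
    """Share of positive calls (e.g., biopsies performed) that turned out to be cancer."""
    if positive_calls <= 0:
        raise ValueError("positive_calls must be positive")
    return true_positives / positive_calls
```

For example, a hypothetical 5 cancers in 250 exams gives `cancer_detection_rate_per_1k(5, 250)` of 20.0, and 3 cancers among 10 biopsies gives a PPV of 0.3. A risk model that concentrates cancers in its high-risk group pushes both numbers up, which is the pattern reported for Mirai.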

The Takeaway

Breast imaging has become one of the AI use cases with the most potential, based on recent studies like PERFORMS and MASAI, and the new study shows Mirai could be useful in directing women to breast MRI screening. Like the previous studies, the current research is pointing to a near-term future in which AI and deep learning can make breast screening more accurate and cost-effective than it’s ever been before. 

POCUS Cuts DVT Stays

Using POCUS in the emergency department (ED) to scan patients with suspected deep vein thrombosis (DVT) cut their length of stay in the ED in half. 

Reducing hospital length of stay is one of the holy grails of healthcare quality improvement. 

  • It’s not only more expensive to keep patients in the hospital longer, but it can expose them to morbidities like hospital-acquired infections.

Patients admitted with suspected DVT often receive ultrasound scans performed by radiologists or sonographers to determine whether the blood clot is at risk of breaking off – a possibly fatal result. 

  • But this requires a referral to the radiology department. What if emergency physicians performed the scans themselves with POCUS?

To answer this question, researchers at this week’s European Emergency Medicine Conference presented results from a study of 93 patients at two hospitals in Finland.

  • From October 2017 to October 2019, patients presenting at the ED received POCUS scans from emergency doctors trained on the devices. 

Results were compared to 135 control patients who got usual care and were sent directly to radiology departments for ultrasound. 

  • Researchers found that POCUS reduced ED length of stay from 4.5 hours to 2.3 hours, a drop of 52%.

Researchers described the findings as “convincing,” especially as they occurred at two different facilities. The results also answer a recent study that found POCUS only affected length of stay when performed on the night shift. 

The Takeaway

Radiology might not be so happy to see patient referrals diverted from their department, but the results are yet another feather in the cap for POCUS, which continues to show that – when in the right hands – it can have a big impact on healthcare quality.

Radiology’s Enduring Popularity

Radiology is seeing a resurgence of interest from medical students picking the specialty in the National Resident Matching Program (NRMP). While radiology’s popularity is at historically high levels, the new analysis shows how vulnerable the field is to macro-economic trends in healthcare. 

Radiology’s popularity has always ebbed and flowed. In general the field is seen as one of the more attractive medical specialties due to the perception that it combines high salaries with lifestyle advantages. But there have been times when medical students shunned radiology.

The new paper offers insights into these trends. Published in Radiology by Francis Deng, MD, and Linda Moy, MD, the paper fleshes out an earlier analysis that Deng posted as a Twitter thread after the 2023 Match, showing that diagnostic radiology saw the highest growth in applicants to medical specialties over a three-year period.

Deng and Moy analyze trends in the Match over almost 25 years in the new study, finding…

  • The 2023 Match in radiology was the most competitive since 2001 based on percentage of applicants matching (81.1% vs. 73.3%)
  • 5.9% of seniors in US MD training programs applied to diagnostic radiology in the 2023 Match, the highest level since 2010
  • Fewer radiology residency slots per applicant were available in 2023 compared to the historical average (0.67 vs. 0.81) 

Interest in radiology hit its lowest levels in 1996 and 2015, when the number of applicants fell short of available radiology residency positions in the Match. It’s perhaps no surprise that these lows followed two major seismic healthcare shifts that could have negatively affected job prospects for radiologists: the “Hillarycare” healthcare reform effort in the early 1990s and the emergence of AI for healthcare in the mid-2010s. 

Hillarycare never happened, and Deng and Moy noted that outreach efforts to medical students about AI helped reverse the perspective that the technology would be taking radiologists’ jobs. Another advantage for radiology is its early adoption of teleradiology, which enables remote work and more flexible work options – a major lifestyle perk. 

The Takeaway

The new paper provides fascinating insights that support why radiology remains one of medicine’s most attractive specialties. Radiology’s appeal could even grow, given recent studies showing that work-life balance is a major priority for today’s medical students.
