How Should AI Be Monitored?

Once an AI algorithm has been approved and moves into clinical use, how should its performance be monitored? This question was top of mind at last week’s meeting of the FDA’s new Digital Health Advisory Committee.

AI has the potential to radically reshape healthcare and help clinicians manage more patients with fewer staff and other resources. 

  • But AI also represents a regulatory challenge because it’s constantly learning, such that after a few years an AI algorithm might be operating much differently from the version first approved by the FDA – especially with generative AI. 

This conundrum was a point of discussion at last week’s DHAC meeting, which was called specifically to focus on regulation of generative AI, and could result in new rules covering all AI algorithms. (An executive summary that outlines the FDA’s thinking is available for download.)

Radiology was well-represented at DHAC, understandable given it has the lion’s share of authorized algorithms (73% of 950 devices at last count). 

  • A half-dozen radiology AI experts gave presentations over two days, including Parminder Bhatia of GE HealthCare; Nina Kottler, MD, of Radiology Partners; Pranav Rajpurkar, PhD, of Harvard; and Keith Dreyer, DO, PhD, and Bernardo Bizzo, MD, PhD, both of Mass General Brigham and the ACR’s Data Science Institute.  

Dreyer and Bizzo directly addressed the question of post-market AI surveillance, discussing ongoing efforts to track AI performance, including … 

The Takeaway

Last week’s DHAC meeting offers a fascinating glimpse at the issues the FDA is wrestling with as it contemplates stronger regulation of generative AI. Fortunately, radiology has blazed a trail in setting up structures like ARCH-AI and Assess-AI to monitor AI performance, and the FDA is likely to follow the specialty’s lead as it develops a regulatory framework.

Low-Dose CT Confounds CAD in Kids

When it comes to pediatric CT scans, clinicians should make every effort to reduce dose as much as possible. But a new study in AJR indicates that lower CT radiation dose can affect the performance of software tools like computer-aided detection. 

Initiatives like the Image Wisely and Image Gently projects have succeeded in raising awareness of radiation dose and have helped radiologists find ways to reduce it.

But every little bit counts in pediatric dose reduction, especially given that one CT exam can raise the risk of developing cancer by 0.35%. 

  • Imaging tools like AI and CAD could help, but there have been few studies examining the performance of pulmonary CAD software developed for adults in analyzing scans of children.

To address that gap, researchers including radiologists from Cincinnati Children’s Hospital Medical Center investigated the performance of two open-source CAD algorithms trained on adults for detecting lung nodules in 73 patients with a mean age of 14.7 years. 

  • The algorithms included FlyerScan, a CAD developed by the authors, and MONAI, an open-source project for deep learning in medical imaging. 

Scans were acquired at standard-dose (mean effective dose=1.77 mSv) and low-dose (mean effective dose=0.32 mSv) levels, with the results showing that both algorithms turned in lower performance at lower radiation dose for nodules 3-30 mm … 

  • FlyerScan saw its sensitivity decline (77% vs. 67%) and detected fewer 3 mm lung nodules (33 vs. 24).
  • MONAI also saw lower sensitivity (68% vs. 62%) and detected fewer 3 mm lung nodules (16 vs. 13).
  • Reduced sensitivity was more pronounced for nodules less than 5 mm.

The findings should be taken with a grain of salt, as the open-source algorithms were not originally trained on pediatric data.

  • But the results do underscore the challenge in developing image analysis software optimized for pediatric applications.

The Takeaway

With respect to low radiation dose and high AI accuracy in CT scans of kids, radiologists may not be able to have their cake and eat it too – yet. More work will be needed before AI solutions developed for adults can be used in children.

Mammography AI Predicts Cancer Before It’s Detected

A new study highlights the predictive power of AI for mammography screening – before cancers are even detected. Researchers in a study in JAMA Network Open found that risk scores generated by Lunit’s Insight MMG algorithm predicted which women would develop breast cancer – years before radiologists found it on mammograms. 

Mammography image analysis has always been one of the most promising use cases for AI – even dating back to the days of computer-aided detection in the early 2000s. 

  • Most mammography AI developers have focused on helping radiologists identify suspicious lesions on mammograms, or triage low-risk studies so they don’t require extra review.

But a funny thing has happened during clinical use of these algorithms – radiologists found that AI-generated risk scores appeared to predict future breast cancers before they could be seen on mammograms. 

  • Insight MMG marks areas of concern and generates a risk score of 0-100 for the presence of breast cancer (higher numbers are worse). 

Researchers decided to investigate the risk scores’ predictive power by applying Insight MMG to screening mammography exams acquired in the BreastScreen Norway program over three biennial rounds of screening from 2004 to 2018. 

  • They then correlated AI risk scores to clinical outcomes in exams for 116k women for up to six years after the initial screening round.

Major findings of the study included … 

  • AI risk scores were higher for women who later developed cancer, 4-6 years before the cancer was detected.
  • The difference in risk scores increased over three screening rounds, from 21 points in the first round to 79 points in the third round.
  • Risk scores had very high accuracy by the third round (AUC=0.93).
  • AI scores were more accurate than existing risk tools like the Tyrer-Cuzick model.

How could AI risk scores be used in clinical practice? 

  • Women without detectable cancer but with high scores could be directed to shorter screening intervals or screening with supplemental modalities like ultrasound or MRI.

The Takeaway

It’s hard to overstate the significance of the new results. While AI for direct mammography image interpretation still seems to be having trouble catching on (just like CAD did), risk prediction is a use case that could direct more effective breast screening. The study is also a major coup for Lunit, continuing a string of impressive clinical results with the company’s technology.

AI Recon Cuts CT Radiation Dose

Artificial intelligence got its start in radiology as a tool to help medical image interpretation, but much of AI’s recent progress is in data reconstruction: improving images before radiologists even get to see them. Two new studies underscore the potential of AI-based reconstruction to reduce CT radiation dose while preserving image quality. 

Radiology vendors and clinicians have been remarkably successful in reducing CT radiation dose over the past two decades, but there’s always room for improvement. 

  • In addition to adjusting CT scanning protocols like tube voltage and current, data reconstruction protocols have been introduced to take images acquired at lower radiation levels and “boost” them to look like full-dose images. 

The arrival of AI and other deep learning-based technologies has turbocharged these efforts. 

In the first study, researchers compared deep learning image reconstruction (DLIR) operating at high strength to GE’s older ASiR-V protocol in coronary CT angiography (CCTA) scans with lower tube voltage (80 kVp), finding that deep learning reconstruction led to …

  • 42% reduction in radiation dose (2.36 vs. 4.07 mSv).
  • 13% reduction in contrast dose (50 mL vs. 58 mL).
  • Better signal- and contrast-to-noise ratios.
  • Higher image quality ratings.

In the second study, researchers from China including two employees of United Imaging Healthcare used a deep learning reconstruction algorithm to test ultralow-dose CT scans for coronary artery calcium scoring. 

  • They wanted to see if CAC scoring could be performed with lower tube voltage and current (80 kVp/20 mAs) and how the protocol compared to existing low-dose scans.

In tests with 156 patients, they found the ultralow-dose protocol produced …

  • Lower radiation dose (0.09 vs. 0.49 mSv).
  • No difference in CAC scoring or risk categorization. 
  • Higher contrast-to-noise ratio.

The Takeaway

AI-based data reconstruction gives radiologists the best of both worlds: lower radiation dose with better-quality images. These two new studies illustrate AI’s potential for lowering CT dose to previously unheard-of levels, with major benefits for patients.

AI Detects Interval Cancer on Mammograms

In yet another demonstration of AI’s potential to improve mammography screening, a new study in Radiology shows that Lunit’s Insight MMG algorithm detected nearly a quarter of interval cancers missed by radiologists on regular breast screening exams. 

Breast screening is one of healthcare’s most challenging cancer screening exams, and for decades has been under attack by skeptics who question its life-saving benefit relative to “harms” like false-positive biopsies.  

  • But AI has the potential to change the cost-benefit equation by detecting a higher percentage of early-stage cancers and improving breast cancer survival rates. 

Indeed, 2024 has been a watershed year for mammography AI. 

U.K. researchers used Insight MMG (also used in the BreastScreen Norway trial) to analyze 2.1k screening mammograms, of which 25% were interval cancers (cancers occurring between screening rounds) and the rest normal. 

  • The AI algorithm generates risk scores from 0-100, with higher scores indicating a greater likelihood of malignancy. For this study, the threshold was set at 96% specificity, equivalent to the average 4% recall rate in the U.K. national breast screening program.
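
For readers curious how a specificity target becomes a score cutoff, here is a minimal sketch. It is not Lunit's or the study's method: the function names are hypothetical, and the list of "normal" scores is a made-up placeholder (the integers 1 to 100), not study data. The idea is simply to pick the cutoff so that the target fraction of cancer-free exams score at or below it; everything above the cutoff is recalled.

```python
import math


def threshold_for_specificity(normal_scores, target_specificity):
    """Return the smallest cutoff that leaves at least `target_specificity`
    of the cancer-free exams at or below it (exams above it are recalled)."""
    ranked = sorted(normal_scores)
    # Number of normal exams that must NOT be recalled. The inner round()
    # guards against floating-point noise such as 95.99999999.
    keep = math.ceil(round(target_specificity * len(ranked), 9))
    keep = max(keep, 1)
    return ranked[keep - 1]


def recall_rate(scores, cutoff):
    """Fraction of exams whose score exceeds the cutoff."""
    return sum(score > cutoff for score in scores) / len(scores)


# Placeholder data: 100 hypothetical risk scores for cancer-free exams.
hypothetical_normals = list(range(1, 101))

cutoff = threshold_for_specificity(hypothetical_normals, 0.96)
print(f"cutoff score: {cutoff}")
print(f"recall rate among normals: {recall_rate(hypothetical_normals, cutoff):.0%}")
```

Because the vast majority of screened women do not have cancer, the recall rate among normal exams is close to the overall recall rate, which is why 96% specificity lines up with a 4% recall rate.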

In analyzing the results, researchers found … 

  • AI flagged 24% of the interval cancers and correctly localized 77%.
  • AI localized a higher proportion of node-positive than node-negative cancers (24% vs. 16%).
  • Invasive tumors had higher median risk scores than noninvasive (62 vs. 33), with median scores of 26 for normal mammograms.

Researchers also tested AI at a lower specificity threshold of 90%. 

  • AI detected more interval cancers at this level, but in real-world practice this would bump up recall rates.  

It’s also worth noting that Insight MMG is designed for the analysis of 2D digital mammography, which is more common in Europe than DBT. 

  • For the U.S., Lunit is emphasizing its recently cleared Insight DBT algorithm, which may perform differently.  

The Takeaway

As with the MASAI and BreastScreen Norway results, the new study points to an exciting role for AI in making mammography screening more accurate with less drain on radiologist resources. But as with those studies, the new results must be interpreted against Europe’s double-reading paradigm, which differs from the single-reading protocol used in the U.S. 

FDA Keeps Pace on AI Approvals

The FDA has updated its list of AI- and machine learning-enabled medical devices that have received regulatory authorization. The list is a closely watched barometer of the health of the AI sector, and the update shows the FDA is keeping a brisk pace of authorizations.

The FDA has maintained double-digit growth of AI authorizations for the last several years, a pace that reflects the growing number of submissions it’s getting from AI developers. 

  • Indeed, data compiled by regulatory expert Bradley Merrill Thompson show how the number of FDA authorizations has been growing rapidly since the dawn of the medical AI era around 2016 (see also our article on AI safety below). 

The new FDA numbers show that …

  • The FDA has now authorized 950 AI/ML-enabled devices since it began keeping track
  • Device authorizations are up 15% for the first half of 2024 compared to the same period the year before (107 vs. 93)
  • The pace could grow even faster in late 2024 – in the second half of 2023, the FDA authorized 126 devices, up 35% over the first half
  • At that pace, the FDA should hit just over 250 total authorizations in 2024 
  • This would represent 14% growth over 220 authorizations in 2023, and compares to growth of 14% in 2022 and 15% in 2021
  • As with past updates, radiology makes up the lion’s share of AI/ML authorizations, but had a 73% share in the first half, down from 80% for all of 2023
  • Siemens Healthineers led all companies in H1 2024 clearances with 11, bringing its total to 70 (66 for Siemens and four for Varian). GE HealthCare remains the overall leader with 80 total clearances after adding three in H1 2024 (GE’s total includes companies it has acquired, like Caption Health and MIM Software). There’s a big drop-off after GE and Siemens, with Canon Medical (30), Aidoc (24), and Philips (24) next in line.
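
The projection in the list above can be reproduced with a few lines of arithmetic. This is a rough sketch, not the FDA's or our own forecasting method: the variable names are ours, and the key assumption is that the second half of 2024 outpaces the first half by the same ratio seen in 2023.

```python
# Half-year authorization counts quoted in the list above.
h1_2023 = 93
h2_2023 = 126
h1_2024 = 107

# First-half growth, 2024 vs. 2023.
h1_growth = h1_2024 / h1_2023 - 1

# How much the second half of 2023 outpaced the first half.
h2_lift_2023 = h2_2023 / h1_2023 - 1

# Assumption: H2 2024 repeats the 2023 second-half lift.
h2_2024_projected = h1_2024 * (1 + h2_lift_2023)
total_2024_projected = h1_2024 + h2_2024_projected

# The two 2023 halves sum to 219, which the text rounds to 220.
total_2023 = h1_2023 + h2_2023
annual_growth = total_2024_projected / total_2023 - 1

print(f"H1 growth: {h1_growth:.1%}")
print(f"2023 second-half lift: {h2_lift_2023:.1%}")
print(f"Projected 2024 total: {total_2024_projected:.0f}")
print(f"Projected annual growth: {annual_growth:.1%}")
```

The projected total lands just over 250, and the annual growth rate comes out at roughly 14% to 15% depending on how the 2023 and 2024 totals are rounded.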

The FDA’s list includes both software-only algorithms and hardware devices like scanners that have built-in AI capabilities, such as a mobile X-ray unit that can alert users to emergent conditions. 

  • Indeed, many of the authorizations on the FDA’s list are for updated versions of already-cleared products rather than brand-new solutions – a trend that tends to inflate radiology’s share of approvals.

The Takeaway

The new FDA numbers on AI/ML regulatory authorizations are significant not only for revealing the growth in approvals, but also because the agency appears to be releasing the updates more frequently – perhaps a sign it is practicing what it preaches when it comes to AI openness and transparency. 

Better Prostate MRI with AI

A homegrown AI algorithm was able to detect clinically significant prostate cancer on MRI scans with the same accuracy as experienced radiologists. In a new study in Radiology, researchers say the algorithm could improve radiologists’ ability to detect prostate cancer on MRI, with fewer false positives.

In past issues of The Imaging Wire, we’ve discussed the need to improve on existing tools like PSA tests to make prostate cancer screening more precise with fewer false positives and less need for patient work-up.

  • Adding MRI to prostate screening protocols is a step forward, but MRI is an expensive technology that requires experienced radiologists to interpret.

Could AI help? In the new study, researchers tested a deep learning algorithm developed at the Mayo Clinic to detect clinically significant prostate cancer on multiparametric (mpMRI) scans.

  • In an interesting wrinkle, the Mayo algorithm does not indicate tumor location, so a second algorithm – called Grad-CAM – was employed to localize tumors.

The Mayo algorithm was trained on a population of 5k patients with a cancer prevalence similar to a screening population, then tested in an external test set of 204 patients, finding …

  • No statistically significant difference in performance between the Mayo algorithm and radiologists based on AUC (0.86 vs. 0.84, p=0.68)
  • The highest AUC was with the combination of AI and radiologists (0.89, p<0.001)
  • The Grad-CAM algorithm was accurate in localizing 56 of 58 true-positive exams

An editorial noted that the study employed the Mayo algorithm on multiparametric MRI exams.

  • Prostate cancer imaging is moving from mpMRI toward biparametric MRI (bpMRI) due to its faster scan times and lack of contrast, and if validated on bpMRI, AI’s impact could be even more dramatic.

The Takeaway

The current study illustrates the exciting developments underway to make prostate imaging more accurate and easier to perform. It also supports the technology evolution that could one day make prostate cancer screening a more widely accepted test.

US + Mammo vs. Mammo + AI for Dense Breasts

Artificial intelligence may represent radiology’s future, but for at least one clinical application traditional imaging seems to be the present. In a new study in Radiology, ultrasound was more effective than AI for supplemental imaging of women with dense breast tissue. 

Dense breast tissue has long presented problems for breast imaging specialists. 

  • Women with dense breasts are at higher risk of breast cancer, but traditional screening modalities like X-ray mammography don’t work very well (sensitivity of 30-48%), creating the need for supplemental imaging tools like ultrasound and MRI.

In the new study, researchers from South Korea tested the use of Lunit’s Insight MMG mammography AI algorithm in 5.7k women without symptoms who had breast tissue classified as heterogeneously (63%) or extremely dense (37%). 

  • AI’s performance was compared to both mammography alone and mammography with ultrasound, one of the gold-standard modalities for imaging women with dense breasts. 

All in all, researchers found …

  • Mammography with AI had lower sensitivity than mammography with ultrasound but slightly better than mammography alone (61% vs. 97% vs. 58%)
  • Mammography with AI had a lower cancer detection rate per 1k women than mammography with ultrasound but a higher rate than mammography alone (3.5 vs. 5.6 vs. 3.3)
  • Mammography with AI missed 12 cancers detected with mammography with ultrasound
  • Mammography with AI had the highest specificity (95% vs. 78% vs. 94%)
  • And the lowest abnormal interpretation rate (5% vs. 23% vs. 6%)

The results show that while AI can help radiologists interpret screening mammography for most women, at present it can’t compensate for mammography’s low sensitivity in women with dense breast tissue.

In an editorial, breast radiologists Gary Whitman, MD, and Stamatia Destounis, MD, observed that supplemental imaging of women with dense breasts is getting more attention as the FDA prepares to implement breast density notification rules in September. 

  • They recommended follow-up studies with other AI algorithms, more patients, and a longer follow-up period. 

The Takeaway

As with a recent study on AI and teleradiology, the current research is a good step toward real-world evaluation of AI for a specific use case. While AI in this instance didn’t improve mammography’s sensitivity in women with dense breast tissue, it could carve out a role reducing false positives for these women who get mammography and ultrasound.

Teleradiology AI’s Mixed Bag

An AI algorithm that examined teleradiology studies for signs of intracranial hemorrhage had mixed performance in a new study in Radiology: Artificial Intelligence. AI helped detect ICH cases that might have been missed, but false positives slowed radiologists down. 

AI is being touted as a tool that can detect unseen pathology and speed up the workflow of radiologists facing an environment of limited resources and growing image volume.

  • This dynamic is particularly evident at teleradiology practices, which frequently see high volumes during off-hour shifts; indeed, a recent study found that telerad cases had higher rates of patient death and more malpractice claims than cases read by traditional radiology practices.

So teleradiologists could use a bit more help. In the new study, researchers from the VA’s National Teleradiology Program assessed Avicenna.ai’s CINA v1.0 algorithm for detecting ICH on STAT non-contrast head CT studies.

  • AI was used to analyze 58.3k CT exams processed by the teleradiology service from January 2023 to February 2024, with a 2.7% prevalence of ICH.

Results were as follows …

  • AI flagged 5.7k studies as positive for acute ICH and 52.7k as negative
  • Final radiology reports confirmed that 1.2k exams were true positives for a sensitivity of 76% and a positive predictive value of 21%
  • There were 384 false negatives (missed ICH cases), for a negative predictive value of 99.3%; specificity was 92%
  • The algorithm’s performance at the VA was a bit lower than in previously published literature
  • Cases that the algorithm falsely flagged as positive took over a minute longer to interpret than prior to AI deployment
  • Overall, case interpretation times were slightly lower after AI than before
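
The accuracy figures in the list above follow from four counts. Here is a minimal sketch that rebuilds them from the rounded numbers quoted in this article (5.7k flagged positive, 52.7k flagged negative, 1.2k true positives, 384 false negatives). The class and function names are ours, and because the inputs are rounded, the outputs only approximate the study's published values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # AI positive, ICH confirmed
    fp: int  # AI positive, no ICH
    fn: int  # AI negative, ICH confirmed (missed)
    tn: int  # AI negative, no ICH

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def prevalence(self) -> float:
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)


def counts_from_report(flagged_pos, flagged_neg, true_pos, false_neg):
    """Derive the full 2x2 table from the four figures quoted in the text."""
    return ConfusionCounts(
        tp=true_pos,
        fp=flagged_pos - true_pos,
        fn=false_neg,
        tn=flagged_neg - false_neg,
    )


# Rounded figures from the article, not the study's exact counts.
va = counts_from_report(5_700, 52_700, 1_200, 384)

for name in ("sensitivity", "specificity", "ppv", "npv", "prevalence"):
    print(f"{name}: {getattr(va, name):.1%}")
```

Note that sensitivity and negative predictive value depend on the missed cases, while specificity and positive predictive value depend on the roughly 4.5k false alarms, which is where the extra reading time came from.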

One issue to note is that the CINA algorithm is not intended for small hemorrhages with volumes < 3 mL; the researchers did not exclude these cases from their analysis, which could have reduced the algorithm’s measured performance.

  • Also, at 2.7%, ICH prevalence in the VA’s teleradiology program was lower than the 10% prevalence Avicenna has used to rate its performance.
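
Prevalence matters because positive predictive value falls as a condition becomes rarer, even when sensitivity and specificity stay the same. The sketch below applies Bayes' rule to show the effect. It assumes, purely for illustration, that the 76% sensitivity and 92% specificity reported in the VA study hold at both prevalence levels; the vendor's own sensitivity and specificity figures are not given here, and the function name is ours.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: share of positive flags that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Illustrative assumption: VA study sensitivity/specificity at both levels.
SENSITIVITY = 0.76
SPECIFICITY = 0.92

for prevalence in (0.027, 0.10):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%} -> PPV {ppv:.0%}")
```

Under that assumption, about one in five flags is a true hemorrhage at 2.7% prevalence, compared with about one in two at 10% prevalence.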

The Takeaway

The new findings aren’t exactly a slam dunk for AI in the teleradiology setting, but in terms of real-world results they are exactly what’s needed to assess the true value of the technology compared to outcomes in more tightly controlled environments.

Top 6 Radiology Trends of 2024’s First Half

You can put the first half of 2024 in the books … and it was full of major developments for radiology. What follows are the top six trends in medical imaging – one for each month of the first half.

  • The Rise of AI for Breast Screening – The first half of 2024 saw the publication of studies conducted in Norway and Denmark that underlined the potential role of AI for breast screening, particularly for ruling out exams most likely to be normal. But research conducted within Europe’s paradigm of double-reading workflow for 2D mammograms may not be so relevant in the US, and more studies are needed.
  • Mammography Guideline Controversy – Changes to breast screening guidelines in both the US and Canada were first-half headlines. In the US, the USPSTF made official its proposal to lower to 40 the recommended age to start screening, but many were disappointed it failed to provide stronger guidance on dense breast screening. Things were even worse in Canada, where a federal task force declined to lower the screening age from 50 to 40. Canadian advocates have vowed to fight on at the provincial level. 
  • AI Funding Pullback Continues – The ongoing pullback in venture capital funding for AI developers continues. A study by Signify Research found that not only did VC funding fall 19% in 2023, but it got off to a slow start in 2024 as well. The new environment could be putting more pressure on AI firms to demonstrate ROI to both healthcare providers and investors, while also having broader implications – a major AI conference rescheduled a show that had been on the calendar for May, citing “market conditions.” On the positive side, Tempus AI’s IPO boomed, raising $412M.
  • Opportunistic Screening Gains Steam – The concept of opportunistic screening – detecting pathology on medical images acquired for other indications – has been around for a while. But it’s only really started to catch on with the development of AI algorithms that can process thousands of images without a radiologist’s involvement. The first half of 2024 saw publication of several exciting studies for indications including detecting osteoporosis, scoring coronary artery calcifications, and predicting major adverse cardiac events.
  • ChatGPT Frenzy Subsides – The frenzied interest in ChatGPT and other generative AI large language models seen throughout 2023 seemed to subside in the first half of 2024. A quick search of The Imaging Wire archives, for example, finds just four references to ChatGPT in the first six months of 2024 compared to 21 citations at the same point in 2023. LLM developers need to address major issues – from GenAI’s “hallucination effect” to potential misuse of the technology – before LLMs can be used in clinical settings.

The Takeaway

The midpoint of the year is a great time to take stock of radiology’s progress and the issues that have bubbled to the surface over the past six months. In 2024’s back half, look for renewed attention on breast screening as the FDA’s density reporting rules go into effect in September, and keep on the lookout for signs that real-world AI adoption is growing, even as AI developers look for consolidation opportunities.
