Duke’s Interpretable AI Milestone

A team of Duke University radiologists and computer engineers unveiled a new mammography AI platform that could be an important step towards developing truly interpretable AI.

Explainable History – Healthcare leaders have been calling for explainable imaging AI for some time, but explainability efforts have been mainly limited to saliency / heat maps that show what part of an image influenced a model’s prediction (not how or why).

Duke’s Interpretable Model – Duke’s new AI platform analyzes mammography exams for potentially cancerous lesions to help physicians determine if a patient should receive a biopsy, while supporting its predictions with image and case-based explanations. 

Training Interpretability – The Duke team trained their AI platform to locate and evaluate lesions following the same process that human radiology educators and students would use:

  • First, they trained the AI model to detect suspicious lesions and to ignore healthy tissues
  • Then they had radiologists label the edges of the lesions
  • Then they trained the model to compare those lesion edges with lesion edges from an archive of images with confirmed outcomes

Interpretable Predictions – This training process allowed the AI model to identify suspicious lesions, highlight the classification-relevant parts of the image, and explain its findings by referencing confirmed images. 
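
For a sense of what that case-based explanation step might look like in practice, here's a minimal sketch (not Duke's actual implementation): compare a new lesion-margin embedding against a library of embeddings from biopsy-confirmed cases and return the closest matches as the explanation. The embedding size, the confirmed-case library, and the similarity measure are all illustrative assumptions.

```python
import numpy as np

# Hypothetical library of lesion-margin embeddings from biopsy-confirmed cases:
# each row is a feature vector, and labels mark malignant (1) vs. benign (0).
rng = np.random.default_rng(0)
confirmed_embeddings = rng.random((500, 128))   # stand-in for real margin features
confirmed_labels = rng.integers(0, 2, 500)      # stand-in for confirmed outcomes

def explain_lesion(lesion_embedding, k=3):
    """Return the k most similar confirmed cases and their outcomes."""
    # Cosine similarity between the new lesion's margin features and the library
    sims = confirmed_embeddings @ lesion_embedding / (
        np.linalg.norm(confirmed_embeddings, axis=1) * np.linalg.norm(lesion_embedding)
    )
    top_k = np.argsort(sims)[::-1][:k]
    # The "explanation" is the set of nearest confirmed cases and their outcomes
    return top_k, confirmed_labels[top_k], sims[top_k]

new_lesion = rng.random(128)  # stand-in for the new exam's margin embedding
cases, outcomes, scores = explain_lesion(new_lesion)
print(f"Most similar confirmed cases: {cases}, outcomes: {outcomes}")
```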

Interpretable Results – Like many AI models, this early version could not identify cancerous lesions as accurately as human radiologists. However, it matched the performance of existing “black box” AI systems and the team was able to see why their AI model made its mistakes.

The Takeaway

It seems like concerns over AI performance are growing at about the same pace as actual AI adoption, making explainability / interpretability increasingly important. Duke’s interpretable AI platform might be in its early stages, but its use of previous cases to explain findings seems like a promising (and straightforward) way to achieve that goal, while improving diagnosis in the process.

AI Disparity Detection

Most studies involving imaging AI and patient race/ethnicity warn that AI might exacerbate healthcare inequalities, but a new JACR study outlines one way that imaging AI could actually improve care for typically underserved patients.

The AI vs. EHR Disparity Problem – The researchers used a deep learning model to detect atherosclerotic disease in CXRs from two cohorts of COVID-positive patients: 814 patients from a suburban ambulatory center (largely White, higher-income) and 485 patients admitted to an inner-city hospital (largely minority, lower-income).

When they validated the AI predictions against the patients’ EHR codes, they found that:

  • The AI predictions were far more likely to match the suburban patients’ EHR codes than the inner-city patients’ EHR codes (0.85 vs. 0.69 AUCs)
  • AI/EHR discrepancies were far more common among patients who were Black or Hispanic, preferred a non-English language, or lived in disadvantaged zip codes

The Imaging AI Solution – This study suggests healthcare systems could use imaging AI-based biomarkers and EHR data to flag patients who might have undiagnosed conditions, allowing them to get these patients into care and identify/address larger systemic barriers.
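
As a rough illustration of that flagging approach (not the study’s actual pipeline), the sketch below compares a hypothetical AI atherosclerosis prediction against the presence of a matching EHR diagnosis code and surfaces patients whose imaging suggests undocumented disease. The column names and the 0.5 threshold are assumptions.

```python
import pandas as pd

# Hypothetical patient-level data: an AI probability of atherosclerosis from the
# CXR model and a flag for whether the EHR already contains a matching diagnosis code.
patients = pd.DataFrame({
    "patient_id": [101, 102, 103, 104],
    "ai_atherosclerosis_prob": [0.91, 0.12, 0.78, 0.33],
    "ehr_has_diagnosis_code": [True, False, False, False],
})

AI_POSITIVE_THRESHOLD = 0.5  # assumed operating point

# Flag patients the model calls positive but whose EHR has no matching code --
# candidates for outreach, follow-up testing, or a closer look at barriers to care.
discrepancies = patients[
    (patients["ai_atherosclerosis_prob"] >= AI_POSITIVE_THRESHOLD)
    & (~patients["ehr_has_diagnosis_code"])
]
print(discrepancies[["patient_id", "ai_atherosclerosis_prob"]])
```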

The Value-Based Care Justification – In addition to healthcare ethics reasons, the authors noted that imaging/EHR discrepancy detection could become increasingly financially important as we transition to more value-based models. AI/EHR analytics like this could be used to ensure at-risk patients are cared for as early as possible, healthcare disparities are addressed, and value-based metrics are achieved.

The Takeaway – Over the last year we’ve seen population health incidental detection emerge as one of the most exciting imaging AI use cases, while racial/demographic bias has emerged as one of imaging AI’s most troubling challenges. This study managed to combine these two topics to potentially create a new way to address barriers to care, while giving health systems another tool to ensure they’re delivering value-based care.

MaxQ AI Shuts Down

The Imaging Wire recently learned that MaxQ AI has stopped commercial operations, representing arguably the biggest consolidation event in imaging AI’s young history.

About MaxQ AI – The early AI trailblazer (founded in 2013) is best known for its Accipio ICH & Stroke triage platform and its solid list of channel partners (Philips, Fujifilm, IBM, Blackford, and Nuance), plus a particularly strong alliance w/ GE.

About the Shutdown – MaxQ has officially stopped commercial operations and let go of its sales and marketing workforce. However, it’s unclear whether MaxQ AI is shutting down completely, or if this is part of a strategic pivot or asset sale.

Shutdown Impact – MaxQ’s commercial shutdown leaves its Accipio channel partners and healthcare customers without an ICH AI product (or at least one fewer ICH product), while creating opportunities for its competitors to step in (e.g., Qure.ai, Aidoc, Avicenna.ai). 

A Consolidation Milestone – MaxQ AI’s commercial exit represents the first of what could prove to be many AI vendor consolidations, as larger AI players grow more dominant and funding runways become shorter. In fact, MaxQ AI might fit the profile of the type of AI startups facing the greatest consolidation threat, given that it operated within a single highly-competitive niche (at least six ICH AI vendors) that’s been challenged to improve detection without slowing radiologist workflows. 

The Takeaway – It’s never fun covering news like this, but MaxQ AI’s commercial shutdown is definitely worth the industry’s attention. The fact is, consolidation happens in every industry and it could soon play a larger role in imaging AI.

Note: MaxQ AI’s shutdown unfortunately leaves some nice, talented, and experienced imaging professionals out of a job. Imaging Wire readers who are building their AI teams should consider reaching out to these folks.

Detecting the Radiographically Occult

A new study published in European Heart Journal – Digital Health suggests that AI can detect aortic stenosis (AS) in chest X-rays, which would be a major breakthrough if confirmed, but will be met with plenty of skepticism until then.

The Models – The Japan-based research team trained/validated/tested three DL models on 10,433 CXRs from 5,638 patients (all from the same institution), using echocardiography assessments to label each image as AS-positive or AS-negative.

The Results – The best performing model detected AS-positive patients with an 0.83 AUC, while achieving 83% sensitivity, 69% specificity, 71% accuracy, and a 97% negative predictive value (but… a 23% PPV). Given the widespread use and availability of CXRs, these results were good enough for the authors to suggest that their DL model could be a valuable way to detect aortic stenosis.
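
For context on how a 97% NPV can coexist with a 23% PPV, here’s a quick back-of-the-envelope check. The ~10% AS prevalence below is an assumption chosen to be consistent with the reported sensitivity, specificity, PPV, and NPV, not a figure from the paper.

```python
# Back-of-the-envelope PPV/NPV from sensitivity, specificity, and prevalence.
sensitivity = 0.83
specificity = 0.69
prevalence = 0.10  # assumed -- roughly consistent with the reported 23% PPV / 97% NPV

tp = sensitivity * prevalence                 # true positives per patient screened
fp = (1 - specificity) * (1 - prevalence)     # false positives
tn = specificity * (1 - prevalence)           # true negatives
fn = (1 - sensitivity) * prevalence           # false negatives

ppv = tp / (tp + fp)
npv = tn / (tn + fn)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")    # ~23% and ~97%
```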

The Response – The folks on radiology/AI Twitter found these results “hard to believe,” given that human rads can’t detect aortic stenosis in CXRs with much better accuracy than a coin flip, and considering that these models were only trained/validated/tested with internal data. The conversation also revealed a growing level of AI study fatigue that will likely become worse if journals don’t start enforcing higher research standards (e.g. external validation, mentioning confounding factors, addressing the 23% PPV, maybe adding an editorial).

The Takeaway – Twitter’s MDs and PhDs love to critique study methodology, but this thread was a particularly helpful reminder of what potential AI users are looking for in AI studies — especially studies that claim AI can detect a condition that’s barely detectable by human experts.

Trained to Underdiagnose

A new Nature study suggests that imaging AI models might underdiagnose patient populations who are also underdiagnosed in the real world, revealing new ethical and clinical challenges for AI development, regulation, and adoption.

The Study – The researchers trained four AI models to predict whether images would have positive diagnostic findings using three large/diverse public CXR datasets (one model w/ each dataset, one w/ combined dataset, 707k total images). They then analyzed model performance across various patient populations.

The Underdiagnosed – The AI models were most likely to underdiagnose patients who are female, young (0-20yrs), Hispanic or Black, and covered by Medicaid (low-income). AI underdiagnosis rates were even more extreme among patients who belonged to multiple underserved groups, such as Hispanic females or younger Black patients.

The Overdiagnosed – As you might expect, healthy patients who were incorrectly flagged by the AI models as unhealthy were usually male, older, White, and higher income.
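
Here’s a hedged sketch of the kind of subgroup audit behind findings like these (not the authors’ code): for each demographic group, compute how often patients with a positive ground-truth finding were called negative by the model. The column names and example data are hypothetical.

```python
import pandas as pd

# Hypothetical audit table: ground-truth finding, model prediction, and demographics.
results = pd.DataFrame({
    "has_finding":  [1, 1, 1, 0, 1, 1, 0, 1],
    "ai_predicted": [0, 1, 0, 0, 1, 0, 0, 1],
    "group":        ["Black female", "White male", "Hispanic female", "White male",
                     "White male", "Black female", "Hispanic female", "White male"],
})

# Underdiagnosis rate per group: share of truly positive patients the model missed.
positives = results[results["has_finding"] == 1]
underdx_rate = (
    positives.assign(missed=lambda df: df["ai_predicted"] == 0)
             .groupby("group")["missed"]
             .mean()
)
print(underdx_rate)
```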

The Clinical Impact – In clinical use, a model like this would result in traditionally underserved patients experiencing more missed diagnoses and delayed treatments, while traditionally advantaged patients might undergo more unnecessary tests and treatments. And we know from previous research that AI can independently detect patient race in scans (even if we don’t know why).

The Takeaway – AI developers have been working to reduce racial/social bias in their models by training on diverse datasets, but it appears those datasets can still carry real-world systemic biases into the models (or even amplify them). These biases certainly aren’t AI developers’ fault, but they still add to the list of data source problems that developers will have to solve.

The State of AI

A group of radiology leaders starred in Canon Medical’s recent State of AI in Radiology Today Roundtable, sharing insights into how imaging AI is being used, where it’s needed most, and how AI might assume a core role in medical imaging.

The panelists were largely from the user/clinical side of imaging (U of Maryland’s Eliot Siegel, MD; UC Irvine’s Peter Chang, MD; UHS Delaware’s Cindy Siegel, CRA; U of Toronto’s Patrik Rogalla, MD; and Canon’s Director of Healthcare Economics Tom Szostak), with deeper AI experience than many typical radiology team members.

Here are some of the big takeaways:

We’re Still Early – The panel started by making sure everyone agreed on the definition of AI, and much of the ensuing discussion focused on AI’s future potential, which says a lot about where we are in AI’s lifecycle.

Do We Need AI? – The panelists agreed that radiology does indeed need AI, largely because it can improve the patient experience (shorter scans, faster results, fewer call-backs), help solve radiology’s inefficiency problems, and improve diagnostic accuracy.

Does AI Really Improve Efficiency? – Outside of image reconstruction, none of the panelists were ready to say that AI currently makes radiologists faster. However, they still believe that AI will improve future radiology workflows and outcomes.

Finding The Killer App – Things got a lot more theoretical at the halfway point, when the conversation shifted to what “killer apps” might bring imaging AI into mainstream use, including AI tools that:

  • Identify and exclude normal scans with extremely high accuracy (must be far more accurate than humans and limit false positives)
  • Curate and submit all CMS quality reporting metrics (eliminates admin work, generates revenue)
  • Identify early-stage diseases for population health programs (keeps current diagnostic workflows intact)
  • Interpret and diagnose all X-ray exams (eliminates high volume/repetitive exams, rads don’t read some XRs in many countries)
  • Improve image quality, allow faster scans, reduce dosage (aka DL image reconstruction)

AI’s Radiologist Impact – The panelists don’t see AI threatening radiologist jobs in the short to mid-term given AI’s current immaturity, the “tremendous inefficiencies” that still exist in radiology, and the pace of imaging volume growth. They also expect volume growth to drive longer term demand for both AI and rads, suggesting that AI adoption might even amplify future volume growth (if AI expands bandwidth and cuts interpretation costs, the laws of economics suggest that more scans would follow).

What AI Needs – With most of the technical parts of building algorithms now figured out, AI’s evolution will depend on getting enough training data, improving how AI is integrated into workflows, and making sure AI is solving radiology’s biggest problems. Imaging AI also needs healthcare to be open to change, which would require clear clinical, operational, and financial upsides.

Arterys’ Platform Expansion

At a time when many major AI companies are trying to become AI platform companies, Arterys announced a trio of 3rd party AI alliances that showed how a mature AI platform might work.

Arterys Expands Neuro AI – Arterys launched neuroradiology AI alliances with Combinostics and Cercare Medical, expanding Arterys’ already-comprehensive Neuro AI suite (which also includes MRI brain tumor diagnostics, CT stroke & ICH detection, and 2D/4D Flow brain MRI). Combinostics’ cNeuro cMRI supports multiple neurological disorder assessments (specifically dementia and multiple sclerosis), while Cercare Perfusion automates brain CT and MRI perfusion map generation and stroke treatment decision making.

Arterys Adds Breast AI – Arterys launched a global distribution agreement with iCAD, making iCAD’s full suite of breast health solutions available in Arterys’ new Breast AI suite. iCAD’s portfolio is certainly broad enough to deserve its own “suite,” spanning 2D and 3D mammography detection, personalized risk screening and decision support, and density assessments. The Arterys Breast AI suite also makes iCAD available as a cloud-based SaaS solution for the first time (previously only on-premise).

Arterys Platform Impact – Arterys’ integration of multiple complementary AI tools within curated AI Suites is unique and makes a lot of sense. It seems far more helpful to provide neurorads with integrated access to a suite of neuro AI tools than to provide them with one or two tools for every subspecialty.

The Takeaway – Arterys’ new alliances reveal a far more subspecialist-targeted approach than we usually see from AI platforms or marketplaces. It also shows that Arterys is committed to its vendor neutral strategy, effectively doubling its number of active AI partners (previously: Imaging Biometrics & Avicenna.AI in Neuro suite, MILVUE in Chest MSK Suite), while highlighting the value of its cloud-native structure for integrating new partners within the same user interface.

Viz.ai’s Care Coordination Expansion

Viz.ai advanced its care coordination strategy last week, launching new Pulmonary Embolism and Aortic Disease modules, and unveiling its forthcoming Viz ANX cerebral aneurysm module.

PE & Aortic Modules – The new PE and Aortic modules use AI to quickly detect pulmonary embolisms and aortic dissection in CTA scans, and then coordinate care using Viz.ai’s 3D mobile viewer and clinical communications workflows. It appears that Viz.ai partnered with Avicenna.AI to create these modules, representing a logical way for Viz.ai to quickly expand its portfolio.

Viz ANX Module – The forthcoming Viz ANX module will use the 510(k)-pending Viz ANX algorithm to automatically detect suspected cerebral aneurysms in CTAs, and then leverage the Viz Platform for care coordination.

Viz.ai’s Care Coordination Strategy – Viz.ai called itself “the leader in AI-powered care coordination” a total of six times in these two announcements, and the company has definitely earned this title for stroke detection/coordination. Adding new modules to the Viz Platform is how Viz.ai could earn “leadership” status across all other imaging-detected emergent conditions.

The Takeaway – Viz.ai’s stroke detection/coordination platform has been among the biggest imaging AI success stories, making its efforts to expand to new AI-based detection and care coordination areas notable (and pretty smart). These module launches are also an example of diagnostic AI’s growing role throughout care pathways, showing how AI can add clinical value beyond the reading room.

Right Diagnoses, Wrong Reasons

An AJR study shared new evidence of how X-ray image labels influence deep learning decision making, while revealing one way developers can address this issue.

Confounding History – Although already well known by AI insiders, label and laterality-based AI shortcuts made headlines last year when they were blamed for many COVID algorithms’ poor real-world performance. 

The Study – Using 40k images from Stanford’s MURA dataset, the researchers trained three CNNs to detect abnormalities in upper extremity X-rays. They then tested the models for detection accuracy and used a heatmap tool to identify the parts of the images that the CNNs emphasized. As you might expect, labels played a major role in both accuracy and decision making.

  • The model trained on complete images (bones & labels) achieved an 0.844 AUC, but based 89% of its decisions on the radiographs’ laterality/labels.
  • The model trained without labels or laterality (only bones) detected abnormalities with a higher 0.857 AUC and attributed 91% of its decisions to bone features.
  • The model trained with only laterality and labels (no bones) still achieved an 0.638 AUC, showing that AI interprets certain labels as a sign of abnormalities. 

The Takeaway – Labels are just about as common on X-rays as actual anatomy, and it turns out that they could have an even greater influence on AI decision making. Because of that, the authors urged AI developers to address confounding image features during the curation process (potentially by covering labels) and encouraged AI users to screen CNNs for these issues before clinical deployment.
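
One simple version of the “covering labels” mitigation the authors suggest could look like the sketch below (an illustration, not the study’s method): zero out the image regions that contain laterality markers or burned-in labels before training. The marker coordinates here are hypothetical; in practice they’d come from annotations or automated marker detection.

```python
import numpy as np

def mask_labels(image: np.ndarray, label_boxes):
    """Zero out regions that contain laterality markers or burned-in labels."""
    masked = image.copy()
    for (y0, y1, x0, x1) in label_boxes:
        masked[y0:y1, x0:x1] = 0  # replace the label region with background
    return masked

# Hypothetical example: a 512x512 radiograph with an "L" marker near the top-right corner.
xray = np.random.rand(512, 512)
marker_boxes = [(10, 60, 440, 500)]  # (row_start, row_end, col_start, col_end), assumed
clean_xray = mask_labels(xray, marker_boxes)
```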

UCSF Automates CAC Scoring

UCSF is now using AI to automatically screen all of its routine non-contrast chest CTs for elevated coronary artery calcium scores (CAC scores), representing a major milestone for an AI use case that was previously limited to academic studies and future business strategies.

UCSF’s Deployment – UCSF becomes the first medical center to deploy the end-to-end AI CAC scoring system that it developed with Stanford and Bunkerhill Health earlier this year. The new system automatically identifies elevated CAC scores in non-gated / non-contrast chest CTs, creating an “opportunistic screening pathway” that allows UCSF physicians to identify high-CAC patients and get them into treatment.
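
In spirit, that opportunistic pathway boils down to routing patients by their AI-derived calcium score. Here’s a minimal sketch using the standard Agatston risk buckets; the example scores, the “elevated” cutoff, and the routing step are illustrative assumptions, not UCSF’s implementation.

```python
def cac_risk_category(agatston_score: float) -> str:
    """Map an AI-derived Agatston score to a standard risk bucket."""
    if agatston_score == 0:
        return "none"
    if agatston_score < 100:
        return "mild"
    if agatston_score < 400:
        return "moderate"
    return "severe"

# Hypothetical scores extracted by the model from routine non-contrast chest CTs.
patients = {"pt_001": 0, "pt_002": 87, "pt_003": 412}

# Flag elevated-CAC patients for a preventive cardiology pathway (assumed cutoff).
flagged = {pid: score for pid, score in patients.items()
           if cac_risk_category(score) in ("moderate", "severe")}
print(flagged)  # -> {'pt_003': 412}
```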

Why This is a Big Deal – Over 20m chest CTs are performed in the U.S. annually and each of those scans contains insights into patients’ cardiac health. However, an AI model like this would be required to extract cardiac data from the majority of CT scans (CAC isn’t visible to humans in non-gated CTs) and efficiently interpret them (there are far too many images). This AI system’s path from academic research to clinical deployment seems like a big deal too.

The Commercial Impact – Most health systems don’t have the AI firepower of Stanford and UCSF, but they certainly produce plenty of chest CTs and should want to identify more high-risk patients while they’re still treatable (especially if they’re also risk holders). Meanwhile, there are growing commercial efforts from companies like Cleerly and Nanox.AI to create opportunistic CAC screening pathways for all the health systems that can’t develop their own CAC AI workflows (or prefer not to).
