Intracranial Hemorrhage AI Efficiency

A new Radiology: Artificial Intelligence study out of Switzerland highlighted how Aidoc’s Intracranial Hemorrhage AI solution improved emergency department workflows without hurting patient care. Even if that’s exactly what solutions like this are supposed to do, real-world AI studies that go beyond sensitivity and specificity are still rare and worth some extra attention.

The Study – The researchers analyzed University Hospital of Basel’s non-contrast CT intracranial hemorrhage (ICH) exams before and after the hospital adopted the Aidoc ICH solution (n = 1,433 before & 3,017 after; ~14% ICH incidence in both groups).

Diagnostic Results – The Aidoc solution produced “practicable” overall diagnostic results (93% accuracy, 87.2% sensitivity, 93.9% specificity, and 97.8% NPV), although accuracy was lower with certain ICH subtypes (e.g., subdural hemorrhage: 69.2%, or 74 of 107).
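
For readers who want to map those figures back to raw counts, here’s a minimal sketch of how accuracy, sensitivity, specificity, and NPV fall out of a confusion matrix. The counts below are hypothetical placeholders, not the study’s actual data.

```python
# Minimal sketch of how the reported metrics relate to a confusion matrix.
# The example counts are hypothetical placeholders, not the study's data.

def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the standard diagnostic metrics cited in the study."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # correct calls / all exams
        "sensitivity": tp / (tp + fn),   # share of true ICH exams the AI flagged
        "specificity": tn / (tn + fp),   # share of non-ICH exams correctly cleared
        "npv": tn / (tn + fn),           # share of negative calls that are truly negative
    }

# Example with made-up counts (not from the paper):
print(diagnostic_metrics(tp=370, fp=160, tn=2400, fn=55))
```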

Efficiency Results – More notably, the Aidoc ICH solution “positively impacted” the hospital’s ED workflows, with improvements across a range of key metrics (post-AI vs. pre-AI):

  • Communicating critical findings: 63 vs. 70 minutes
  • Communicating acute ICH: 58 vs. 73 minutes
  • Overall turnaround time to rule out ICH: 164 vs. 175 minutes
  • Turnaround time to rule out ICH during working hours: 167 vs. 205 minutes

Next Steps – The authors called for further efforts to streamline their stroke workflows and to create a clear ICH AI framework, accurately noting that “AI tools are only as reliable as the environment they are deployed in.”

The Takeaway
The internet hasn’t always been kind to emergency AI tools, and academic studies have rarely focused on the workflow efficiency outcomes that many radiologists and emergency teams care about. That’s not the case with this study, which did a good job showing the diagnostic and workflow upsides of ICH AI adoption, and added a nice reminder that imaging teams share responsibility for AI outcomes.

Creating a Cancer Screening Giant

A few days after RadNet shocked the AI and imaging center industries with its acquisitions of Aidence and Quantib, the company’s Friday investor briefing revealed a far more ambitious AI-enabled cancer screening strategy than many might have imagined.

Expanding to Colon Cancer – RadNet will complete its AI screening platform by developing a homegrown colon cancer detection system, estimating that its four AI-based cancer detection solutions (breast, prostate, lung, colon) could screen for 70% of cancers that are imaging-detectable at early stages.

Population Detection – Once its AI platform is complete, RadNet plans to launch a strategy to expand cancer screening’s role in population health, while making prostate, lung, and colon cancer screening as mainstream as breast cancer screening.

Becoming an AI Vendor – RadNet revealed plans to launch an externally focused AI business that will lead with its multi-cancer AI screening platform, but will also create opportunities for RadNet’s eRAD PACS/RIS software. There are plenty of players in the AI-based cancer detection arena, but RadNet’s unique multi-cancer platform, significant funding, and training data advantage would make it a formidable competitor.

Geographic Expansion – RadNet will leverage Aidence and Quantib’s European presence to expand its software business internationally, as well as into parts of the US where RadNet doesn’t own imaging centers (RadNet has centers in just 7 states).

Imaging Center Upsides – RadNet’s cancer screening AI strategy will of course benefit its core imaging center business. In addition to improving operational efficiency and driving more cancer screening volumes, RadNet believes that the unique benefits of its AI platform will drive more hospital system joint ventures.

AI Financials – The briefing also provided rare insights into AI vendor finances, revealing that DeepHealth has been running at a $4M-$5M annual loss and adding Aidence / Quantib might expand that loss to $10M-$12M (seems OK given RadNet’s $215M EBITDA). RadNet hopes its AI division will become cash flow neutral within the next few years as revenue from outside companies ramps up.

The Takeaway

RadNet has very big ambitions to become a global cancer screening leader and significantly expand cancer screening’s role in society. Changing society doesn’t come fast or easy, but a goal like that reveals how much emphasis RadNet is going to place on developing and distributing its AI cancer screening platform going forward.

Duke’s Interpretable AI Milestone

A team of Duke University radiologists and computer engineers unveiled a new mammography AI platform that could be an important step towards developing truly interpretable AI.

Explainable History – Healthcare leaders have been calling for explainable imaging AI for some time, but explainability efforts have been mainly limited to saliency / heat maps that show what part of an image influenced a model’s prediction (not how or why).

Duke’s Interpretable Model – Duke’s new AI platform analyzes mammography exams for potentially cancerous lesions to help physicians determine if a patient should receive a biopsy, while supporting its predictions with image and case-based explanations. 

Training Interpretability – The Duke team trained their AI platform to locate and evaluate lesions following a process much like the one human radiology educators and students would use:

  • First, they trained the AI model to detect suspicious lesions and to ignore healthy tissues
  • Then they had radiologists label the edges of the lesions
  • Then they trained the model to compare those lesion edges with lesion edges from an archive of images with confirmed outcomes

Interpretable Predictions – This training process allowed the AI model to identify suspicious lesions, highlight the classification-relevant parts of the image, and explain its findings by referencing confirmed images. 
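
For a rough sense of what case-based explanation can look like in code, here’s a hedged sketch of the general idea: comparing a new lesion’s features against an archive of outcome-confirmed lesions and surfacing the closest matches as supporting evidence. The function names, features, and similarity measure are illustrative assumptions, not the Duke team’s actual implementation.

```python
# Hedged sketch of case-based explanation, loosely in the spirit of the Duke
# approach described above. All names, features, and data here are
# illustrative assumptions, not the team's actual implementation.
import numpy as np

def explain_lesion(lesion_features: np.ndarray,
                   archive_features: np.ndarray,
                   archive_labels: list,
                   top_k: int = 3):
    """Rank archived, outcome-confirmed lesions by similarity to a new lesion."""
    # Cosine similarity between the new lesion's edge/margin features
    # and each archived lesion's features.
    a = archive_features / np.linalg.norm(archive_features, axis=1, keepdims=True)
    q = lesion_features / np.linalg.norm(lesion_features)
    sims = a @ q
    order = np.argsort(sims)[::-1][:top_k]
    # The prediction is supported by the labels of the most similar known cases.
    return [(int(i), archive_labels[i], float(sims[i])) for i in order]

# Toy usage with random placeholder features:
rng = np.random.default_rng(0)
archive = rng.normal(size=(200, 64))
labels = ["malignant" if i % 3 == 0 else "benign" for i in range(200)]
print(explain_lesion(rng.normal(size=64), archive, labels))
```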

Interpretable Results – Like many AI models, this early version could not identify cancerous lesions as accurately as human radiologists. However, it matched the performance of existing “black box” AI systems and the team was able to see why their AI model made its mistakes.

The Takeaway

It seems like concerns over AI performance are growing at about the same pace as actual AI adoption, making explainability / interpretability increasingly important. Duke’s interpretable AI platform might be in its early stages, but its use of previous cases to explain findings seems like a promising (and straightforward) way to achieve that goal, while improving diagnosis in the process.

The False Hope of Explainable AI

Many folks view explainability as a crucial next step for AI, but a new Lancet paper from a team of AI heavyweights argues that explainability might do more harm than good in the short term, and AI stakeholders would be better off increasing their focus on validation.

The Old Theory – For as long as we’ve been covering AI, really smart and well-intentioned people have warned about the “black-box” nature of AI decision making and forecasted that explainable AI will lead to more trust, less bias, and greater adoption.

The New Theory – These black-box concerns and explainable AI forecasts might be logical, but they aren’t currently realistic, especially for patient-level decision support. Here’s why:

  • Explainability methods describe how AI systems work, not how decisions are made
  • AI explanations can be unreliable and/or superficial
  • Most medical AI decisions are too complex to explain in an understandable way
  • Humans over-trust computers, so explanations can hurt their ability to catch AI mistakes
  • AI explainability methods (e.g., heat maps) require human interpretation, risking confirmation bias
  • Explainable AI adds more potential error sources (AI tool + AI explanation + human interpretation)
  • Although we still can’t fully explain how acetaminophen works, we don’t question whether it works, because we’ve tested it extensively

The Explainability Alternative – Until suitable explainability methods emerge, the authors call for “rigorous internal and external validation of AI models” to make sure AI tools are consistently making the right recommendations. They also advised clinicians to remain cautious when referencing AI explanations and warned that policymakers should resist making explainability a requirement. 

Explainability’s Short-Term Role – Explainability definitely still has a role in AI safety, as it’s “incredibly useful” for model troubleshooting and systems audits, which can improve model performance and identify failure modes or biases.

The Takeaway – It appears we might not be close enough to explainable AI to make it a part of short-term AI strategies, policies, or procedures. That might be hard to accept for the many people who view the need for AI explainability as beyond debate, and it makes AI validation and testing more important than ever.

Who Owns AI Evaluation and Monitoring?

Imaging AI evaluation and monitoring just became even hotter topics, following a particularly revealing Twitter thread and a pair of interesting new papers.

Rads Don’t Work for AI – A Mayo Clinic Florida neuroradiologist took his case to Twitter after an FDA-approved AI tool missed 6 of 7 hemorrhages in a single shift and he was asked to make extra clicks to help improve the algorithm. No AI tool is perfect, but many folks commenting on this thread didn’t take kindly to the idea of being asked to do pro bono work to improve an algorithm that they already paid for.

AI Takes Work – A few radiologists with strong AI backgrounds clarified that this “extra work” is intended to inform the FDA about postmarket performance, and that monitoring healthcare tools and providing feedback is indeed part of physicians’ jobs. They also argued that radiology practices should ensure that they have the bandwidth to monitor AI before deciding to adopt it.

The ACR DSI Gets It – Understanding that “AI algorithms may not work as expected when used beyond the institutions in which they were trained, and model performance may degrade over time,” the ACR Data Science Institute (DSI) released a helpful paper detailing how radiologists can evaluate AI before and during clinical use. In an unplanned nod to the above Twitter thread, the DSI paper also noted that AI evaluation/monitoring is “ultimately up to the end users,” although many “practices will not be able to do this on their own.” The good news is the ACR DSI is developing tools to help them.
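
As a purely illustrative sketch (not the ACR DSI’s tooling), a “low-labor” local monitor could be as simple as comparing the AI’s flags against final radiologist reads over a rolling window and alerting when sensitivity drifts below the level seen at acceptance testing:

```python
# Illustrative sketch of a low-labor local monitor (not any vendor's or the
# ACR DSI's actual tooling): compare AI flags against final radiologist reads
# in a rolling window and alert when sensitivity drifts below a set floor.
from collections import deque
from typing import Optional

class PerformanceMonitor:
    """Track AI flags vs. final radiologist reads over a rolling window."""

    def __init__(self, window: int = 200, min_sensitivity: float = 0.85):
        self.cases = deque(maxlen=window)   # (ai_positive, rad_positive) pairs
        self.min_sensitivity = min_sensitivity

    def record(self, ai_positive: bool, rad_positive: bool) -> None:
        self.cases.append((ai_positive, rad_positive))

    def sensitivity(self) -> Optional[float]:
        # Of the cases the radiologist called positive, how many did the AI flag?
        flagged = [ai for ai, rad in self.cases if rad]
        if not flagged:
            return None
        return sum(flagged) / len(flagged)

    def degraded(self) -> bool:
        s = self.sensitivity()
        return s is not None and s < self.min_sensitivity

# Usage: log each finalized report, then check for drift.
monitor = PerformanceMonitor()
monitor.record(ai_positive=False, rad_positive=True)   # a missed hemorrhage
print(monitor.sensitivity(), monitor.degraded())
```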

DLIR Needs Evaluation Too – Because measuring whether DL-reconstructed scans “look good” or allow reduced-dose exams won’t catch errors (e.g., false tumors or removed tumors), a Washington University in St. Louis-led team is developing a framework for evaluating DLIR tools before they are introduced into clinical practice. The new framework comes from some big-name institutions (WUSTL, NIH, FDA, Cleveland Clinic, UBC), all of whom also appear to agree that AI evaluation is up to the users.

The Takeaway – At least among AI insiders it’s clear that AI users are responsible for algorithm evaluation and monitoring, even if bandwidth is limited and many evaluation/monitoring tools are still being developed. Meanwhile, many AI users (who are crucial for AI to become mainstream) want their FDA-approved algorithms to perform correctly and aren’t eager to do extra work to help improve them. That’s a pretty solid conflict, but it’s also a silver lining for AI vendors who get good at streamlining evaluations and develop low-labor ways to monitor performance.
