The Case for Algorithmic Audits

A new Lancet Digital Health study could have become one of the many “AI rivals radiologists” papers that we see each week, but it instead served as an important lesson that traditional performance tests might not prove that AI models are actually safe for clinical use.

The Model – The team developed their proximal femoral fracture detection DL model using 45.7k frontal X-rays performed at Australia’s Royal Adelaide Hospital (w/ 4,861 fractures).

The Validation – They then tested it against a 4,577-exam internal set (w/ 640 fractures), 400 of which were also interpreted by five radiologists (w/ 200 fractures), and against an 81-image external validation set from Stanford.

The Results – All three tests produced results that a typical study might have viewed as evidence of high performance:

  • The model outperformed the five radiologists (0.994 vs. 0.969 AUCs)
  • It beat the best-performing radiologist’s sensitivity (95.5% vs. 94.5%) and specificity (99.5% vs. 97.5%)
  • It generalized well with the external Stanford data (0.980 AUC)

The Audit – Despite the strong results, a follow-up audit revealed that the model might make some predictions for the wrong reasons, suggesting that it is unsafe for clinical deployment:

  • One false negative X-ray included an extremely displaced fracture that human radiologists would catch
  • X-rays featuring abnormal bones or joints had a 50% false negative rate, far higher than the reader set’s overall false negative rate (2.5%)
  • Salience maps showed that AI decisions were almost never based on the outer region of the femoral neck, even with images where that region was clinically relevant (but it still often made the right diagnosis)
  • The model scored a high AUC with the Stanford data, but showed a substantial model operating point shift
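The last audit finding can look counterintuitive, so here is a minimal sketch of how AUC can hold steady while the operating point moves. It uses hypothetical toy scores (not data from the study) and made-up names (`internal`, `external`, `THRESHOLD`). AUC only depends on how cases are ranked, so if a new site's scores are all compressed downward, the ranking and the AUC survive, but a fixed decision threshold no longer lands in the same place.

```python
def auc(scores, labels):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 fractures (1) and 4 normals (0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
internal = [0.95, 0.90, 0.80, 0.60, 0.70, 0.30, 0.20, 0.10]
# Assumed stand-in for an external site: same ranking, every score scaled down.
external = [s * 0.7 for s in internal]

THRESHOLD = 0.5  # operating point chosen on the internal data

print("internal", auc(internal, labels), sens_spec(internal, labels, THRESHOLD))
print("external", auc(external, labels), sens_spec(external, labels, THRESHOLD))
```

In this toy case both sets share the same AUC, yet sensitivity at the fixed threshold drops from 100% to 75% on the "external" scores. That is the kind of gap an AUC-only comparison would miss.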

The Case for Auditing – Although the study might not have started with this goal, it ended up becoming an argument for more sophisticated preclinical auditing. It even led to a separate paper outlining their algorithmic auditing process, which among other things suggested that AI users and developers should co-own audits.

The Takeaway

Auditing generally isn’t the most exciting topic in any field, but this study shows that it’s exceptionally important for imaging AI. It also suggests that audits might be necessary for achieving the most exciting parts of AI, like improving outcomes and efficiency, earning clinician trust, and increasing adoption.

Imaging AI’s Big 2021

Signify Research’s latest imaging AI VC funding report revealed an unexpected surge in 2021, along with major funding shifts that might explain why many of us didn’t see it coming. Here are some of Signify’s big takeaways (the full report is available from Signify).

AI’s Path to $3.47B – Imaging AI startups have raised $3.47B in venture funding since 2015, helped by a record-high $815M in 2021 after several years of falling investments (vs. 2020’s $592M, 2019’s $450M, 2018’s $790M).

Big Get Bigger – That $3.47B funding total came from over 200 companies and 290 deals, although the 25 highest-funded companies were responsible for 80% of all capital raised. VCs increased their focus on established AI companies in 2021, resulting in record-high late-stage funding (~$723.5M), record-low Pre-Seed/Seed funding (~$7M), and a major increase in average deal size (~$33M vs. ~$12M in 2020).

Made in China – If you’re surprised that 2021 was a record AI funding year, that’s probably because it targeted Chinese companies (~$260M vs. US’ ~$150M), continuing a recent trend (China’s AI VC share was 45% in 2020, 26% in 2019). We’re also seeing major funding go to South Korea and Australia’s top startups, adding to APAC AI vendors’ funding leadership.

Health VC Context – Although imaging AI’s $815M 2021 funding total seems big for a category that’s figuring out its path towards full adoption, the amount VC firms are investing in other areas of healthcare makes it seem pretty reasonable. Our two previous Digital Health Wire issues featured seven digital health startup funding rounds with a total value of $267M (and that’s from just one week).

The Takeaway

Signify correctly points out that imaging AI funding remains strong despite a list of headwinds (COVID, regulatory hurdles, lacking reimbursements), while showing more signs of AI market maturation (larger funding rounds to fewer players) and suggesting that consolidation is on the way. Those factors will likely continue in 2022. However, more innovation is surely on the way too and quite a few regional AI powerhouses still haven’t expanded globally, suggesting that the next steps in AI’s evolution won’t be as straightforward as some might think.

Autonomous AI Milestone

Just as the debate over whether AI might replace radiologists is starting to fade away, Oxipit’s ChestLink solution became the first regulatory-approved imaging AI product intended to perform diagnoses without involving radiologists (*please see editor’s note below regarding Behold.ai). That’s a big and potentially controversial milestone in the evolution of imaging AI and it’s worth a deeper look.

About ChestLink – ChestLink autonomously identifies CXRs without abnormalities and produces final reports for each of these “normal” exams, automating 15% to 40% of reporting workflows.

Automation Evidence – Oxipit has already piloted ChestLink in supervised settings for over a year, processing over 500k real-world CXRs with 99% sensitivity and no clinically relevant errors.

The Rollout – With its CE Class IIb Mark finalized, Oxipit is now planning to roll out ChestLink across Europe and begin “fully autonomous” operation by early 2023. Oxipit specifically mentioned primary care settings (many normal CXRs) and large-scale screening projects (high volumes, many normal scans) in its announcement, but ChestLink doesn’t appear limited to those use cases.

ChestLink’s ability to address radiologist shortages and reduce labor costs seems like a strong and unique advantage. However, radiology’s first regulatory-approved autonomous AI solution might face even stronger challenges:

  • ChestLink’s CE Mark doesn’t account for country-specific regulations around autonomous diagnostic reporting (e.g. the UK requires “appropriate reporting” with ionizing radiation-based exams)
  • Radiologist societies historically push back against anything that might undermine radiologists’ clinical roles, earning potential, and future career stability
  • Health systems’ evidence requirements for any autonomous AI tools would likely be extremely high, and they might expect similarly high economic ROI in order to justify the associated diagnostic or reputational risks
  • Even the comments in Oxipit’s LinkedIn announcement had a much more skeptical tone than we typically see with regulatory approval announcements

The Takeaway

Autonomous AI products like ChestLink could address some of radiology’s greatest problems (radiologist overwork, staffing shortages, volume growth, low access in developing countries) and their economic value proposition is far stronger than most other diagnostic AI products.

However, autonomous AI solutions could also face more obstacles than any other imaging AI products we’ve seen so far, suggesting that it would take a combination of excellent clinical performance and major changes in healthcare policies/philosophies in order for autonomous AI to reach mainstream adoption.

*Editor’s Note – April 21, 2022: Behold.ai insists that it is the first imaging AI company to receive regulatory approval for autonomous AI. Its product is used with radiologist involvement and local UK guidelines require that radiologists read exams that use ionizing radiation. All above analysis regarding the possibilities and challenges of autonomous AI applies to any autonomous AI vendor in the current AI environment, including both Oxipit and Behold.ai.

Complementary PE AI

A new European Radiology study out of France highlighted how Aidoc’s pulmonary embolism AI solution can serve as a valuable emergency radiology safety net, catching PE cases that otherwise might have been missed and increasing radiologists’ confidence. 

Even if that’s technically what PE AI products are supposed to do, studies using commercially available products and focusing on how AI complements radiologists (vs. comparing AI and rad accuracy) are still rare and worth a closer look.

The Diagnostic Study – A team from French telerad provider IMADIS analyzed AI and radiologist CTPA interpretations from patients with suspected PE (n = 1,202 patients), finding that:

  • Aidoc PE achieved higher sensitivity (0.926 vs. 0.9) and negative predictive value (0.986 vs. 0.981)
  • Radiologists achieved higher specificity (0.991 vs. 0.958), positive predictive value (0.95 vs. 0.804), and accuracy (0.977 vs. 0.953)
  • The AI tool flagged 219 suspicious PEs, with 176 true positives, including 19 cases that were missed by radiologists
  • The radiologists detected 180 suspicious PEs, with 171 true positives, including 14 cases that were missed by AI
  • Aidoc PE would have helped IMADIS catch 285 misdiagnosed PE cases in 2020 based on the above AI-only PE detection ratio (19 per 1,202 patients)  
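For readers who want to see how those figures fit together, here is a rough sketch that rebuilds each reader's confusion matrix from the counts above. One assumption is ours, not the study's: that every true PE was caught by either the AI or the radiologists, which puts the total at 176 + 14 = 190 (the radiologist counts, 171 + 19, give the same total). The function and variable names are made up for illustration.

```python
TOTAL_PATIENTS = 1202
# Assumption: every true PE was caught by the AI, the radiologists, or both.
TOTAL_PE = 176 + 14  # AI true positives + cases the AI missed


def confusion(flagged, true_pos):
    """Derive TP/FP/FN/TN from the number of flagged exams and true positives."""
    fp = flagged - true_pos
    fn = TOTAL_PE - true_pos
    tn = TOTAL_PATIENTS - TOTAL_PE - fp
    return true_pos, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Standard diagnostic metrics, rounded to three decimals."""
    return {
        "sensitivity": round(tp / (tp + fn), 3),
        "specificity": round(tn / (tn + fp), 3),
        "ppv": round(tp / (tp + fp), 3),
        "npv": round(tn / (tn + fn), 3),
        "accuracy": round((tp + tn) / (tp + fp + fn + tn), 3),
    }


ai = metrics(*confusion(flagged=219, true_pos=176))
rads = metrics(*confusion(flagged=180, true_pos=171))

print("AI  ", ai)
print("Rads", rads)
```

Under that assumption, the derived values match the sensitivity, specificity, PPV, NPV, and accuracy figures listed above for both the AI and the radiologists.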

The Radiologist Survey – Nine months after IMADIS implemented Aidoc PE, a survey of its radiologists (n = 79) and a comparison versus its pre-implementation PE CTPAs revealed that:

  • 72% of radiologists believed Aidoc PE improved their diagnostic confidence and comfort 
  • 52% of radiologists said the AI solution didn’t impact their interpretation times
  • 14% indicated that Aidoc PE reduced interpretation times
  • 34% of radiologists believed the AI tool added time to their workflow
  • The solution actually increased interpretation times by an average of 7.2% (+1:03 minutes) 

The Takeaway

Now that we’re getting better at not obsessing over AI replacing humans, this is a solid example of how AI can complement radiologists by helping them catch more PE cases and make more confident diagnoses. Some radiologists might be concerned with false positives and added interpretation times, but the authors noted that AI’s PE detection advantages (and the risks of missed PEs) outweigh these potential tradeoffs.

The Case for Operational AI

A trio of radiologists from Mount Sinai and East River Medical Imaging starred in a recent Aunt Minnie webinar, discussing their paths towards operational AI adoption, and sharing some very relevant takeaways for radiology groups and AI vendors.

The Cast – The Subtle Medical-sponsored webinar featured Mount Sinai’s Amish H. Doshi, MD and Idoia Corcuera-Solano, MD (neuro and MSK subspecialists) and East River Medical Imaging’s Timothy Deyer, MD (CMIO and MSK IR), all of whom were involved in evaluating and adopting Subtle Medical’s SubtleMR deep learning reconstruction solution.

Make it Easy – When discussing their AI evaluation criteria, the panelists placed a major emphasis on ease-of-evaluation and implementation, with one noting that “before even having a conversation” he’d have to be certain these early processes won’t be costly or cumbersome (clear process, no new hardware, minimal IT work, no up-front purchases, etc.). 

Why Operational AI – Much of the discussion focused on why the panelists support operational AI, noting that scan-shortening DLIR solutions like SubtleMR:

  • Allow more revenue-generating scans per day
  • Alleviate technologist burnout and staffing challenges
  • Improve the patient experience (especially pediatric)
  • Eliminate re-scans by reducing movement artifacts that occur in long exams
  • Don’t require changes to radiologist workflows
  • Maintain diagnostic image quality
  • Receive less pushback from admins and physicians than diagnostic AI

Evaluating SubtleMR for MSK – Mount Sinai’s MSK SubtleMR evaluation process included comparing standard of care and SubtleMR-enhanced abbreviated MRI exams from 50 consecutive knee MR patients. They found that SubtleMR cut scan times by 50% (13:27 to 6:45), while achieving comparable image quality, artifacts, and diagnostic performance.

Evaluating SubtleMR for Neuro – Mount Sinai’s neuro evaluation process involved comparing SubtleMR and conventional MRI with 10-15 patients for each potential MR sequence. They then reviewed the scans with key stakeholders, worked with the Subtle Medical team to make requested imaging adjustments, and implemented the solution.

SubtleMR Results – SubtleMR’s list of benefits (scan speed, patient experience, patient throughput, revenue) earned it approval from all key stakeholders. Although one panelist noted that some of their radiologists critiqued the enhanced images, the radiologist pushback wasn’t nearly as strong as what they’ve seen in response to diagnostic AI products.

The Takeaway

We cover plenty of editorials about what it takes to drive AI adoption, but feedback from real world AI adopters is still rare, making this webinar particularly useful for AI vendors and adopters. The webinar also makes a solid case for SubtleMR and other deep learning reconstruction solutions, even for groups who might not be ready to adopt the kind of “AI” that we usually focus on.

MGH’s Multimodal Thyroid Ultrasound AI

An MGH and Harvard Medical team developed a multimodal ultrasound AI platform that applies an interesting mix of AI techniques to accurately detect and stage thyroid cancer, potentially improving diagnosis and treatment planning.

The Platform – The platform combines radiomics, topological data analysis (TDA), ML-based TI-RADS assessments, and deep learning, allowing them to capture more data, minimize noise, and improve prediction accuracy.

The Study – Starting with 1,346 ultrasound images from 784 patients, the researchers trained the multimodal AI platform with 362 nodules (103 malignant) and validated it against a pair of internal (51 malignant, 98 benign) and external (270 malignant, 50 benign) datasets, finding that:

  • The platform predicted 98.7% of internal dataset malignancies (0.99 AUC)
  • The platform predicted 91.4% of external dataset malignancies (0.94 AUC)
  • The individual AI methods were far less accurate (80% to 89% w/ internal)
  • A version of the platform accurately predicted nodal pathological stages (93% for T-stage, 89% for N-stage, 98% for extrathyroidal extension)
  • The platform predicted BRAF mutations with 96% accuracy

Next Steps – The researchers plan to validate their multimodal platform in prospective multicenter clinical trials, including in low-resource countries where it might be particularly helpful.


The Takeaway

We cover plenty of ultrasound AI and thyroid cancer imaging studies, but this team’s multi-AI approach is unique and appears promising. A multimodal AI platform like this might make thyroid cancer diagnosis more efficient and less subjective, avoid unnecessary biopsies, allow non-invasive staging and mutation assessment, and lead to more personalized treatments. That would be a major accomplishment, and might suggest that similar multimodal AI platforms could be developed for other cancers and imaging modalities.

Radiology’s AI ROI Mismatch

A thought-provoking JACR editorial by Emory’s Hari Trivedi MD suggests that AI’s slow adoption rate has little to do with its quality or clinical benefits, and a lot to do with radiology’s misaligned incentives.

After interviewing 25 clinical and industry leaders, the radiology professor and co-director of Emory’s HITI Lab detailed the following economic mismatches:

  • Private Practices value AI that improves radiologist productivity, allowing them to increase reading volumes without equivalent increases in headcount. That makes triage or productivity-focused AI valuable, but gives them no economic justification to purchase AI that catches incidentals, ensures follow-ups, or reduces unnecessary biopsies.
  • Academic centers or hospitals that own radiology groups have far more to gain from AI products that detect incidental/missed findings and then drive internal admissions, referrals, and procedures. That means their highest-ROI AI solutions often drive revenue outside of the radiology department, while creating more radiologist labor.
  • Community hospital emergency departments value AI that allows them to discharge or treat emergency patients faster, although this often doesn’t economically benefit their radiology departments or partner practices.
  • Payor/provider health systems (e.g. the VA, Intermountain, Kaiser) can be open to a broad range of AI, but they especially value AI that reduces costs by avoiding unnecessary tests or catching early signs of diseases.


The Takeaway

People tend to paint imaging AI with a broad brush (AI is… all good, all bad, a job stealer, or the future) and we’ve seen a similar approach to AI adoption barrier editorials (AI just needs… trust, reimbursements, integration, better accuracy, or the killer app). However, even if each of these adoption barriers is solved, it’s hard to see how AI could achieve widespread adoption if the groups paying for AI aren’t economically benefiting from it.

Because of that, Dr. Trivedi encourages vendors to develop AI that provides “returns” to the same groups that make the “investments.” That might mean that few AI products achieve widespread adoption on their own, but a diverse group of specialized AI products achieve widespread use across all radiology settings.

Sirona Medical Acquires Nines AI, Talent

Sirona Medical announced its acquisition of Nines’ AI assets and personnel, representing notable milestones for Sirona’s integrated RadOS platform and the quickly-changing imaging AI landscape.

Acquisition Details – Sirona acquired Nines’ AI portfolio (data pipeline, ML engines, workflow/analytics tools, AI models) and key team members (CRO, Director of Product, AI engineers), while Nines’ teleradiology practice was reportedly absorbed by one of its telerad customers. Terms of the acquisition weren’t disclosed, although this wasn’t a traditional acquisition considering that Sirona and Nines had the same VC investor.

Sirona’s Nines Strategy – Sirona’s mission is to streamline radiologists’ overly-siloed workflows with its RadOS radiology operating system (unifies: worklist, viewer, reporting, AI, etc.), and it’s a safe bet that any acquisition or investment Sirona makes is intended to advance this mission. With that…

  • Nines’ most tangible contributions to Sirona’s strategy are its FDA-cleared AI models: NinesMeasure (chest CT-based lung nodule measurements) and NinesAI Emergent Triage (head CT-based intracranial hemorrhage and mass effect triage). The AI models will be integrated into the RadOS platform, bolstering Sirona’s strategy to allow truly-integrated AI workflows.
  • Nines’ personnel might have the most immediate impact at Sirona, given the value/scarcity of experienced imaging software engineers and the fact that Nines’ product team arguably has more hands-on experience with radiologist workflows than any other imaging AI firm (at least AI firms available for acquisition).
  • Nines’ other AI and imaging workflow assets should also help support Sirona’s future RadOS and AI development, although it’s harder to assess their impact for now.

The AI Shakeup Angle – This acquisition has largely been covered as another example of 2022’s AI shakeup, which isn’t too surprising given how active this year has been (MaxQ’s shutdown, RadNet’s Aidence/Quantib acquisitions, IBM shedding Watson Health). However, Nines’ strategy to combine a telerad practice with in-house AI development was quite unique and its decision to sell might say more about its specific business model (at its scale) than it does about the overall AI market.

The Takeaway

Since the day Sirona emerged from stealth, it’s done a masterful job articulating its mission to solve radiology’s workflow problems by unifying its IT infrastructure. Acquiring Nines’ AI assets certainly supports Sirona’s unified platform messaging, while giving it more technology and personnel resources to try to turn that message into a reality.

Meanwhile, Nines becomes the latest of surely many imaging AI startups to be acquired, pivoted, or shut down, as AI adoption evolves at a slower pace than some VC runways. Nines’ strategy was really interesting, they had some big-name founders and advisors, and now their work and teams will live on through Sirona.

CheXstray Drift Detection

A Stanford AIMI and Microsoft Healthcare team just took a step towards addressing imaging AI’s looming drift problem, unveiling their CheXstray drift detection system.

Imaging AI’s Drift Problem – The list of FDA-cleared imaging AI products continues to grow and we’re getting better at AI deployment. However, there’s no reasonable way to monitor how imaging AI models adapt to their constantly changing data environments (tech, vendors, protocols, patient & disease mix, etc.) or whether the models change on their own.

The CheXstray Solution – The team used a pair of public CXR datasets (n = 224k & 160k CXRs) to train/test the CheXstray solution to automatically detect drift by calculating a range of multi-modal inputs (DICOM metadata, image appearance, clinical workflows) and model performance. 
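To make "detecting drift" a bit more concrete, here is a minimal sketch of one generic way to compare an incoming batch of exams against a reference batch: the two-sample Kolmogorov-Smirnov statistic, which measures the largest gap between two empirical distributions. This is a common statistical building block, not a description of CheXstray's actual pipeline, and the feature values, batch names, and alert threshold below are all hypothetical.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)  # share of reference values <= x
        cdf_cur = bisect_right(cur, x) / len(cur)  # share of current values <= x
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


# Hypothetical per-exam feature (e.g. a normalized image brightness score).
reference_batch = [0.40, 0.42, 0.45, 0.47, 0.50, 0.52, 0.55, 0.58]
same_site_batch = [0.41, 0.44, 0.46, 0.49, 0.51, 0.53, 0.56, 0.57]
new_scanner_batch = [0.60, 0.62, 0.65, 0.66, 0.70, 0.71, 0.73, 0.75]

ALERT_THRESHOLD = 0.5  # assumed cutoff, would need tuning in practice

for name, batch in [("same site", same_site_batch), ("new scanner", new_scanner_batch)]:
    stat = ks_statistic(reference_batch, batch)
    print(name, stat, "DRIFT" if stat > ALERT_THRESHOLD else "ok")
```

In this toy example the same-site batch barely differs from the reference (0.125), while the new-scanner batch does not overlap with it at all (1.0). A real monitoring system would track many such signals at once and combine them, which is where the multi-modal part comes in.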

CheXstray Results – Initial experiments showed that the automated CheXstray workflows rivaled ground truth audits for drift detection, essentially achieving the workflow’s proof-of-concept goal. 

Automation Alternatives – Until we have automated monitoring solutions like CheXstray, AI vendors and radiology departments might have to rely on ongoing audits (requiring test set curation, labeling, analytics, etc.) and/or asking radiologists to provide ongoing model feedback. Unfortunately, those options undermine AI’s intended labor-reducing value proposition. Plus, radiologists have already made it quite clear that they don’t think monitoring should be their responsibility (and regulators might agree).

The Takeaway

We haven’t solved imaging AI’s drift monitoring problem yet, and there will be other hurdles to overcome before we see a solution like this achieve clinical adoption (more research, regulatory changes, new modalities, training without massive public datasets). Still, the CheXstray team just showed how imaging AI performance could be automatically monitored in real-time. That’s an important step in imaging AI’s evolution, and it might prove to be critical as more hospitals head into the 2nd or 3rd years after their “successful” AI deployments.

Intracranial Hemorrhage AI Efficiency

A new Radiology: Artificial Intelligence study out of Switzerland highlighted how Aidoc’s Intracranial Hemorrhage AI solution improved emergency department workflows, without hurting patient care. Even if that’s exactly what solutions like this are supposed to do, real world AI studies that go beyond sensitivity and specificity are still rare and worth some extra attention.

The Study – The researchers analyzed University Hospital of Basel’s non-contrast CT intracranial hemorrhage (ICH) exams before and after adopting the Aidoc ICH solution (n = 1,433 before & 3,017 after; ~14% ICH incidence w/ both groups).

Diagnostic Results – The Aidoc solution produced “practicable” overall diagnostic results (93% accuracy, 87.2% sensitivity, 93.9% specificity, and 97.8% NPV), although accuracy was lower with certain ICH subtypes (e.g. subdural hemorrhage 69.2%, 74/107). 

Efficiency Results – More notably, the Aidoc ICH solution “positively impacted” University Hospital of Basel’s ED workflows, with improvements across a range of key metrics:

  • Communicating critical findings: 63 vs. 70 minutes
  • Communicating acute ICH: 58 vs. 73 minutes
  • Overall turnaround time to rule out ICH: 164 vs. 175 minutes
  • Turnaround time to rule out ICH during working hours: 167 vs. 205 minutes
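The raw minutes are easier to compare as relative changes, so here is a quick sketch of that arithmetic. It assumes each pair above reads as "with AI vs. before AI" (which is how the "improvements" framing suggests they should be read), and the dictionary keys are just our shorthand labels.

```python
# (with AI, before AI) in minutes. The ordering is an assumption based on the
# "improvements" framing of the list above.
turnaround_minutes = {
    "critical findings": (63, 70),
    "acute ICH": (58, 73),
    "rule out ICH (overall)": (164, 175),
    "rule out ICH (working hours)": (167, 205),
}


def percent_reduction(after, before):
    """Relative drop from the before value, as a percentage rounded to one decimal."""
    return round((before - after) / before * 100, 1)


reductions = {
    name: percent_reduction(after, before)
    for name, (after, before) in turnaround_minutes.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct}% faster")
```

Read that way, the gains range from roughly 6% (overall rule-out time) to about 20% (communicating acute ICH).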

Next Steps – The authors called for further efforts to streamline their stroke workflows and to create a clear ICH AI framework, accurately noting that “AI tools are only as reliable as the environment they are deployed in.”

The Takeaway

The internet hasn’t always been kind to emergency AI tools, and academic studies have rarely focused on the workflow efficiency outcomes that many radiologists and emergency teams care about. That’s not the case with this study, which did a good job showing the diagnostic and workflow upsides of ICH AI adoption, and added a nice reminder that imaging teams share responsibility for AI outcomes.
