AI made a “selective but meaningful” contribution to radiologist interpretations of CT pulmonary angiography scans for pulmonary embolism. The study, published in Radiology: Artificial Intelligence, offers valuable insights into real-world implementation of AI on a large scale.
One of the major criticisms of AI is that algorithms used in real-world clinical situations don’t perform as well as they do in the controlled environments that vendors use to acquire data for regulatory submissions.
- AI performance can drop by as much as 20 to 30 percentage points on important metrics like sensitivity and specificity.
The new study sought to investigate this phenomenon by analyzing a real-world implementation of Aidoc’s AI algorithm for PE detection.
- Researchers assessed the algorithm’s performance for analyzing CTPA exams across a variety of clinical environments in an integrated health network, including the emergency department and inpatient and outpatient settings.
Scans of 29.5k patients acquired from 2021 to 2023 were included. AI analyzed images in real time, after which exams were interpreted by radiologists who knew the AI findings. Researchers found…
- Radiologists using AI had higher sensitivity than the algorithm on its own (99% vs. 85%).
- Specificity was more or less the same (99.8% vs. 99.5%).
- Agreement between radiologists and AI was high (98%).
- Agreement was higher when AI assessed cases as negative rather than positive (98% vs. 94%).
- Radiologists disagreed with AI in 2.2% of cases. When a panel of expert thoracic radiologists made the final determination on those cases, it sided with the radiologists 89% of the time.
- Of the 3.3k cases positive for PE, 26 (0.81%) were detected only by AI.
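
For readers who want to sanity-check the figures above, here is a minimal Python sketch that backs out the counts implied by the reported percentages. It uses only the rounded numbers quoted in this article and assumes one CTPA exam per patient, which the article does not state. The variable names are illustrative and do not come from the study.

```python
# Rounded figures as quoted in the article.
total_exams = 29_500        # "29.5k patients"; ASSUMES one exam per patient
ai_only_cases = 26          # PE cases detected only by AI
ai_only_share = 0.0081      # 0.81% of PE-positive cases
disagreement_rate = 0.022   # radiologists disagreed with AI in 2.2% of cases

# If 26 cases are 0.81% of all PE-positive cases, the implied denominator is:
implied_positives = ai_only_cases / ai_only_share

# Approximate number of exams where radiologist and AI disagreed.
implied_disagreements = total_exams * disagreement_rate

print(f"Implied PE-positive cases: {implied_positives:.0f}")
print(f"Implied discordant reads:  {implied_disagreements:.0f}")
```

The implied denominator comes out at roughly 3.2k PE-positive cases, close to the 3.3k reported. The small gap is what you would expect from working with rounded inputs. The discordant-read count is likewise only an estimate, since it depends on the one-exam-per-patient assumption.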
In analyzing the results, the researchers characterized AI’s contribution as “selective but meaningful.”
- An AI-positive result meant a scan might require more scrutiny from radiologists, while an AI-negative call might support, but not definitively establish, a negative read for PE.
The Takeaway
The new study of AI for PE detection is a fascinating look at real-world AI deployment. While the sensitivity, specificity, and agreement numbers are interesting, what draws our attention is the 26 PE cases caught only by AI over 18 months of use. That boils down to 26 patients whose clinical condition wasn’t missed, and 26 potential malpractice lawsuits that were never filed.

