Patients who arrive at the ED with acute chest pain (ACP) syndrome end up receiving a series of often-negative tests, but a new MGB-led study suggests that CXR AI might make ACP triage more accurate and efficient.
The researchers trained three nested ACP triage models using data from 23k MGH patients to predict acute coronary syndrome, pulmonary embolism, aortic dissection, and all-cause mortality within 30 days (a rough sketch of this setup follows the list):
- Model 1: Patient age and sex
- Model 2: Patient age, sex, and troponin or D-dimer positivity
- Model 3: CXR AI predictions plus Model 2
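The paper's modeling details aren't spelled out here, so treat the following as a minimal sketch of the nested-model idea, assuming simple logistic regression over toy data; every variable name and value below is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 23_000  # development cohort size reported in the study

# Assumed feature columns (illustrative only): age, sex, biomarker
# positivity, and a CXR AI risk score in [0, 1].
age = rng.normal(60, 15, n)
sex = rng.integers(0, 2, n)         # 0 = female, 1 = male
biomarker = rng.integers(0, 2, n)   # troponin or D-dimer positivity
cxr_ai_score = rng.random(n)        # output of the CXR AI model
y = rng.integers(0, 2, n)           # any 30-day ACP outcome (toy labels)

# Model 1: age + sex; Model 2 adds biomarker positivity;
# Model 3 adds the CXR AI prediction on top of Model 2's inputs.
feature_sets = {
    "Model 1": np.column_stack([age, sex]),
    "Model 2": np.column_stack([age, sex, biomarker]),
    "Model 3": np.column_stack([age, sex, biomarker, cxr_ai_score]),
}

for name, X in feature_sets.items():
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
    print(f"{name}: AUC = {auc:.2f}")  # random toy labels, so AUCs land near 0.5
```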
In internal testing with 5.7k MGH patients, Model 3 predicted which patients would experience any of the ACP outcomes far more accurately than Models 2 and 1 (AUCs: 0.85 vs. 0.76 vs. 0.62), while maintaining performance across patient demographic groups.
- At a 99% sensitivity threshold, Model 3 would have allowed 14% of the patients to skip additional cardiovascular or pulmonary testing, vs. Model 2’s 2% (see the sketch after this bullet)
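A 99% sensitivity operating point simply means choosing the lowest risk cutoff that still flags 99% of patients who go on to have an ACP outcome; everyone scoring below it could plausibly defer further testing. Here's a minimal sketch of that calculation (the function name and interface are our own invention, not the study's):

```python
import numpy as np

def deferral_rate_at_sensitivity(scores, labels, target_sens=0.99):
    """Find the highest risk threshold that keeps sensitivity >= target_sens,
    then report the fraction of all patients scoring below it (i.e., those
    who could skip additional cardiovascular/pulmonary testing)."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    pos_scores = np.sort(scores[labels == 1])
    # Allow at most (1 - target_sens) of true positives to fall below the cutoff.
    cutoff_idx = int(np.floor((1 - target_sens) * len(pos_scores)))
    threshold = pos_scores[cutoff_idx]
    return (scores < threshold).mean(), threshold

# Toy usage with random scores and labels (not the study's data).
rng = np.random.default_rng(0)
scores, labels = rng.random(5_700), rng.integers(0, 2, 5_700)
rate, thr = deferral_rate_at_sensitivity(scores, labels)
print(f"{rate:.1%} of patients fall below the {thr:.3f} cutoff")
```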
In external validation with 22.8k Brigham and Women’s patients, poor AI generalizability caused Model 3’s AUC to drop from 0.85 to 0.77, erasing most of its advantage over Models 2 and 1, which held steady (AUCs: 0.77 vs. 0.76 vs. 0.64). However, fine-tuning the CXR AI model with BWH’s own images (sketched after the next bullet) significantly improved its AUC (from 0.67 to 0.74) and Model 3’s (from 0.77 to 0.81).
- At a 99% sensitivity threshold, the fine-tuned Model 3 would have allowed 8% of BWH patients to skip additional cardiovascular or pulmonary testing (vs. Model 2’s 2%).
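The study's fine-tuning recipe isn't described here, but site adaptation typically means retraining part of a pretrained network on local images. A hedged PyTorch sketch, with the backbone choice and every hyperparameter assumed for illustration:

```python
import torch
import torch.nn as nn
from torchvision import models

# Illustrative stand-in backbone; the study's actual model, tuned layers,
# and training recipe are not specified here.
model = models.densenet121(weights="IMAGENET1K_V1")

# Freeze the pretrained feature extractor...
for param in model.features.parameters():
    param.requires_grad = False

# ...and replace/retrain only the classification head on local (BWH-style) images.
model.classifier = nn.Linear(model.classifier.in_features, 1)

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

def fine_tune_step(images, labels):
    """One gradient step on a batch of local CXRs (shape [B, 3, H, W])."""
    optimizer.zero_grad()
    logits = model(images).squeeze(1)
    loss = loss_fn(logits, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```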
Acute chest pain is among the most common reasons for ED visits, but it’s also a major driver of wasted ED time and resources. Considering that most ACP patients undergo CXR exams early in the triage process, this proof-of-concept study suggests that adding CXR AI could improve ACP diagnosis and significantly reduce downstream testing.
A new European Radiology study detailed a commercial CXR AI tool’s challenges when used for screening patients with low disease prevalence, bringing more attention to the mismatch between how some AI tools are trained and how they’re applied in the real world.
The researchers used an unnamed commercial AI tool to detect abnormalities in 3k screening CXRs sourced from two healthcare centers (2.2% w/ clinically significant lesions), and had four radiology residents read the same CXRs with and without AI assistance, finding that the AI:
- Produced a far lower AUROC than in its other studies (0.648 vs. 0.77–0.99)
- Achieved 94.2% specificity, but just 35.3% sensitivity
- Detected 12 of 41 pneumonia cases, 3 of 5 tuberculosis cases, and 9 of 22 tumors (counts that combine to the 35.3% sensitivity figure, as checked after this list)
- Only “modestly” improved the residents’ AUROCs (0.571–0.688 vs. 0.534–0.676)
- Added 2.96 to 10.27 seconds to the residents’ average CXR reading times
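As a quick sanity check, the per-finding counts in the list above combine to the headline sensitivity (assuming each positive CXR is counted once):

```python
# The per-finding detection counts reported above combine to the
# headline sensitivity figure.
detected = 12 + 3 + 9         # pneumonia + tuberculosis + tumors found by AI
total_lesions = 41 + 5 + 22   # all clinically significant lesions
sensitivity = detected / total_lesions
print(f"{sensitivity:.1%}")   # 35.3%
```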
The researchers attributed the AI tool’s “poorer than expected” performance to differences between the data used in its initial training and validation (high disease prevalence) and the study’s clinical setting (high-volume, low-prevalence screening); a quick Bayes check after the next bullet shows what low prevalence does to predictive value.
- More notably, the authors pointed to these results as evidence that many commercial AI products “may not directly translate to real-world practice,” urging providers facing this kind of training mismatch to retrain their AI or change their thresholds, and calling for more rigorous AI testing and trials.
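To see why the prevalence mismatch stings, plug the study's reported operating point and its 2.2% prevalence into Bayes' rule (our illustration, not a figure from the paper):

```python
# Positive and negative predictive value at the study's reported
# operating point, via Bayes' rule.
sens, spec, prev = 0.353, 0.942, 0.022

ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
npv = (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)

print(f"PPV: {ppv:.1%}")  # ~12%: most AI flags are false alarms
print(f"NPV: {npv:.1%}")  # ~98.5%: negative calls are mostly trustworthy
```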
These results also inspired lively online discussions. Some commenters cited the study as proof of the problems caused by training AI with augmented datasets, while others contended that the AI tool’s AUROC still rivaled the residents’ and that its “decent” specificity was promising for screening use.
We cover plenty of studies about AI generalizability, but most have explored bias due to patient geography and demographics, rather than disease prevalence mismatches. Even if AI vendors and researchers are already aware of this issue, AI users and study authors might not be, placing more emphasis on how vendors position their AI products for different use cases (or how they train them).
Cathay Life Insurance will use Lunit’s INSIGHT CXR AI solution to identify abnormalities in its applicants’ chest X-rays, potentially modernizing a manual underwriting process and uncovering a new non-clinical market for AI vendors.
Lunit INSIGHT CXR will be integrated into Cathay’s underwriting workflow, with the goals of enhancing Cathay’s radiologists’ accuracy and efficiency while improving its underwriting decisions.
Lunit and Cathay have reason to be optimistic about this endeavor, given that their initial proof-of-concept study found that INSIGHT CXR:
- Improved Cathay’s radiologists’ reading accuracy by 20%
- Reduced the radiologists’ overall reading time by up to 90%
Those improvements could have a significant labor impact, considering that Cathay’s rads review 30,000 CXRs every year (a rough back-of-envelope follows below). They might have an even greater business impact, given the important role that underwriting accuracy plays in policy profitability.
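For a sense of scale, only the 30,000-CXR volume and the "up to 90%" time reduction come from the announcement; the two-minutes-per-read baseline is purely our assumption:

```python
# Back-of-envelope labor math. The 2-minutes-per-CXR baseline is an
# assumption for illustration; only the 30k annual volume and the
# "up to 90%" reading-time reduction come from the announcement.
annual_cxrs = 30_000
minutes_per_read = 2          # assumed baseline reading time
time_saved_frac = 0.90        # "up to 90%" reduction

baseline_hours = annual_cxrs * minutes_per_read / 60   # ~1,000 hours/year
saved_hours = baseline_hours * time_saved_frac         # ~900 hours/year
print(f"{baseline_hours:.0f} h/yr baseline, up to {saved_hours:.0f} h/yr saved")
```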
Lunit’s part of the announcement largely focused on its expansion beyond clinical settings, revealing plans to “become the driving force of digital innovation in the global insurance market” and to further expand its business into “various sectors outside the hospital setting.”
Even if life insurers only require CXRs for a small percentage of their applicants (older people, higher-value policies), they still review hundreds of thousands of CXRs each year. That makes insurers an intriguing new market segment for AI vendors, and makes you wonder what other non-clinical AI use cases might exist. However, it might also give radiologists who are still skeptical about AI one more thing to be concerned about.