AI Models Go Head-to-Head in Project AIR Study

One of the biggest challenges in assessing the performance of different AI algorithms is the varying conditions under which AI research studies are conducted. A new study from the Netherlands published this week in Radiology aims to correct that by testing a variety of AI algorithms head-to-head under similar conditions.

There are over 200 AI algorithms on the European market (and even more in the US), many of which address the same clinical condition.

Therefore, hospitals looking to acquire AI can find it difficult to assess the diagnostic performance of different models.

The Project AIR initiative was launched to fill the gap in accurate assessment of AI algorithms by creating a Consumer Reports-style testing environment that’s consistent and transparent.

Project AIR researchers have assembled a validated database of medical images for different clinical applications, against which multiple AI algorithms can be tested; to ensure generalizability, images have come from different institutions and were acquired on equipment from different vendors.

In the first test of the Project AIR concept, a team led by Kicky van Leeuwen of Radboud University Medical Centre in the Netherlands invited AI developers to participate. Nine products from eight vendors were validated from June 2022 to January 2023: two models for bone age prediction and seven algorithms for lung nodule assessment (one vendor participated in both tests). Results included:

For bone age analysis, both of the tested algorithms (Visiana and Vuno) showed “excellent correlation” with the reference standard, with an r correlation coefficient of 0.987-0.989 (1 = perfect correlation).
For lung nodule analysis, there was a wider spread in AUC between the algorithms and human readers, with humans posting a mean AUC of 0.81.
Researchers found superior performance compared with human readers for Annalise.ai (AUC 0.90), Lunit (0.93), Milvue (0.86), and Oxipit (0.88).

What’s next on Project AIR’s testing agenda? Van Leeuwen told The Imaging Wire that the next study will involve fracture detection. Meanwhile, interested parties can follow along on leaderboards for both bone age and lung nodule use cases.

The Takeaway

Head-to-head studies like the one conducted by Project AIR may make many AI developers squirm (several that were invited declined to participate), but they are a necessary step toward building clinician confidence in the performance of AI algorithms, and that confidence is needed to support the widespread adoption of AI.
