More Support for CT Lung Cancer Screening

Yet another study supporting CT lung cancer screening has been published, adding to a growing body of evidence that population-based CT screening programs will be effective in reducing lung cancer deaths. 

The new study comes from European Radiology, where researchers from Hungary describe findings from HUNCHEST-II, a population-based program that screened 4.2k high-risk people at 18 institutions. 

  • Screening criteria were largely similar to other studies: people between the ages of 50 and 75 who were current or former smokers with at least 25 pack-year histories. Former smokers had quit within the last 15 years. 

Recruitment for HUNCHEST-II took place from September 2019 to January 2022. Participants received a baseline low-dose CT (LDCT) scan, with the study protocol calling for annual follow-up scans (more on this later). Researchers found: 

  • The rate of positive baseline screening exams was 4.1%, comparable to the NELSON trial (2.3%) but much lower than the NLST (27%)
  • 1.8% of participants were diagnosed with lung cancer throughout screening rounds
  • 1.5% of participants had their cancer found with the baseline exam
  • Positive predictive value was 58%, at the high end of population-based lung screening programs
  • 79% of screen-detected cancers were early stage, making them well-suited for treatment
  • False-positive rate was 42%, a figure the authors said was “concerning”
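
The last two bullets are two sides of the same coin: the 42% false-positive figure appears to be the share of positive screens that did not turn out to be cancer, i.e. the complement of the 58% PPV. A minimal worked example, using hypothetical counts rather than the study's actual case numbers:

```python
# Hypothetical workup of 100 positive screens (illustrative, not HUNCHEST-II's actual counts)
true_positives = 58    # positive screens confirmed as lung cancer
false_positives = 42   # positive screens that turned out not to be cancer

positives = true_positives + false_positives

ppv = true_positives / positives                        # positive predictive value
false_positive_fraction = false_positives / positives   # share of positives that are false alarms

print(f"PPV: {ppv:.0%}")                                             # 58%
print(f"False positives among positive screens: {false_positive_fraction:.0%}")  # 42%
```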

A deeper dive into the data reveals some interesting details. Overdiagnosis is a major concern with any screening test; it was a particular problem in NLST but was lower in HUNCHEST-II. 

  • Researchers said they used a volume-based nodule evaluation protocol, which reduced the false-positive rate compared to the nodule diameter-based approach in NLST.
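
To see why volume- and diameter-based protocols can classify the same nodule differently, remember that volume grows with the cube of diameter, so small diameter changes translate into large volume changes. The sketch below converts nodule diameter to the equivalent spherical volume; the 6 mm and 100 mm^3 cutoffs are illustrative values only, not thresholds taken from HUNCHEST-II or NLST.

```python
import math

def sphere_volume_mm3(diameter_mm: float) -> float:
    """Volume of a sphere with the given diameter (a rough model of a round nodule)."""
    return math.pi / 6 * diameter_mm ** 3

# Illustrative cutoffs only; real protocols (Lung-RADS, NELSON, HUNCHEST-II) differ in detail
DIAMETER_CUTOFF_MM = 6.0
VOLUME_CUTOFF_MM3 = 100.0

for d in (4.0, 5.0, 6.0, 8.0):
    v = sphere_volume_mm3(d)
    print(f"{d:.0f} mm nodule -> {v:.0f} mm^3 "
          f"(diameter rule: {'positive' if d >= DIAMETER_CUTOFF_MM else 'negative'}, "
          f"volume rule: {'positive' if v >= VOLUME_CUTOFF_MM3 else 'negative'})")
```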

Also, a high attrition rate occurred between the baseline scan and annual screening rounds, with only 12% of individuals with negative baseline LDCT results going on to follow-up screening (although the COVID-19 pandemic may have affected these results). 

The Takeaway

The HUNCHEST-II results add to the growing momentum in favor of national population-based CT lung screening programs. Germany is planning to implement a program in early 2024, and Taiwan is moving in the same direction. The question is whether the US needs to step up its game, given that screening compliance rates remain low.

Unpacking the Biden Administration’s New AI Order

It seems like watershed moments in AI are happening on a weekly basis now. This time, the big news is the Biden Administration’s sweeping executive order that directs federal regulation of AI across multiple industries – including healthcare. 

The order comes as AI is becoming a clinical reality for many applications. 

  • The number of AI algorithms cleared by the FDA has been surging, and clinicians – particularly radiologists – are getting access to new tools on an almost daily basis.

But AI’s rapid growth – and in particular the rise of generative AI technologies like ChatGPT – has raised questions about its future impact on patient care and whether the FDA’s existing regulatory structure is suitable for such a new technology. 

The executive order appears to be an effort to get ahead of these trends. When it comes to healthcare, its major elements are summarized in a succinct analysis of the plan by Health Law Advisor. In short, the order: 

  • Calls on HHS to work with the VA and Department of Defense to create an HHS task force on AI within 90 days
  • Requires the task force to develop a strategic plan within a year that could include regulatory action regarding the deployment and use of AI for applications such as healthcare delivery, research, and drug and device safety
  • Orders HHS to develop a strategy within 180 days to determine if AI-enabled technologies in healthcare “maintain appropriate levels of quality” – basically, a review of the FDA’s authorization process
  • Requires HHS to set up an AI safety program within a year, in conjunction with patient safety organizations
  • Tells HHS to develop a strategy for regulating AI in drug development

Most analysts are viewing the executive order as the Biden Administration’s attempt to manage both risk and opportunity. 

  • The risk is that AI developers lose control of the technology, with consequences such as patients being harmed by inaccurate AI. The opportunity is for the US to become a leader in AI by crafting a long-term strategy for the technology’s development. 

The Takeaway

The question is whether an industry that’s as fast-moving as AI – with headlines changing by the week – will lend itself to the sort of centralized long-term planning envisioned in the Biden Administration’s executive order. Time will tell.

AI Tug of War Continues

The tug of war over AI’s value to radiology continues. This time the rope has moved in AI’s favor with the publication of a new study in JAMA Network Open showing the potential of a new type of AI language model for creating radiology reports.

  • Headlines about AI have ping-ponged in recent weeks, from positive studies like MASAI and PERFORMS to more equivocal trials like a chest X-ray study in Radiology and news from the UK that healthcare authorities may not be ready for chest X-ray AI’s full clinical roll-out. 

In the new paper, Northwestern University researchers tested a chest X-ray AI algorithm they developed with a transformer technique, a type of generative AI language model that can both analyze images and generate radiology text as output. 

  • Transformer language models show promise due to their ability to combine both image and non-image data, as researchers showed in a paper last week.
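
The Northwestern model itself isn't published as code here, but the general pattern it uses – a vision encoder that turns the radiograph into image features plus a language decoder that generates report text token by token – can be illustrated with an off-the-shelf Hugging Face pipeline. The sketch below uses a generic image-captioning checkpoint (nlpconnect/vit-gpt2-image-captioning) purely as a stand-in; it is not trained on radiology reports and is not the model from the JAMA Network Open study.

```python
# Minimal sketch of a vision-encoder / text-decoder (transformer) pipeline.
# The checkpoint is a generic image-captioning model used only to illustrate the pattern;
# a radiology-report generator would be trained on paired radiographs and reports.
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

ckpt = "nlpconnect/vit-gpt2-image-captioning"  # stand-in checkpoint, not a radiology model
model = VisionEncoderDecoderModel.from_pretrained(ckpt)
processor = ViTImageProcessor.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

image = Image.open("chest_xray.png").convert("RGB")  # hypothetical input file
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# The decoder generates free text conditioned on the encoded image
output_ids = model.generate(pixel_values, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```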

The Northwestern researchers tested their transformer model on 500 chest radiographs of patients evaluated overnight in the emergency department from January 2022 to January 2023. 

Reports generated by AI were then compared to reports from a teleradiologist as well as the final report from an in-house radiologist, which served as the gold standard. The researchers found that AI-generated reports …

  • Had sensitivity a bit lower than teleradiology reports (85% vs. 92%)
  • Had specificity a bit higher (99% vs. 97%)
  • In some cases improved on the in-house radiology report by detecting subtle abnormalities missed by the radiologist

Generative AI language models like the Northwestern algorithm could perform better than algorithms that rely on a classification approach to predicting the presence of pathology. Classification models limit medical diagnoses to yes/no predictions that may omit context relevant to clinical care, the researchers believe. 

In real-world clinical use, the Northwestern team thinks their model could assist emergency physicians in circumstances where in-house radiologists or teleradiologists aren’t immediately available, helping triage emergent cases.

The Takeaway

After the negative headlines of the last few weeks, it’s good to see positive news about AI again. Although the current study is relatively small and much larger trials are needed, the Northwestern research has promising implications for the future of transformer-based AI language models in radiology.

CT Lung Screening Saves Women

October may be Breast Cancer Awareness Month, but a new study has great news for women when it comes to another life-threatening disease: lung cancer. 

Italian researchers in Lung Cancer found that CT lung cancer screening delivered survival benefits that were particularly dramatic for women – and could address cardiovascular disease as well. 

  • They found that, after 12 years of follow-up, women who got CT lung screening had much higher survival rates and lower all-cause mortality than men. 

Of all the cancer screening tests, lung screening is the new kid on the block.

  • Although randomized clinical trials have shown it to deliver lung cancer mortality benefits of 20% and higher, uptake of lung screening has been relatively slow compared to other tests.

In the current study, researchers from the Fondazione IRCCS Istituto Nazionale dei Tumori in Milan analyzed data from 6.5k heavy smokers in the MILD and BioMILD trials who got low-dose CT screening from 2005 to 2016. 

In addition to cancer incidence and mortality, they used Coreline Soft’s AVIEW software to calculate coronary artery calcium (CAC) scores from the screening exams to see if the scores predicted lung cancer mortality. Researchers found that after 12 years of follow-up …

  • There was no statistically significant difference in lung cancer incidence between women and men (4.4% vs. 4.7%)
  • But women had lower lung cancer mortality than men (1% vs. 1.9%) as well as lower all-cause mortality (4.1% vs. 7.7%), both statistically significant
  • Women had higher lung cancer survival than men (72% vs. 52%)
  • 15% of participants had CAC scores between 101 and 400, and all-cause mortality increased with higher scores
  • Women had lower CAC scores, which could play a role in lower all-cause mortality due to less cardiovascular disease
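
For context on the CAC figures above, Agatston scores are conventionally grouped into risk bands, and the 101-400 band cited in the study is generally treated as moderate risk. The sketch below uses the commonly cited 0 / 1-100 / 101-400 / >400 bands as an assumption; the study itself only reports the 101-400 bracket.

```python
def cac_risk_band(agatston_score: float) -> str:
    """Bucket an Agatston CAC score into commonly used risk bands (illustrative cutoffs)."""
    if agatston_score == 0:
        return "none (0)"
    elif agatston_score <= 100:
        return "mild (1-100)"
    elif agatston_score <= 400:
        return "moderate (101-400)"
    return "severe (>400)"

for score in (0, 45, 250, 600):
    print(score, "->", cac_risk_band(score))
```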

The Takeaway

This is a fascinating study on several levels. First, it shows that lung cancer screening produces a statistically significant decline in all-cause mortality for women compared to men.

Second, it shows that CT lung cancer screening can also serve as a screening test for cardiovascular disease, helping direct those with high CAC scores to treatment such as statin therapy. This type of opportunistic screening could change the cost-benefit dynamic when it comes to analyzing lung screening’s value – especially for women.

More Work Ahead for Chest X-Ray AI?

In another blow to radiology AI, the UK’s national technology assessment agency issued an equivocal report on AI for chest X-ray, stating that more research is needed before the technology can enter routine clinical use.

The report came from the National Institute for Health and Care Excellence (NICE), which assesses new health technologies that have the potential to address unmet NHS needs. 

The NHS sees AI as a potential solution to its challenge of meeting rising demand for imaging services, a dynamic that’s leading to long wait times for exams. 

But at least some corners of the UK health establishment have concerns about whether AI for chest X-ray is ready for prime time. 

  • The NICE report states that – despite the unmet need for quicker chest X-ray reporting – there is insufficient evidence to support the technology, and as such it’s not possible to assess its clinical and cost benefits. And it said there is “no evidence” on the accuracy of AI-assisted clinician review compared to clinicians working alone.

The report concluded that the use of AI for chest X-ray in the NHS should be limited to research, with the following additional recommendations …

  • Centers already using AI software to review chest X-rays may continue to do so, but only as part of an evaluation framework and alongside clinician review
  • Purchase of chest X-ray AI software should be made through corporate, research, or non-core NHS funding
  • More research is needed on AI’s impact on a number of outcomes, such as CT referrals, healthcare costs and resource use, review and reporting time, and diagnostic accuracy when used alongside clinician review

The NICE report listed 14 commercially available chest X-ray algorithms that need more research, and it recommended prospective studies to address gaps in evidence. AI developers will be responsible for performing these studies.

The Takeaway

Taken with last week’s disappointing news on AI for radiology, the NICE report is a wakeup call for what had been one of the most promising clinical use cases for AI. The NHS had been seen as a leader in spearheading clinical adoption of AI; for chest X-ray, clinicians in the UK may have to wait just a bit longer.

AI Hits Speed Bumps

There’s no question AI is the future of radiology. But AI’s drive to widespread clinical use is going to hit some speed bumps along the way.

This week is a case in point. Two studies were published showing AI’s limitations and underscoring the challenges faced in making AI an everyday clinical reality. 

In the first study, researchers found that radiologists outperformed four commercially available AI algorithms for analyzing chest X-rays (Annalise.ai, Milvue, Oxipit, and Siemens Healthineers) in a study of 2k patients in Radiology.

Researchers from Denmark found the AI tools had moderate to high sensitivity for three detection tasks: 

  1. airspace disease (72%-91%)
  2. pneumothorax (63%-90%)
  3. pleural effusion (62%-95%). 

But the algorithms also had higher false-positive rates, and performance dropped in cases with smaller pathology and multiple findings. The findings are disappointing, especially since they got such widespread play in the mainstream media.

But this week’s second study also brought worrisome news, this time in Radiology: Artificial Intelligence, about an AI training method called foundation models that many hope holds the key to better algorithms. 

Foundation models are designed to address the challenge of finding enough high-quality data for AI training. Most algorithms are trained with actual de-identified clinical data that have been labeled and referenced to ground truth; foundation models are AI neural networks pre-trained with broad, unlabeled data and then fine-tuned with smaller volumes of more detailed data to perform specific tasks.
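
The pretrain-then-fine-tune recipe the paper is probing can be sketched in a few lines of PyTorch: start from a backbone pretrained on broad data, then attach and train a small task-specific head on a modest labeled set. This is a generic illustration of the foundation-model workflow, not the specific architecture or data used in the Radiology: Artificial Intelligence study.

```python
# Generic pretrain-then-fine-tune sketch (not the study's actual model or data).
import torch
import torch.nn as nn
from torchvision import models

# 1) Start from a backbone pretrained on broad, non-task-specific data
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# 2) Freeze the pretrained weights so only the new head is fine-tuned
for param in backbone.parameters():
    param.requires_grad = False

# 3) Replace the classification head for the downstream task
#    (e.g., 4 chest X-ray labels: no finding, pleural effusion, cardiomegaly, pneumothorax)
num_labels = 4
backbone.fc = nn.Linear(backbone.fc.in_features, num_labels)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()  # multi-label findings

# 4) Fine-tune on a (hypothetical) small labeled dataset
def finetune_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    logits = backbone(images)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with random tensors standing in for one labeled batch
loss = finetune_step(torch.randn(8, 3, 224, 224),
                     torch.randint(0, 2, (8, num_labels)).float())
print(f"fine-tuning loss: {loss:.3f}")
```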

Researchers in the new study found that a chest X-ray algorithm trained on a foundation model with 800k images had lower performance than an algorithm trained on the CheXpert reference dataset in a group of 42.9k patients. The foundation model’s performance lagged for four possible results – no finding, pleural effusion, cardiomegaly, and pneumothorax – as follows…

  • Lower by 6.8-7.7% in females for the “no finding” result
  • Down by 10.7-11.6% in Black patients in detecting pleural effusion
  • Lower performance across all groups for classifying cardiomegaly

The decline in performance for female and Black patients is particularly concerning given recent studies on bias and lack of generalizability in AI. 

The Takeaway

This week’s studies show that there’s not always going to be a clear road ahead for AI in its drive to routine clinical use. The study on foundation models in particular could have ramifications for AI developers looking for a shortcut to faster algorithm development. They may want to slow their roll. 

Predicting AI Performance

How can you predict whether an AI algorithm will fall short for a particular clinical use case such as detecting cancer? Researchers in Radiology took a crack at this conundrum by developing what they call an “uncertainty quantification” metric to predict when an AI algorithm might be less accurate. 

AI is rapidly moving into wider clinical use, with a number of exciting studies published in just the last few months showing how AI can help radiologists interpret screening mammograms or direct which women should get supplemental breast MRI.

But AI isn’t infallible. And unlike a human radiologist who might be less confident in a particular diagnosis, an AI algorithm doesn’t have a built-in hedging mechanism.

So researchers from Denmark and the Netherlands decided to build one. They took publicly available AI algorithms and tweaked their code so they produced “uncertainty quantification” scores with their predictions. 

They then tested how well the scores predicted AI performance in a dataset of 13k images for three common tasks covering some of the deadliest types of cancer:

1) detecting pancreatic ductal adenocarcinoma on CT
2) detecting clinically significant prostate cancer on MRI
3) predicting pulmonary nodule malignancy on low-dose CT 

Researchers classified the 80% of AI predictions with the highest certainty as “certain” and the remaining 20% as “uncertain,” then compared AI’s accuracy in the two groups, finding … 

  • AI led to significant accuracy improvements in the “certain” group for pancreatic cancer (80% vs. 59%), prostate cancer (90% vs. 63%), and pulmonary nodule malignancy prediction (80% vs. 51%)
  • AI accuracy was comparable to clinicians when its predictions were “certain” (80% vs. 78%, P=0.07), but much worse when “uncertain” (50% vs. 68%, P<0.001)
  • Using AI to triage “uncertain” cases produced overall accuracy improvements for pancreatic and prostate cancer (+5%) and lung nodule malignancy prediction (+6%) compared to a no-triage scenario

How would uncertainty quantification be used in clinical practice? It could play a triage role, deprioritizing radiologist review of easier cases while helping them focus on more challenging studies. It’s a concept similar to the MASAI study of mammography AI.
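
A minimal sketch of how such a triage rule could work, assuming the algorithm already emits an uncertainty score per case: rank cases by uncertainty, call the lowest-uncertainty 80% “certain,” and route the remaining 20% to a clinician. The 80/20 split and the 68% clinician accuracy figure follow the study as summarized above; the data and scores below are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated per-case data: 1,000 cases with an AI prediction, ground truth, and uncertainty score
n_cases = 1000
ai_correct = rng.random(n_cases) < 0.75      # whether the AI prediction matched ground truth
uncertainty = rng.random(n_cases)            # higher = less confident
uncertainty[~ai_correct] += 0.3              # toy assumption: errors tend to be more uncertain

# Threshold at the 80th percentile of uncertainty: below it = "certain", above = "uncertain"
threshold = np.quantile(uncertainty, 0.80)
certain = uncertainty <= threshold

print(f"accuracy on 'certain' cases:   {ai_correct[certain].mean():.0%}")
print(f"accuracy on 'uncertain' cases: {ai_correct[~certain].mean():.0%}")

# Triage: accept AI output on certain cases, route uncertain cases to a clinician
# (the clinician is assumed to be 68% accurate on these cases, per the study's reported figure)
clinician_correct = rng.random((~certain).sum()) < 0.68
triaged_accuracy = (ai_correct[certain].sum() + clinician_correct.sum()) / n_cases
print(f"overall accuracy with triage:  {triaged_accuracy:.0%}")
```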

The Takeaway

Like MASAI, the new findings present exciting new possibilities for AI implementation. They also present a framework within which AI can be implemented more safely by alerting clinicians to cases in which AI’s analysis might fall short – and enabling humans to step in and pick up the slack.  

Are Doctors Overpaid?

A new study on physician salaries is raising pointed questions about pay for US physicians and whether it contributes to rising healthcare costs – that is, if you believe the numbers are accurate. 

The study was released in July by the National Bureau of Economic Research (NBER), which produces in-depth reports on a variety of topics. 

The current paper is highly technical and might have languished in obscurity were it not for an August 4 article in The Washington Post that examined the findings with the claim that “doctors make more than anyone thought.”

It is indeed true that the NBER’s estimate of physician salaries seems high. The study claims US physicians made an average of $350k in 2017, the year that the researchers focused on by analyzing federal tax records. 

  • The NBER estimate is far higher than $294k in Medscape’s 2017 report on physician compensation – a 19% difference. 

The variation is even greater for diagnostic radiologists. The NBER data claim radiologists had a median annual salary in 2017 of $546k – 38% higher than the $396k average salary listed in Medscape’s 2017 report. 

  • The NBER numbers from six years ago are even higher than 2022/2023 numbers for radiologist salaries in several recent reports, by Medscape ($483k), Doximity ($504k), and Radiology Business ($482k). 

But the NBER researchers claim that by analyzing tax data rather than relying on self-reported earnings, their data are more accurate than previous studies, which they believe underestimate physician salaries by as much as 25%. 

  • They also estimate that physician salaries make up about 9% of total US healthcare costs.

What difference does it make how much physicians earn? The WaPo story sparked a debate with 6.1k comments so far, with many readers accusing doctors of contributing to runaway healthcare costs in the US.

  • Meanwhile, a thread in the AuntMinnie forums argued whether the NBER numbers were accurate, with some posters warning that the figures could lead to additional cuts in Medicare payments for radiologists. 

The Takeaway

Lost in the debate over the NBER report is its finding that physician pay makes up only 9% of US healthcare costs. In a medical system that’s rife with overutilization, administrative costs, and duplicated effort across fragmented healthcare networks, physician salaries should be the last target for those who actually want to cut healthcare spending. 

Grading AI Report Quality

One of the most exciting new use cases for medical AI is in generating radiology reports. But how can you tell whether the quality of a report generated by an AI algorithm is comparable to that of a radiologist?

In a new study in Patterns, researchers propose a technical framework for automatically grading the output of AI-generated radiology reports, with the ultimate goal of producing AI-generated reports that are indistinguishable from those of radiologists. 

Most radiology AI applications so far have focused on developing algorithms to identify individual pathologies on imaging exams. 

  • While this is useful, helping radiologists streamline the production of their main output – the radiology report – could have a far greater impact on their productivity and efficiency. 

But existing tools for measuring the quality of AI-generated narrative reports are limited and don’t match up well with radiologists’ evaluations. 

  • To improve that situation, the researchers applied several existing automated metrics for analyzing report quality and compared them to the scores of radiologists, seeking to better understand AI’s weaknesses. 

Not surprisingly, the automated metrics fell short in several ways, including false prediction of findings, omitting findings, and incorrectly locating and predicting the severity of findings. 

  • These shortcomings point out the need for better scoring systems for gauging AI performance. 

The researchers therefore proposed a new metric for grading AI-generated report quality, called RadGraph F1, and a new methodology, RadCliQ, to predict how well an AI report would measure up to radiologist scrutiny. 

  • RadGraph F1 and RadCliQ could be used in future research on AI-generated radiology reports, and to that end the researchers have made the code for both metrics available as open source.
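
As a rough intuition for what an entity-overlap style metric measures, the toy sketch below scores a candidate report against a reference by comparing sets of extracted findings. This is a simplification for illustration only; the real RadGraph F1 works on clinical entities and relations extracted by a trained model, not on simple keyword sets.

```python
def finding_f1(reference_findings: set[str], candidate_findings: set[str]) -> float:
    """Toy overlap F1 between two sets of extracted findings (not the actual RadGraph F1)."""
    if not reference_findings and not candidate_findings:
        return 1.0
    true_positives = len(reference_findings & candidate_findings)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(candidate_findings)
    recall = true_positives / len(reference_findings)
    return 2 * precision * recall / (precision + recall)

reference = {"cardiomegaly", "pleural effusion"}   # findings in the radiologist's report
candidate = {"cardiomegaly", "pneumothorax"}       # findings in the AI-generated report
print(f"toy finding-overlap F1: {finding_f1(reference, candidate):.2f}")  # 0.50
```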

Ultimately, the researchers see the construction of generalist medical AI models that could perform multiple complex tasks, such as conversing with radiologists and physicians about medical images. 

  • Another use case could be applications that are able to explain imaging findings to patients in everyday language. 

The Takeaway

It’s a complex and detailed paper, but the new study is important because it outlines the metrics that can be used to teach machines how to generate better radiology reports. Given the imperative to improve radiologist productivity in the face of rising imaging volume and workforce shortages, this could be one more step on the quest for the Holy Grail of AI in radiology.

How COVID Crashed CT Scanners in China

In the early days of the COVID-19 pandemic in China, hospitals were performing so many lung scans of infected patients that CT scanners were crashing. That’s according to an article based on an interview with a Wuhan radiologist that provides a chilling first-hand account of radiology’s role in what’s become the biggest public health crisis of the 21st century.

The interview was originally published in 2022 by the Chinese-language investigative website Caixin and was translated and published this month by U.S. Right to Know, a public health advocacy organization. 

In a sign of the information’s sensitivity, the original publication on Caixin’s website has been deleted, but U.S. Right to Know obtained the document from the US State Department under the Freedom of Information Act. 

Radiologists at a Wuhan hospital noticed how COVID cases began doubling every 3-4 days in early January 2020, the article states, with many patients showing signs of ground-glass opacities on CT lung scans – a telltale sign of COVID infection. But Chinese authorities suppressed news about the rapid spread of the virus, and by January 11 the official estimate was that there were only 41 COVID cases in the entire country.

In reality, COVID cases were growing rapidly. CT machines began crashing in the fourth week of January due to overheating, said the radiologist, who estimated the number of cases in Wuhan at 10,000 by January 21. Hospitals were forced to turn infected patients away, and many people were so sick they were unable to climb onto X-ray tables for exams. Other details included: 

  • Chinese regulatory authorities denied that human-to-human transmission of the SARS-CoV-2 virus was occurring even as healthcare workers began falling ill
  • Many workers at Chinese hospitals were discouraged from wearing masks in the pandemic’s early days to maintain the charade that human-to-human transmission was not possible – and many ended up contracting the virus
  • Radiologists and other physicians lived in fear of retaliation if they spoke up about the virus’ rapid spread

The Takeaway

The article provides a stunning behind-the-scenes look at the early days of a pandemic that would go on to reshape the world in 2020. What’s more, it demonstrates the vital role of radiology as a front-line service that’s key to the early identification and treatment of disease – even in the face of bureaucratic barriers to delivering quality care.
