Real-World Accuracy and Clinician Usage of Artificial Intelligence–Flagged Actionable Incidental Findings From Emergency Department Radiology Reports

Tuesday, May 19, 2026 4:36 PM to 4:48 PM · 12 min. (America/New_York)

International Hall 7: Level I

Abstracts

Informatics/Data Science/AI

Background and Objectives

Actionable incidental findings (AIFs) are common in emergency department (ED) imaging but are inconsistently disclosed and documented, contributing to missed follow-up. Prior research suggests large language models can accurately identify AIFs from ED radiology reports. This study evaluated clinician-verified accuracy, uptake, and disclosure documentation associated with GPT-4o–flagged AIFs from ED radiology reports during real-world routine ED care.

Methods

From July 10 to September 28, 2025, GPT-4o reviewed radiology report text at five EDs to identify AIFs and recommended follow-up. Outputs were then presented in the electronic health record (EHR) to ED clinicians at inpatient handoff or ED discharge for optional review and adjudication. Clinicians could mark outputs as accurate or discordant and attest to disclosure documentation. The primary outcome was the clinician accuracy response (accurate vs discordant) among reviewed prompts; secondary outcomes were interaction rate and disclosure documentation. We retrospectively reviewed all flagged cases and report descriptive statistics (counts and proportions).

Results

GPT-4o flagged ≥1 AIF in 4,972 ED visits (~61/day across all sites). In 62.5% (3,108) of visits, the clinical team did not interact with the AIF outputs. Of the 37.5% (1,864) of visits with clinician review, 99.25% (1,850) were marked as accurate and 0.75% (14) were marked as discordant. Manual review of the discordant cases found that 11/14 (78.6%) were technically accurate. Among clinician-verified accurate outputs, 89.5% (1,655/1,850) resulted in real-time AIF disclosure documentation; clinicians copied the AI outputs verbatim in 84.1% of cases (1,392/1,655) and used free-text editing in 15.9% of cases (263/1,655).
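The proportions above can be recomputed from the reported counts. The following is a minimal sketch, not the authors' analysis code: the variable names and the `pct` helper are hypothetical, the not-reviewed count is derived as flagged visits minus reviewed visits rather than taken from the text, and the daily rate assumes both the start and end dates are counted.

```python
from datetime import date


def pct(numerator, denominator, digits=1):
    """Return numerator/denominator as a percentage rounded to `digits` places."""
    return round(100 * numerator / denominator, digits)


# Counts as reported in the Results (variable names are illustrative).
flagged_visits = 4972   # visits with >=1 flagged AIF
reviewed = 1864         # visits with clinician review
accurate = 1850         # reviewed outputs marked accurate
discordant = 14         # reviewed outputs marked discordant
discordant_ok = 11      # discordant outputs judged technically accurate on manual review
documented = 1655       # accurate outputs with disclosure documentation
verbatim = 1392         # documentation copied verbatim from the AI output
edited = 263            # documentation written with free-text editing

# Derived values (assumptions: remainder = not reviewed; inclusive date range).
not_reviewed = flagged_visits - reviewed
study_days = (date(2025, 9, 28) - date(2025, 7, 10)).days + 1

summary = {
    "visits per day": round(flagged_visits / study_days),
    "not reviewed %": pct(not_reviewed, flagged_visits),
    "reviewed %": pct(reviewed, flagged_visits),
    "accurate % of reviewed": pct(accurate, reviewed, 2),
    "discordant % of reviewed": pct(discordant, reviewed, 2),
    "technically accurate % of discordant": pct(discordant_ok, discordant),
    "documented % of accurate": pct(documented, accurate),
    "verbatim % of documented": pct(verbatim, documented),
    "edited % of documented": pct(edited, documented),
}

for label, value in summary.items():
    print(f"{label}: {value}")
```

Each denominator is the subgroup named in the corresponding sentence (for example, documentation is a share of clinician-verified accurate outputs, not of all flagged visits), which is the main thing to watch when comparing these percentages.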

Conclusion

In real-world ED practice with outputs embedded in the EHR, nearly all reviewed GPT-4o–flagged AIFs (99.3%) were judged accurate by clinicians, and accurate outputs frequently led to contemporaneous disclosure documentation (89.5%). These findings support the feasibility of EHR-integrated AI to improve disclosure documentation of AIFs, a known gap in ED transitions of care. However, most prompts (62.5%) were not reviewed, suggesting that uptake, not clinician-verified accuracy upon review, is a primary barrier to realizing benefit at scale. Future work will test implementation strategies to increase engagement and evaluate downstream outcomes such as follow-up completion.

CPE

CME

1.25