AI-CARES: Artificial Intelligence Coaching of Cardiac Arrest Resuscitation

Wednesday, May 20, 2026 3:15 PM to 4:50 PM · 1 hr. 35 min. (America/New_York)
L504 - L505: Level L
Innovations-SAEM
Informatics/Data Science/AI


Intro/Background
Despite advances in resuscitation science, cardiac arrest (CA) survival remains low, driven in part by inconsistent team leadership and skill decay in the absence of frequent training. Simulation-based education improves outcomes but is limited by access to resources and expert facilitators. Artificial intelligence (AI) offers a scalable solution by providing standardized, expert-level assessment and feedback from simulation recordings. This study evaluates AI’s ability to function as a simulation expert to enhance CA training and support improved resuscitation performance.
Purpose/Objective
This study aims to evaluate whether Gemini 2.5 Pro AI can accurately assess technical and non-technical performance in cardiac arrest simulation videos and provide actionable feedback comparable to human simulation experts. We compare AI and expert evaluations of task completion, timing of interventions, leadership, and communication using validated metrics. Learner perceptions of AI-generated versus human feedback are also assessed to determine acceptability, bias reduction, and potential for scalable resuscitation training.
Methods
We designed ten cardiac arrest simulation cases modeled after AHA PALS courses. The cases are run in situ in the emergency department with multidisciplinary teams of attendings, residents, nurses, and technicians. Simulations are video recorded and reviewed by fellowship-trained, blinded simulation experts. Standardized technical and leadership metrics are assessed using PALS benchmarks and the validated CALM tool. AI-generated evaluations and feedback are grounded in current guidelines and compared directly with expert human assessments.
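For illustration, the guideline-grounding and structured-output approach described above could be implemented along the following lines. This is a minimal sketch assuming the google-genai Python SDK and access to Gemini 2.5 Pro; the file names, prompt wording, and schema fields are hypothetical placeholders, not the study’s actual pipeline.

```python
from pydantic import BaseModel
from google import genai
from google.genai import types


class TechnicalEvent(BaseModel):
    event: str          # e.g., "first chest compression", "epinephrine given"
    timestamp_s: float  # seconds from the start of the recording


class CalmScore(BaseModel):
    domain: str         # CALM leadership/teamwork domain
    score: int


class SimulationAssessment(BaseModel):
    technical_events: list[TechnicalEvent]
    calm_scores: list[CalmScore]
    narrative_feedback: str


client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# "Grounding" via in-context learning: the reference documents are embedded
# directly in the prompt alongside the simulation video (file names hypothetical).
pals_guidelines = open("pals_guidelines.txt").read()
calm_rubric = open("calm_rubric.txt").read()

# Upload the recording; in practice the file may need a moment to finish
# processing before it can be referenced in a request.
video = client.files.upload(file="simulation_case_01.mp4")

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        "You are a simulation expert reviewing a pediatric cardiac arrest simulation.",
        "Evaluate the recording strictly against the following references:",
        pals_guidelines,
        calm_rubric,
        video,
    ],
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=SimulationAssessment,
    ),
)

assessment: SimulationAssessment = response.parsed
for ev in assessment.technical_events:
    print(f"{ev.event}: {ev.timestamp_s:.1f} s")
```

The design choice illustrated here is that the reference documents travel inside the prompt itself (in-context learning) rather than through fine-tuning, and the response schema forces the model to return machine-readable timestamps and scores that can be compared directly against expert annotations.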
Outcomes
Preliminary testing supports feasibility and accuracy. Gemini 2.5 Pro AI analyzed cardiac arrest simulation recordings and demonstrated high concordance with human reviewers. The AI identified key technical event timestamps with a mean difference of 1.3 seconds from human reviewers and achieved 100% agreement for categorical measures, including compressions, airway management, and defibrillator use. Non-technical performance scoring using the CALM tool showed moderate agreement.
Summary
Cardiac arrest (CA) remains a major public health challenge, with persistently low survival rates despite advances in resuscitation science. High-quality team performance, effective leadership, and frequent reinforced training are critical determinants of outcomes. Simulation-based education is an evidence-based strategy for improving technical and non-technical resuscitation skills; however, widespread implementation is limited by access to resources and shortages of trained simulation faculty. As a result, most clinicians rely primarily on periodic American Heart Association (AHA) certification courses, which are associated with skill decay and inconsistent educational benefit.

Recent advances in artificial intelligence (AI), particularly large language models (LLMs) capable of multimodal video and audio analysis, present a promising solution to these limitations. LLMs can rapidly process simulation recordings, extract time-stamped performance metrics, assess teamwork and communication, and provide standardized, unbiased feedback at scale. This study evaluates whether Gemini 2.5 Pro AI can function as a simulation expert for CA education by accurately assessing both technical and non-technical performance and delivering actionable feedback comparable to human experts. We hypothesize that Gemini 2.5 Pro AI can analyze CA simulation recordings with accuracy equivalent to fellowship-trained simulation experts while reducing bias and improving scalability. The primary goal is to determine concordance between AI-generated and human expert assessments of technical skills, including time to critical interventions such as airway placement, epinephrine administration, and vascular access. Secondary objectives include comparing assessments of leadership and teamwork using the validated Concise Assessment of Leader Management (CALM) tool and evaluating learner perceptions of AI-generated versus human-provided feedback.

The study involves ten in-situ cardiac arrest simulation cases modeled after AHA PALS cases, performed by emergency medicine teams of attendings, residents, and nursing and technical staff. Simulations are video recorded and independently reviewed by both Gemini 2.5 Pro AI and blinded human simulation experts. AI analysis is “grounded” using in-context learning: the complete AHA PALS guidelines and the CALM scoring rubric are embedded directly within the AI prompt, constraining the model’s evaluations to current gold-standard references. AI outputs include structured timestamps for technical events and quantitative scores for non-technical domains. Agreement between AI and human expert ratings of non-technical skills (CALM) and of time to critical tasks during pediatric CA management is assessed using intraclass correlation coefficients (ICCs), with Bland–Altman plots used for visualization. Learner perceptions of feedback are assessed via post-simulation surveys.

Preliminary testing supports feasibility and accuracy. In pilot analyses of two recorded simulations, Gemini 2.5 Pro AI identified key technical event timestamps with a mean difference of 1.3 seconds compared with human reviewers and achieved 100% concordance for categorical measures such as airway intervention, compressions, intravenous access, and defibrillator use. Non-technical CALM scores demonstrated moderate agreement with human experts, supporting further investigation.
This study has the potential to demonstrate that AI can reliably and safely evaluate CA simulation performance, addressing major barriers to simulation-based education. If successful, AI-driven assessment could enable frequent, standardized, and scalable resuscitation training across diverse healthcare settings, including resource-limited and rural institutions, ultimately improving resuscitation quality and patient outcomes.
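To make the planned agreement analysis concrete, the sketch below shows how an intraclass correlation coefficient and a Bland–Altman plot could be computed for one time-to-task measure, assuming paired AI and expert timestamps in a long-format table. The example data, column names, and use of the pingouin and matplotlib libraries are illustrative assumptions, not the study’s actual analysis code.

```python
import pandas as pd
import pingouin as pg
import matplotlib.pyplot as plt

# Long-format table: one row per (simulation case, rater) pair for a single
# task, e.g., time to first epinephrine dose in seconds. Values are illustrative.
df = pd.DataFrame({
    "case":  [1, 1, 2, 2, 3, 3, 4, 4],
    "rater": ["expert", "ai"] * 4,
    "time_to_epi_s": [142.0, 143.5, 98.0, 97.0, 210.0, 212.0, 130.0, 128.5],
})

# Intraclass correlation coefficients (pingouin reports single- and average-measure forms).
icc = pg.intraclass_corr(data=df, targets="case", raters="rater",
                         ratings="time_to_epi_s")
print(icc[["Type", "ICC", "CI95%"]])

# Bland-Altman plot: difference vs. mean of the paired AI and expert times.
wide = df.pivot(index="case", columns="rater", values="time_to_epi_s")
mean_vals = wide.mean(axis=1)
diff_vals = wide["ai"] - wide["expert"]
bias = diff_vals.mean()
loa = 1.96 * diff_vals.std()

plt.scatter(mean_vals, diff_vals)
plt.axhline(bias, linestyle="--", label="mean difference")
plt.axhline(bias + loa, linestyle=":", label="limits of agreement (±1.96 SD)")
plt.axhline(bias - loa, linestyle=":")
plt.xlabel("Mean of AI and expert time (s)")
plt.ylabel("AI minus expert (s)")
plt.legend()
plt.show()
```

In this framing, an ICC approaching 1 with a small mean difference and narrow Bland–Altman limits of agreement would support the hypothesis that AI-derived timings are interchangeable with expert review.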
CME
1.5

Disclosures

Access the following link to view disclosures of session presenters, presenting authors, organizers, moderators, and planners:
