Generative Artificial Intelligence Using Ambient Artificial Intelligence Technology to Deliver Feedback

Wednesday, May 20, 2026 3:15 PM to 4:50 PM · 1 hr. 35 min. (America/New_York)
L504 - L505: Level L
Innovations-SAEM
Informatics/Data Science/AI

Intro/Background
Generative artificial intelligence (GenAI) has become woven into healthcare, yet remains noticeably nascent in medical education. A shrinking academic physician workforce, driven by salary pressures, institutional expectations, and other factors, perpetuates an innovation drought and complacency in guiding and coaching trainees, whose numbers are only rising, placing a larger burden on existing academicians. Augmenting current labor-intensive feedback delivery with GenAI models is hitherto unexplored territory. This exploratory phase of a pilot study attempts to evaluate its feasibility.
Purpose/Objective
To evaluate the feasibility of developing a GenAI feedback model using a standard framework, which includes ease of protocol implementation, process functionality, and data evaluation comprising mixed-methods outcome measurements that assess feedback quality and quantity along with conceptual/psychometric adequacy. We also highlight limitations and pertinent points related to the responsible use of AI in healthcare. The model utilizes AI-augmented ambient audio transcripts recorded during the clinical shifts of a faculty pediatric emergency medicine (PEM) physician.
Methods
This observational feasibility pilot used AI-augmented, HIPAA-compliant ambient audio to generate transcripts of clinical shifts. Two GenAI feedback models were iteratively created using context and prompt engineering (CPE) best practices. Feedback summaries of four clinical shifts, nine hours each, were then generated. Mixed-methods (qualitative and quantitative) analysis was performed using MAXQDA and IBM SPSS, respectively. Pre-defined ACGME core competencies for PEM/clinical educators were leveraged to facilitate pattern recognition among segments within the summaries, which were then analyzed.
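As a rough illustration of the CPE step described above, the sketch below assembles a structured prompt that asks a GenAI model to map a de-identified shift transcript onto competency themes. This is a minimal, hypothetical sketch: the function name, the wording, and the competency labels shown are illustrative placeholders, not the study's actual prompts or its full ACGME competency set.

```python
# Hypothetical sketch of the context-and-prompt-engineering (CPE) step:
# pairing fixed context (role, task, themes) with a de-identified
# transcript to form a single feedback-generation prompt.
# Theme labels here are illustrative, not the study's exact list.

ACGME_THEMES = [
    "Patient Care",
    "Medical Knowledge",
    "Practice-Based Learning and Improvement",
    "Interpersonal and Communication Skills",
    "Professionalism",
    "Systems-Based Practice",
]

def build_feedback_prompt(transcript: str, themes=ACGME_THEMES) -> str:
    """Return one prompt string combining context, task, and transcript."""
    theme_list = "\n".join(f"- {t}" for t in themes)
    return (
        "You are assisting a clinical educator. Using only the transcript "
        "below, summarize the physician's performance under each theme, "
        "then list strengths and areas of improvement with actionable "
        "next steps.\n\n"
        f"Themes:\n{theme_list}\n\n"
        f"Transcript:\n{transcript}"
    )

prompt = build_feedback_prompt("[de-identified shift transcript]")
```

Iterating on such a template (e.g., a broader Version A versus a more constrained Version B) is one plausible way the two models described here could diverge while sharing the same underlying language model.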
Outcomes
With straightforward implementation and practical processes, data were collected over four nine-hour shifts, each yielding its own discrete transcript. Thematic analysis using nine core-competency "themes" plus "strengths" and "areas of improvement" revealed similarities in the physician's positive attributes, differences in the specific examples provided, and some differences in the areas of improvement and suggested next steps/actionable modifications. The models appeared to take different pedagogical approaches. Reliability analyses yielded intraclass correlation coefficients of 0.69-0.77, indicating good agreement.
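For readers unfamiliar with the reliability statistic reported above, the sketch below computes a two-way random-effects, single-measure intraclass correlation, ICC(2,1), from a subjects-by-raters score matrix. This is an assumption-laden illustration: the abstract does not state which ICC variant was used in SPSS, and the toy data are invented.

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an n-subjects x k-raters matrix of scores.
    Computed from the standard ANOVA mean squares (rows = subjects,
    columns = raters).
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)

    ss_rows = k * ((row_means - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between raters
    ss_total = ((ratings - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols            # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical example: two models scoring five competency segments.
scores = [[9, 8], [6, 5], [8, 8], [7, 6], [10, 9]]
icc = icc2_1(scores)
```

By common conventions, values in the 0.69-0.77 range reported above fall at the upper end of "moderate" to "good" reliability, though the exact interpretation depends on the chosen benchmark.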
Summary
We present this exploratory phase of the pilot study because of the lack of prior data on this particular use case of GenAI. The feasibility data revealed: 1. Protocol implementation into the workflow was straightforward, and we met no resistance from patients/families. 2. An automated workflow ran over four nine-hour shifts to create audio recordings that then generated written transcripts without identifiers; two models (Versions A and B) were iteratively created using CPE basics and best practices. 3. Mixed-methods analysis covered relevant qualitative and quantitative measures of the feedback delivered (ACGME competencies) and a critical appraisal of the models (feedback specificity, tone, actionable behaviors, etc.).

The findings demonstrate the models' ability to consistently identify performance patterns across nine distinct competencies, plus both strengths and areas of improvement. Notably, the models were not specifically taught the core concepts of medical education and teaching; instead, they utilized an agentic version of a large language GenAI model, with CPE shaping the responses. The prompt for Version A was more general than that for Version B. Both delivered a large volume of feedback, some of which allowed for clinical introspection on the part of the physician. While granular details differed, the feedback delivered was objective, consistent, detailed, specific, clear, and relatively timely.

There are several limitations: 1. Selection bias (single-physician pilot): this was intentional, to understand the model's actual utility and impact (part of the full pilot data) prior to any learner-facing trials. 2. The feedback is based not on "observed" behaviors but on "heard" or "transcribed" behaviors lacking tonal context. Both corrective and reinforcing feedback were delivered (the former required more iterative enhancement of the CPE in both models). 3. Overall quality needs to be ascertained at a larger scale, along with scalability, generalizability, and several other metrics. The models lack the ability to provide nuanced, context-rich feedback; for this, strict human interpretation of AI-generated feedback by a trusted mentor/coach is essential. 4. The study additionally highlights the growing need for AI literacy and education across the board, along with understanding and strong critical appraisal of all GenAI models prior to real-life implementation, similar to medical devices and technologies. 5. Most importantly, human oversight is required to protect our trainees, who are increasingly utilizing AI in healthcare without appropriate education/training, both from the unknown effects of using AI without knowing how to do so responsibly and from the potential psychological effects of having an AI "feedback coach" without heavy scrutiny. This is particularly relevant given the declining number of academic physicians and the increasing number of trainees, a gap exacerbated by generational and technological divides. Trainees yearn for targeted, specific feedback, but with a shortage of human mentors/coaches to provide it, they will naturally seek it from AI. Our study emphasizes that AI should complement, not replace, feedback in medical education. While the promise of transforming professional development, precision medical education, quality improvement, physician satisfaction, and patient care may exist, we must strive to keep it from becoming a cautionary tale instead.
CME
1.5

Disclosures

Access the following link to view disclosures of session presenters, presenting authors, organizers, moderators, and planners:
