Evaluating the Safety and Impact of Large Language Model Spanish Translations of Emergency Department Discharge Instructions

Thursday, May 21, 2026 8:24 AM to 8:32 AM · 8 min. (America/New_York)

M101: Level M

Abstracts

Health Equity & Disparities

Information

Abstract Number

680

Background and Objectives

Large language models (LLMs) have the potential to help close the language equity gap in emergency departments (EDs) by facilitating translation of discharge instructions into patients' preferred languages. In this pilot, we evaluated the safety and impact of LLM-translated instructions among Spanish speakers.

Methods

We performed a prospective, two-phase, non-randomized cohort study of Spanish-preferring ED patients aged >18 years. In the control phase, patients received routine discharge instructions from the treating physician. In the intervention phase, patients received routine instructions plus real-time text translations generated by a protected health information (PHI)-compliant Claude 3.5 Sonnet model using a previously validated prompt. All translations were reviewed for clinically significant errors by a Spanish-speaking physician within 24 hours. The primary outcome was the frequency of clinically significant translation errors. The secondary outcome was the difference in patient comprehension between groups, evaluated within 8 days post-discharge across 7 domains (diagnosis, testing, ED treatment, prescriptions, medical advice, physician follow-up, and return precautions), each scored on a 4-point Likert scale. Between-group differences were assessed with the Mann-Whitney U test.
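As an illustration of the between-group comparison described above, the following is a minimal sketch of a two-sided Mann-Whitney U test on ordinal Likert scores. The abstract does not state which software or which variant of the test was used; the normal approximation with tie and continuity corrections shown here is an assumption, the function name `mann_whitney_u` is hypothetical, and the example scores are invented for illustration and are not study data.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (u_x, u_y, p_value). Uses a tie correction for the variance
    and a continuity correction, which suits coarse ordinal data such as
    4-point Likert scores. This is an illustrative sketch, not the
    procedure used in the study.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # U for x: number of (x, y) pairs where x is larger; ties count one half.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n1 * n2 - u_x

    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        # All observations identical: no evidence of a difference.
        return u_x, u_y, 1.0

    mean_u = n1 * n2 / 2.0
    # Continuity-corrected z statistic, clipped at zero.
    z = max(abs(u_x - mean_u) - 0.5, 0.0) / math.sqrt(variance)
    p_value = math.erfc(z / math.sqrt(2.0))  # two-sided tail probability
    return u_x, u_y, p_value


# Hypothetical Likert scores (1-4) for one domain; NOT data from the study.
control_scores = [1, 1, 2, 1, 2, 1, 3, 2]
intervention_scores = [2, 3, 2, 2, 3, 4, 2, 3, 1, 2]

u_control, u_intervention, p = mann_whitney_u(control_scores, intervention_scores)
print(f"U(control)={u_control}, U(intervention)={u_intervention}, p={p:.3f}")
```

A rank-based test of this kind fits the design because Likert scores are ordinal and heavily tied, so a comparison of means would be hard to justify. In practice an established routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be preferred over a hand-written version, particularly for small samples, where an exact p-value may differ from the normal approximation.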

Results

Sixty-five patients were enrolled, with a median age of 56 years; 40 (61.5%) were female. Forty-three patients were in the intervention group and received machine translations of their instructions. One translation (2.3%) was flagged as having a clinically significant error because of an ambiguous translation of the term “driving” (“conducción”). Follow-up interviews were completed for 13/22 (59.1%) control patients and 30/43 (69.8%) intervention patients. In both the control and intervention phases, median scores for every domain except return precautions were high, ranging from 3.5 to 4, without significant differences between groups. Return precaution comprehension scores were the lowest but were higher in the intervention group than in the control group (median 2 [IQR 2–3] vs 1 [1–2], p=0.009).

Conclusion

Our findings suggest that LLMs have a promising safety profile for English-to-Spanish translation of discharge instructions, but human verification is still required. Providing translated instructions did not increase comprehension in most discharge instruction domains; return precautions were the exception.

CME

0.75