We conducted a blinded, cross-sectional non-inferiority study comparing Google Translate and
ChatGPT-4o with professional medical interpreters for translating English-language emergency
department (ED) discharge instructions into Spanish, Brazilian Portuguese, and Simplified Chinese.
Fifty-three randomly selected
provider-written instructions (100–500 words, with original spelling and grammar errors preserved)
were translated into each language by each method, yielding 477 unique translations. Professional
medical interpreters, blinded to translation method,
independently scored each translation for fluency, adequacy, meaning, and severity on a five-point
Likert scale. Inter-rater reliability of the interpreters' ratings was calculated. A
0.5-point non-inferiority margin was pre-specified. Adjusted mean differences in Likert ratings,
estimated with mixed-effects models, were compared between translation methods for each accuracy
dimension and each language. The proportion of clinically significant translation errors was compared
between methods, as was the ability of evaluators to guess the translation method.
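
As a rough illustration of the design arithmetic and the non-inferiority criterion described above, the following minimal sketch reproduces the translation count and applies a 0.5-point margin. The function names, the direction of the difference (machine minus professional, with higher ratings better), the use of a lower confidence bound, and the example bounds are all assumptions made for illustration; the study's actual decision rule and confidence level are not stated here.

```python
# Illustrative sketch only; function names and example values are hypothetical.
LANGUAGES = ("Spanish", "Brazilian Portuguese", "Simplified Chinese")
METHODS = ("Google Translate", "ChatGPT-4o", "Professional interpreter")
MARGIN = 0.5  # pre-specified non-inferiority margin, in Likert points


def count_translations(n_instructions: int) -> int:
    """Each instruction is translated into every language by every method."""
    return n_instructions * len(LANGUAGES) * len(METHODS)


def is_non_inferior(ci_lower: float, margin: float = MARGIN) -> bool:
    """Assumed rule: the difference is (machine - professional), higher is better.

    Non-inferiority holds when the lower confidence bound of the adjusted
    mean difference stays above -margin.
    """
    return ci_lower > -margin


print(count_translations(53))   # 477
print(is_non_inferior(-0.30))   # True  (hypothetical lower bound)
print(is_non_inferior(-0.62))   # False (hypothetical lower bound)
```

Under these assumptions, a machine translation method would be judged non-inferior on a given dimension only if its adjusted rating could plausibly fall no more than half a Likert point below that of the professional translation.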