

Can Artificial Intelligence Mimic the Human Touch? Evaluating the Quality of Artificial Intelligence–Written Discharge Instructions Using a Deidentified Dataset
Thursday, May 21, 2026 12:24 PM to 12:32 PM · 8 min. (America/New_York)
International Hall 8: Level I
Abstracts
Informatics/Data Science/AI
Information
Abstract Number
358
Background and Objectives
The popularization of generative artificial intelligence (AI) has prompted the implementation of AI tools in clinical care. Discharge instructions (DIs) are an excellent application given the difficulty of writing personalized DIs. However, these tools need stress testing prior to deployment to ensure they are noninferior to handwritten DIs, with few or no omissions or hallucinations.
Methods
We randomly selected 100 inpatient charts from the MIMIC-IV database; three were excluded due to inpatient mortality. For each remaining chart, an AI DI was generated from the discharge summary using a single-shot prompt agreed upon by the study investigators, and the original handwritten DI was extracted, yielding 194 DIs. These were evaluated by 8 emergency department (ED) clinicians using a rubric based on best-practice guidelines. Review domains included the quality of lab and imaging explanations, discussion of incidental findings, presence of hallucinations or omissions, and whether reviewers believed the DI was AI- or provider-written. Quantitative data were analyzed with ANOVA, and DI identification accuracy was assessed with a chi-square test. Free-text error descriptions were independently reviewed by 2 investigators to identify themes.
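For illustration, a minimal sketch of the identification-accuracy chi-square test in Python (scipy is assumed; the 2x2 counts below are hypothetical placeholders, not study data):

    # Chi-square test of whether reviewers' AI/provider labels
    # depend on the DI's true source (counts are hypothetical).
    from scipy.stats import chi2_contingency

    # Rows: true source (AI, provider); columns: reviewer label (AI, provider)
    table = [[360, 30],    # hypothetical counts for AI-written DIs
             [59, 322]]    # hypothetical counts for provider-written DIs
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.3g}")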
Results
A total of 771 evaluations of 194 DIs were completed. Reviewers correctly identified 92.3% of AI DIs and 84.4% of handwritten DIs. Among DIs labeled as AI (n=419), truly AI DIs received higher overall quality ratings than provider DIs (4.74 vs 3.35, p<0.00001), as well as higher ratings for lab explanations (4.52 vs 3.88, p=0.00075), imaging explanations (4.62 vs 4.03, p=0.00011), and incidental findings (4.63 vs 2.42, p<0.00001). Errors occurred in 11.99% of AI DIs vs 19.16% of provider DIs. Provider DI errors were most often flagged as omissions, while AI DI errors were attributed to over-generalization or excessive certainty. Hallucinations appeared in 6% of charts and were typically specific factual insertions (labs, vitals, imaging) rather than global narrative errors.
Conclusion
AI DIs were rated higher in overall quality and clarity than provider DIs, with fewer omissions but a tendency toward over-generalization or excessive certainty regarding the diagnosis. These findings suggest that AI DIs may be noninferior to handwritten DIs and may improve DI thoroughness and clarity. However, clinician oversight is required to minimize errors and ensure DIs explain the diagnosis with the appropriate degree of certainty.
CME
0.75