

Can Artificial Intelligence Mimic the Human Touch? Evaluating the Quality of Artificial Intelligence–Written Discharge Instructions Using a Deidentified Dataset
Thursday, May 21, 2026 12:24 PM to 12:32 PM · 8 min. (America/New_York)
International Hall 8: Level I
Abstracts
Informatics/Data Science/AI
Information
Abstract Number
358
Background and Objectives
The popularization of generative artificial intelligence (AI) has prompted the implementation of AI tools in clinical care. Discharge instructions (DIs) are an excellent application given the difficulty of writing personalized DIs. However, these tools need stress testing prior to deployment to ensure they are noninferior to handwritten DIs, with few or no omissions or hallucinations.
Methods
We randomly selected 100 inpatient charts from the MIMIC-IV database; three were excluded due to inpatient mortality. For each remaining chart, an AI DI was generated from the discharge summary using a single-shot prompt agreed upon by the study investigators, and the original handwritten DI was extracted, yielding 194 DIs. These were evaluated by 8 emergency department (ED) clinicians using a rubric based on best-practice guidelines. Review domains included the quality of lab and imaging explanations, discussion of incidental findings, presence of hallucinations or omissions, and whether reviewers believed the DI was AI- or provider-written. Quantitative data were analyzed with ANOVA, and DI identification accuracy was assessed with a chi-square test. Free-text error descriptions were independently reviewed by 2 investigators to identify themes.
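For illustration, a minimal sketch of the identification-accuracy chi-square test in Python (scipy is assumed; the 2x2 counts below are hypothetical placeholders, not study data):

    # Chi-square test of whether reviewers' AI/provider labels
    # depend on the DI's true source (counts are hypothetical).
    from scipy.stats import chi2_contingency

    # Rows: true source (AI, provider); columns: reviewer label (AI, provider)
    table = [[360, 30],    # hypothetical counts for AI-written DIs
             [59, 322]]    # hypothetical counts for provider-written DIs
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.3g}")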
Results
A total of 771 evaluations of 194 DIs were completed. Reviewers correctly identified 92.3% of AI DIs and 84.4% of handwritten DIs. Among DIs labeled as AI (n=419), truly AI DIs received higher overall quality ratings than provider DIs (4.74 vs 3.35, p<0.00001), as well as higher ratings for lab explanations (4.52 vs 3.88, p=0.00075), imaging explanations (4.62 vs 4.03, p=0.00011), and incidental findings (4.63 vs 2.42, p<0.00001). Errors occurred in 11.99% of AI DIs vs 19.16% of provider DIs. Provider DI errors were most often flagged as omissions, while AI DI errors were attributed to over-generalization or excessive certainty. Hallucinations appeared in 6% of charts and were typically specific factual insertions (labs, vitals, imaging) rather than global narrative errors.
Conclusion
AI DIs were rated higher in overall quality and clarity than provider DIs, with fewer omissions but a tendency toward over-generalization or excessive certainty regarding the diagnosis. These findings suggest that AI DIs may be noninferior to handwritten DIs and may improve DI thoroughness and clarity. However, clinician oversight is required to minimize errors and ensure DIs explain the diagnosis with the appropriate degree of certainty.
CME
0.75