

Neural Compression of Medical History Data: A Strategy to Reduce Big Data Burden in Emergency Medicine
Wednesday, May 20, 2026 4:08 PM to 4:16 PM · 8 min. (America/New_York)
International Hall 9: Level I
Abstracts
Informatics/Data Science/AI
Abstract Number
657
Background and Objectives
The digitalization of healthcare generates large volumes of clinical data, offering substantial research potential but at the cost of increased computational burden. The incorporation of past medical history exemplifies this challenge: traditional dichotomous encoding produces high-dimensional, sparse matrices that are computationally expensive to process. Dimensionality reduction techniques may mitigate this issue, yet remain underexplored in emergency medicine. This study aimed to develop a neural autoencoder to compress medical history data into a reduced latent space suitable for predictive modeling.
Methods
We conducted a retrospective cohort study at the Centre hospitalier de l’Université de Montréal, a tertiary academic adult emergency department (~70,000 visits annually) in Montréal, Canada. Adult patients presenting with abdominal pain, chest pain, or dyspnea between 2017 and 2023 were included. A total of 663 free-text medical history variables were cleaned using standard natural language processing techniques and encoded as multi-hot vectors. A neural autoencoder was trained using binary cross-entropy loss, dropout regularization, and early stopping, with hyperparameters optimized via Bayesian search. Model performance was assessed using mean squared reconstruction error. Computational efficiency was evaluated by comparing training times of predictive models using raw versus encoded medical history data. The latent space structure was explored using t-distributed stochastic neighbor embedding (t-SNE). The inclusion of more than 33,150 patients yielded over 50 observations per input variable, supporting stable model training.
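As an illustration of the multi-hot encoding step described above, the sketch below maps cleaned free-text history terms to binary vectors. The term names, patient lists, and helper functions are hypothetical examples, not the study's actual 663-variable pipeline.

```python
# Illustrative multi-hot encoding of medical history terms.
# Vocabulary and patient histories below are hypothetical.

def build_vocabulary(histories):
    """Map each distinct history term to a column index."""
    vocab = {}
    for history in histories:
        for term in history:
            if term not in vocab:
                vocab[term] = len(vocab)
    return vocab

def multi_hot(history, vocab):
    """Encode one patient's history as a 0/1 vector over the vocabulary."""
    vector = [0] * len(vocab)
    for term in history:
        if term in vocab:
            vector[vocab[term]] = 1
    return vector

# Hypothetical cleaned history terms for three patients
patients = [
    ["hypertension", "diabetes"],
    ["asthma"],
    ["hypertension", "asthma", "ckd"],
]

vocab = build_vocabulary(patients)
matrix = [multi_hot(p, vocab) for p in patients]
```

With 663 distinct terms, each row of such a matrix is a sparse 663-dimensional binary vector, which is the high-dimensional input the autoencoder compresses.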
Results
A total of 33,917 patients were included (mean age: 46 years; female: 54%). The autoencoder reduced dimensionality from 663 variables to 32 latent features, achieving a mean squared reconstruction error below 0.005. Use of latent variables resulted in a 3.46-fold reduction in computational training time. t-SNE projection revealed clinically coherent clustering, notably by physiological systems.
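A minimal sketch of the reconstruction-error metric reported above, assuming element-wise mean squared error between multi-hot inputs and autoencoder outputs; the vectors are toy values, not study data.

```python
# Mean squared reconstruction error between inputs and autoencoder
# outputs. Toy vectors below are illustrative only.

def mean_squared_reconstruction_error(originals, reconstructions):
    """Average squared difference over all vector elements."""
    total, count = 0.0, 0
    for x, x_hat in zip(originals, reconstructions):
        for a, b in zip(x, x_hat):
            total += (a - b) ** 2
            count += 1
    return total / count

# Toy multi-hot inputs and near-perfect reconstructions
X = [[1, 0, 1], [0, 1, 0]]
X_hat = [[0.97, 0.02, 0.99], [0.01, 0.98, 0.03]]
err = mean_squared_reconstruction_error(X, X_hat)
```

A reconstruction this close to the inputs yields an error well under the 0.005 threshold reported for the study's autoencoder.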
Conclusion
Neural autoencoding enables efficient and robust dimensionality reduction of medical history data while preserving clinically meaningful structure. This approach reduces computational cost, training time, and carbon footprint, and generates latent representations that are well suited for predictive modeling.
CME
1.25
Disclosures
Access the following link to view disclosures of session presenters, presenting authors, organizers, moderators, and planners: