The explosion of clinical data provides an exciting new opportunity to use machine learning to discover new and clinically impactful information. Among the questions that can be addressed are establishing the value of treatments and interventions in heterogeneous patient populations, stratifying risk for clinical endpoints, and investigating the benefit of specific practices or behaviors. However, there are many challenges to overcome. First, clinical data are noisy, sparse, and irregularly sampled. Second, many clinical endpoints (e.g., the time of disease onset) are ambiguous, resulting in ill-defined prediction targets.
In this talk, I will discuss the need for practical, evidence-based medicine and the challenges of creating multi-modal representations for prediction targets that vary both spatially and temporally. I will present recent work on learning good representations across clinical applications that must cope with missing and noisy data. I will discuss work using electronic medical records from over 30,000 intensive care patients in the MIMIC-III dataset to predict both mortality and clinical interventions, as well as work from a non-clinical setting that uses non-invasive wearable data to detect harmful voice patterns and the presence of pathological physiology. To our knowledge, classification results on these tasks improve on those of previous work. Moreover, the learned representations hold intuitive meaning: topics inferred from narrative notes, and latent autoregressive states over vital signs. The learned representations capture higher-level structure and dependencies between multi-modal time series data and multiple time-varying targets.
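To give a flavor of what a latent autoregressive state over vital signs can mean, here is a toy sketch (not the talk's actual model; all data and names are illustrative): fitting a per-window AR(1) coefficient to a synthetic vital-sign series separates a regime that looks like independent noise from a persistent, slowly drifting regime.

```python
import numpy as np

def ar1_coefficients(series, window=100):
    """Fit an AR(1) coefficient on each demeaned, non-overlapping window.

    Coefficients near one mark persistent (slowly drifting) regimes;
    values near zero mark regimes that look like independent noise.
    This is a toy stand-in for the latent autoregressive states
    described in the abstract.
    """
    coeffs = []
    for start in range(0, len(series) - window + 1, window):
        seg = series[start:start + window]
        seg = seg - seg.mean()                 # remove the window's baseline level
        x, y = seg[:-1], seg[1:]
        coeffs.append(float(x @ y / (x @ x)))  # least-squares AR(1) slope
    return np.array(coeffs)

rng = np.random.default_rng(0)

# Synthetic "heart rate": 200 samples of independent noise around 70 bpm,
# followed by 200 samples of a persistent AR(1) drift around the same baseline.
noisy = 70 + rng.normal(0, 1, 200)
drift = np.zeros(200)
for t in range(1, 200):
    drift[t] = 0.9 * drift[t - 1] + rng.normal(0, 1)
series = np.concatenate([noisy, 70 + drift])

coeffs = ar1_coefficients(series)
print(coeffs.round(2))  # low in the first two windows, high in the last two
```

A full model would infer such states jointly with the prediction targets rather than in a fixed windowed pass, but the sketch shows why the states are interpretable: each one summarizes the local dynamics of the signal.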