In many sequential decision problems, all that we have is a record of historical trajectories. Building dynamic models from these trajectories, and ultimately sequential decision policies, may introduce substantial uncertainty and bias. In this talk we consider how to create control policies from existing historical data and how to sample trajectories more effectively so that future control policies improve. This question has been central to reinforcement learning for at least the last decade, and involves methods from statistics, optimization, and control theory.
We will focus on one of the possible remedies to uncertainty in sequential decision problems: using risk measures, such as the conditional value-at-risk, as the objective to be optimized rather than the ubiquitous expected reward. We consider the complexity and efficiency of evaluating and optimizing risk measures. Our main theme is that considering risk is essential for obtaining resilience to model uncertainty and model mismatch.
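As a rough illustration of how a conditional value-at-risk objective differs from the expected reward, the sketch below estimates both from a set of sampled trajectory returns. It assumes that higher returns are better and uses a simple empirical estimator (the mean of the worst fraction `alpha` of the samples). The function name `empirical_cvar` and the sample returns are hypothetical and not taken from the talk.

```python
import math


def empirical_cvar(returns, alpha):
    """Empirical CVaR at level alpha: mean of the worst ceil(alpha * n) returns.

    Assumption: returns are rewards (higher is better), so the "bad" tail
    is the lower tail of the sample.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)                # worst outcomes first
    k = math.ceil(alpha * len(ordered))      # number of tail samples to average
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Hypothetical returns of ten historical trajectories under one policy.
returns = [10.0, 12.0, 11.0, -40.0, 9.0, 13.0, 10.0, 11.0, -5.0, 12.0]

mean_return = sum(returns) / len(returns)   # the usual expected-reward estimate
cvar_20 = empirical_cvar(returns, 0.2)      # average of the worst 20% of outcomes

print(f"mean return: {mean_return:.2f}")
print(f"CVaR at 20%: {cvar_20:.2f}")
```

In this toy sample the mean is positive while the 20% tail average is strongly negative, which is the kind of gap a risk-sensitive objective is meant to expose. With `alpha = 1.0` the estimator reduces to the ordinary sample mean.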
We will then describe two challenging real-world domains that have been studied in our research group in collaboration with experts from industry and academia: diabetes care management in healthcare and asset management in high-voltage transmission grids. For each domain we will describe our efforts to reduce the problem to its bare essentials as a reinforcement learning problem, the algorithms for learning the control policies, and some of the lessons we learned.