The default, successful paradigm of machine learning is supervised learning: humans are hired (often through Mechanical Turk these days) to label items, and the labeled data is fed into a learning algorithm to create a learned predictor. A more natural, less expensive, and accurate approach is to observe what works in practice, and use this information to learn a predictor.
For people familiar with the first approach, there are several failure modes, ranging from outright inconsistency to mere suboptimality. A core issue here is the need for exploration: if a choice is never explored, we cannot optimize for it. This implies the need to use exploration data to evaluate learned solutions, to use it to guide the optimization of learned predictors, and to control the exploration process itself so it is carried out efficiently.
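To make the first of these needs concrete, here is a minimal sketch of how exploration data can evaluate a learned solution offline via inverse propensity scoring (IPS). Everything here is hypothetical for illustration: the epsilon-greedy logging policy, the reward function, and the target policy are invented toy examples, not any particular system. The key point is that because the logger explored (and recorded the probability of each logged action), we can form an unbiased estimate of a *different* policy's value from the same log.

```python
# Sketch: off-policy evaluation with inverse propensity scoring (IPS).
# All policies and rewards below are toy assumptions for illustration.
import random

random.seed(0)
ACTIONS = [0, 1, 2]
EPSILON = 0.1  # exploration rate (an assumption for this sketch)

def logging_policy(context):
    """Epsilon-greedy around a toy baseline; returns the chosen
    action and the probability with which it was chosen."""
    greedy = context % len(ACTIONS)
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = greedy
    if action == greedy:
        prob = (1 - EPSILON) + EPSILON / len(ACTIONS)
    else:
        prob = EPSILON / len(ACTIONS)
    return action, prob

def true_reward(context, action):
    # Hypothetical world: one action per context is worth 1, the rest 0.
    return 1.0 if action == (context + 1) % len(ACTIONS) else 0.0

# Collect exploration data: (context, action, reward, propensity).
log = []
for _ in range(100_000):
    ctx = random.randrange(100)
    a, p = logging_policy(ctx)
    log.append((ctx, a, true_reward(ctx, a), p))

def target_policy(context):
    # The learned policy we want to evaluate without deploying it.
    return (context + 1) % len(ACTIONS)

# IPS estimate: average of r * 1[target action == logged action] / propensity.
ips = sum(r * (target_policy(c) == a) / p for c, a, r, p in log) / len(log)
print(ips)  # close to 1.0, the target policy's true expected reward
```

Note that the target policy almost never agrees with the logger's greedy choice here, so the estimate rests entirely on the logged exploration actions; without exploration (EPSILON = 0), every matching term would vanish and the new policy could not be evaluated at all.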