Scaling Machine Learning to the Internet

Sunday, September 18, 2011
7:00 PM
Free and open to the public

In this talk I will give an overview of an array of highly scalable techniques for both observed and latent variable models. Their scalability makes them well suited to problems such as classification, recommendation systems, topic modeling, and user profiling. I will present algorithms for batch and online distributed convex optimization to handle large amounts of data, and hashing to address the issue of parameter storage for personalization and collaborative filtering. Furthermore, for latent variable models I will discuss distributed sampling algorithms capable of handling tens of billions of latent variables on a cluster of 1000 machines. The algorithms described are used for personalization, spam filtering, recommendation, document analysis, and advertising.

Alex Smola

Principal Research Scientist
Yahoo! Research