JSALT 2015 -- Week 2 Plenary
Friday, July 17:
- 10-11am Plenary seminar [EEB 105]
- 11:30am-1pm Team reports & lunch with speaker [EEB 303]
Deep Multimodal Learning
Ruslan Salakhutdinov, Department of Computer Science and Department of Statistics, University of Toronto
In this talk, I will describe a class of statistical models that are capable of extracting a unified representation that fuses together multiple data modalities. In particular, inspired by recent advances in machine translation, I will introduce an encoder-decoder model that learns a multimodal joint embedding space of images and text. The encoder can be used to rank images and sentences, while the decoder can generate novel descriptions of images from scratch. I will further describe a novel approach to unsupervised learning of a generic, distributed sentence encoder and show that on several tasks, including semantic relatedness, paraphrase detection, and image-sentence ranking, these models improve upon many existing techniques. Finally, I will present a model that can learn to classify previously unseen image categories based solely on their textual descriptions (e.g. Wikipedia articles).
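The joint embedding idea in the abstract can be sketched in a few lines: each modality gets its own encoder into a shared space, and ranking reduces to cosine similarity there. The linear encoders, dimensions, and random features below are purely illustrative assumptions, not the talk's actual model (which uses learned neural encoders).

```python
import numpy as np

# Illustrative dimensions only -- not from the talk.
IMG_DIM, TXT_DIM, EMB_DIM = 8, 6, 4

rng = np.random.default_rng(0)
# Stand-in linear "encoders"; the real model learns neural encoders.
W_img = rng.normal(size=(IMG_DIM, EMB_DIM))
W_txt = rng.normal(size=(TXT_DIM, EMB_DIM))

def embed(x, W):
    """Project a feature vector into the joint space, L2-normalized."""
    z = x @ W
    return z / np.linalg.norm(z)

def rank_images(sentence, images):
    """Return image indices sorted by cosine similarity to the sentence."""
    s = embed(sentence, W_txt)
    sims = [float(embed(v, W_img) @ s) for v in images]
    return sorted(range(len(images)), key=lambda i: -sims[i])

images = rng.normal(size=(3, IMG_DIM))
sentence = rng.normal(size=TXT_DIM)
order = rank_images(sentence, images)
print(order)
```

Because both modalities land in the same normalized space, the same machinery ranks sentences given an image by swapping the roles of the two encoders.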
Ruslan Salakhutdinov received his PhD in machine learning (computer science) from the University of Toronto in 2009. After spending two post-doctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Department of Computer Science and Department of Statistics. Dr. Salakhutdinov's primary interests lie in statistical machine learning, deep learning, probabilistic graphical models, and large-scale optimization. He is the recipient of the Early Researcher Award, Connaught New Researcher Award, Alfred P. Sloan Research Fellowship, Microsoft Research Faculty Fellowship, and Google Faculty Research Award, and is a Fellow of the Canadian Institute for Advanced Research.