JSALT 2015

JSALT 2015 Research

Research Group: Far-Field Speech Enhancement and Recognition in Mismatched Settings

Following the recent success of automatic speech recognition (ASR) in mobile applications, the noise robustness of ASR in real-world conditions has become an important technical issue. ASR systems will soon be expected to function in a variety of conditions: gaming (Kinect), personal assistants (Amazon Echo), meeting recognition, and distance wire-taps, to name a few. Traditional application scenarios tend to use the same microphone and channel conditions in training and at test time. Efforts so far have focused on developing techniques, such as microphone arrays, source separation, speech enhancement (SE), and ASR, that work in one specific setting (microphone geometry, environment, etc.). Such approaches tend to over-fit the system to the training setting and do not generalize well to mismatched or unseen settings. We propose tackling this challenging problem using cutting-edge machine learning techniques built around three themes: embedding generative model-based strategies into a deep learning framework using deep unfolding, augmenting the training data, and multi-task learning.

Senior Team Members:

Shinji Watanabe (Mitsubishi Electric Research Laboratories)
Martin Karafiát (Brno University of Technology)
Michael I. Mandel (Ohio State University)
Jon Barker (University of Sheffield)
Team Leader: John R. Hershey (Mitsubishi Electric Research Laboratories)