JSALT 2015 -- Week 3 Plenary
Friday, July 24:
- 10-11am Plenary seminar [EEB 105]
- 11:30am-1pm Team reports & lunch with speaker [EEB 303]
Approaching Human Parity in Understanding Speech,
Vision and Language by Computers
Dr. Xuedong Huang
With impressive advances in machine learning algorithms, big data, and computing infrastructure, we have reached a point where we can realistically target human parity in understanding speech, vision, and language. Progress in these areas can be exemplified by conversational speech recognition on the Switchboard task, Skype Translator, image recognition as benchmarked by ImageNet 1K, image captioning, Cortana, and web search. I will use some of these examples to review and discuss our historical efforts, exciting opportunities, and the grand challenge of reaching human parity in speech, vision, and language.
Dr. Xuedong Huang is a Distinguished Engineer and Chief Scientist of Speech R&D at Microsoft Corporation. He heads the Advanced Technology Group in Microsoft Technology and Research. Huang joined Microsoft in 1993 to found the company's speech recognition efforts. As the head of Microsoft's speech efforts for over a decade, he provided technical, engineering, and business leadership to bring speech recognition to the mass market. His seminal contributions to shared-parameter modeling (both tied mixtures and tied Markov states) improved trainability and efficiency – a concept still used in modern speech recognition today. He pioneered the introduction of SAPI to Windows in 1995 and helped ship the enterprise-grade Speech Server 2004, which received many technical awards, including Speech Technology Magazine's Most Innovative Solutions Award in 2004. Before assuming his current responsibility, he spent five years as the chief architect for Bing search and ads, working to improve web search relevance.