Joakim Andén
Applied Mathematics, Ecole Polytechnique
Scattering Invariants for Audio Classification
February 26, 2014, 10:30am - 11:20am
EE Building Room 303
Host: Professor Les Atlas, Bloedel Research Scholar, Department of Electrical Engineering
To obtain efficient feature representations for audio classification, it is desirable to have invariance to time-shift and stability to time-warping. The commonly used Mel-frequency cepstral coefficients (MFCCs) satisfy these criteria, but are unsuitable for modeling large-scale temporal structure. The scattering transform extends this representation through a convolutional network of wavelet transforms and modulus operators, capturing structures at larger time scales. Additional invariance to frequency transposition with stability to frequency-warping is obtained by applying a second scattering transform along the log-frequency axis. Using these representations, we obtain state-of-the-art results on tasks such as phone segment classification and musical genre classification on the TIMIT and GTZAN datasets, respectively.
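The cascade described in the abstract (wavelet transform, modulus, averaging, then a second wavelet layer applied to the first-layer envelopes) can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions and not the speaker's implementation: the `morlet`, `circ_conv`, and `scattering` helpers are hypothetical names, the filters are uncorrected Morlet-like (Gabor) filters on three arbitrarily chosen dyadic scales, convolution is circular, and the local low-pass averaging is replaced by a global mean. Under those simplifications the coefficients are unchanged by a circular time shift, up to floating-point rounding.

```python
import cmath
import math


def morlet(n_samples, scale):
    """Hypothetical Morlet-like filter: Gaussian envelope times a complex sinusoid.

    Assumption: no zero-mean correction is applied, so this is only an
    approximation of a true wavelet.
    """
    sigma = float(scale)
    xi = 0.75 * math.pi / scale  # center frequency halves as the scale doubles
    taps = []
    for n in range(n_samples):
        # Wrapped time index so the filter is centered at n = 0.
        t = n if n <= n_samples // 2 else n - n_samples
        taps.append(math.exp(-t * t / (2 * sigma * sigma)) * cmath.exp(1j * xi * t))
    return taps


def circ_conv(x, h):
    """Direct O(N^2) circular convolution, adequate for short toy signals."""
    n_samples = len(x)
    return [
        sum(x[k] * h[(n - k) % n_samples] for k in range(n_samples))
        for n in range(n_samples)
    ]


def modulus(y):
    """Pointwise complex modulus, the nonlinearity of the scattering network."""
    return [abs(v) for v in y]


def mean(y):
    """Global time average (a simplification of local low-pass averaging)."""
    return sum(y) / len(y)


def scattering(x, scales=(1, 2, 4)):
    """Zeroth-, first- and second-order coefficients, keyed by scale path."""
    filters = {s: morlet(len(x), s) for s in scales}
    coeffs = {(): mean(x)}
    for s1 in scales:
        u1 = modulus(circ_conv(x, filters[s1]))  # first-layer envelope
        coeffs[(s1,)] = mean(u1)
        for s2 in scales:
            if s2 > s1:  # second layer only looks at coarser scales
                u2 = modulus(circ_conv(u1, filters[s2]))
                coeffs[(s1, s2)] = mean(u2)
    return coeffs


# Toy signal: a windowed sinusoid, and a copy circularly shifted by 7 samples.
x = [math.sin(2 * math.pi * 5 * n / 64) * math.exp(-((n - 20) / 6.0) ** 2)
     for n in range(64)]
shifted = x[-7:] + x[:-7]

a = scattering(x)
b = scattering(shifted)
max_diff = max(abs(a[path] - b[path]) for path in a)
print("largest coefficient change under time shift:", max_diff)
```

The second-order paths `(s1, s2)` respond to the modulation of the first-layer envelopes at coarser scales. Global averaging gives exact shift invariance here at the cost of discarding all timing information, which is why a practical system would average over a finite window instead.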
Joakim Andén is a Ph.D. candidate in applied mathematics at Ecole Polytechnique in Paris, France, under the supervision of Prof. Stéphane Mallat. Previously, he studied engineering physics and mathematics at the Royal Institute of Technology in Stockholm, Sweden, and fundamental mathematics at Université Pierre et Marie Curie in Paris, France, from which he received an M.Sc. in 2010. His research focuses on invariant signal representations and their applications to classification and similarity estimation for speech, music, and environmental sounds, as well as medical signals.