

Preprint 44/2013
Learning Binaural Spectrogram Features for Azimuthal Speaker Localization
Wiktor Młynarski
Submission date: 20. Apr. 2013 (revised version: May 2013)
Pages: 6
Keywords and phrases: machine learning, machine hearing, sparse coding
Abstract:
Spatial localization of speech and other natural sounds with
rich spectro-temporal structure is a computationally
challenging task. It requires extracting features that are
informative about the speaker's position yet invariant to the
sound level and spectral modulation present in the signal. This
paper demonstrates that this can be achieved with
Independent Component Analysis (ICA) applied to binaural
speech spectrograms. A small subset of learned Independent
Components (ICs) captures signal structure imposed by outer
ears. A Gaussian classifier trained on those features performs
accurate localization in the azimuthal plane. The remaining
majority of ICs have position-invariant distributions and can
be used to reconstruct the spectrogram of the original sound
source.
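
The pipeline summarized in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the data are synthetic (a Laplacian "source" log-spectrum plus a hypothetical position-dependent interaural filter), the ICA variant (symmetric FastICA with a tanh contrast), the ranking of ICs by between-class variance, and all names and sizes (`make_data`, `fit_ica`, 8 frequency channels, 3 positions, 4 selected ICs) are assumptions chosen for illustration. Only NumPy is assumed.

```python
import numpy as np

def make_data(n, filters, rng, noise=0.1):
    """Synthetic binaural log-spectra: the position filter is added to one ear, subtracted from the other."""
    n_pos, n_freq = filters.shape
    y = rng.integers(0, n_pos, size=n)            # speaker position label
    source = rng.laplace(size=(n, n_freq))        # position-independent source spectrum
    left = source + 0.5 * filters[y] + noise * rng.standard_normal((n, n_freq))
    right = source - 0.5 * filters[y] + noise * rng.standard_normal((n, n_freq))
    return np.hstack([left, right]), y

def sym_decorrelate(W):
    """Return the orthogonal matrix closest to W (symmetric decorrelation)."""
    u, _, vt = np.linalg.svd(W)
    return u @ vt

def fit_ica(X, n_iter=200, seed=0):
    """Whiten X (samples x dims), then run symmetric FastICA with a tanh contrast."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    whitener = evecs / np.sqrt(evals)             # applied as (X - mean) @ whitener
    Z = (X - mean) @ whitener
    W = sym_decorrelate(rng.standard_normal((X.shape[1], X.shape[1])))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w, one row per component
        W = sym_decorrelate(G.T @ Z / len(Z) - (1.0 - G**2).mean(axis=0)[:, None] * W)
    return mean, whitener, W

def transform(X, mean, whitener, W):
    """Project data onto the learned independent components."""
    return (X - mean) @ whitener @ W.T

def fit_gaussians(F, y):
    """Fit one full-covariance Gaussian per position label."""
    params = {}
    for c in np.unique(y):
        Fc = F[y == c]
        cov = np.cov(Fc, rowvar=False) + 1e-6 * np.eye(F.shape[1])
        params[int(c)] = (Fc.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return params

def predict(F, params):
    """Pick the class whose Gaussian assigns the highest log-likelihood."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, prec, logdet = params[c]
        diff = F - mu
        scores.append(-0.5 * (np.einsum("ij,jk,ik->i", diff, prec, diff) + logdet))
    return np.array(classes)[np.argmax(scores, axis=0)]

rng = np.random.default_rng(0)
filters = rng.standard_normal((3, 8))             # hypothetical filters: 3 positions x 8 channels
X_train, y_train = make_data(3000, filters, rng)
X_test, y_test = make_data(1000, filters, rng)

mean, whitener, W = fit_ica(X_train)
S_train = transform(X_train, mean, whitener, W)
S_test = transform(X_test, mean, whitener, W)

# Rank ICs by how much their class means vary with position; keep a small subset.
class_means = np.array([S_train[y_train == c].mean(axis=0) for c in range(3)])
informative = np.argsort(class_means.var(axis=0))[-4:]

params = fit_gaussians(S_train[:, informative], y_train)
accuracy = float((predict(S_test[:, informative], params) == y_test).mean())
print(f"held-out accuracy using {len(informative)} of {W.shape[0]} ICs: {accuracy:.3f}")
```

In this toy setting the position cue lives in the difference between the two ears, so the ICs selected by the ranking step play the role of the position-informative subset, while the remaining ICs mostly describe the source spectrum. Real binaural spectrograms would replace `make_data`, and the number of retained ICs would need to be chosen from the data.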