A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented by Matching Pursuit

Thomas Bryan; Veton Kepuska; Ivica Kostanic

Abstract:

A simple adaptive voice activity detector (VAD) is implemented using Gabor and gammatone atomic decomposition of speech for high Gaussian noise environments. Matching pursuit is used for the atomic decomposition and is shown to achieve optimal speech detection capability at high data compression rates for low signal-to-noise ratios (SNRs). The most active dictionary elements found by matching pursuit are used for signal reconstruction, so that the algorithm adapts to the individual speaker's dominant time-frequency characteristics. Speech has a high peak-to-average ratio, which enables the greedy matching pursuit heuristic of selecting the highest inner products to isolate high-energy speech components in high-noise environments. Gabor and gammatone atoms are both investigated, with identical logarithmically spaced center frequencies and similar bandwidths. The algorithm performs equally well for both atom types, with no statistically significant difference. It achieves 70% accuracy at 0 dB SNR, 90% accuracy at 5 dB SNR, and 98% accuracy at 20 dB SNR, using 30 dB SNR as the reference for voice activity.
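
The greedy selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (gabor_atom, log_spaced, matching_pursuit, reconstruct), the 8 kHz sampling rate, the 128-sample frame, the envelope width, and the 100 Hz to 3000 Hz frequency range are all assumptions chosen for illustration. Only Gabor atoms are shown, and the decision rule that turns the reconstruction into a speech/non-speech label is omitted because the abstract does not specify it.

```python
import math
import random


def gabor_atom(length, center, freq_hz, sigma, fs):
    """Unit-norm Gabor atom: Gaussian envelope times a cosine carrier."""
    atom = [
        math.exp(-0.5 * ((n - center) / sigma) ** 2)
        * math.cos(2.0 * math.pi * freq_hz * (n - center) / fs)
        for n in range(length)
    ]
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]


def log_spaced(f_lo, f_hi, count):
    """Logarithmically spaced center frequencies from f_lo to f_hi."""
    ratio = (f_hi / f_lo) ** (1.0 / (count - 1))
    return [f_lo * ratio ** k for k in range(count)]


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy MP: pick the atom with the largest |inner product|,
    subtract its projection from the residual, and repeat."""
    residual = list(signal)
    picks = []
    for _ in range(n_atoms):
        best_idx, best_coef = 0, 0.0
        for idx, atom in enumerate(dictionary):
            coef = sum(r * a for r, a in zip(residual, atom))
            if abs(coef) > abs(best_coef):
                best_idx, best_coef = idx, coef
        chosen = dictionary[best_idx]
        residual = [r - best_coef * a for r, a in zip(residual, chosen)]
        picks.append((best_idx, best_coef))
    return picks, residual


def reconstruct(picks, dictionary, length):
    """Rebuild a signal estimate from the selected (index, coefficient) pairs."""
    out = [0.0] * length
    for idx, coef in picks:
        for n, a in enumerate(dictionary[idx]):
            out[n] += coef * a
    return out


# Illustrative parameters (assumptions, not values from the paper).
FS = 8000          # sampling rate in Hz
FRAME = 128        # frame length in samples
SIGMA = 16.0       # Gaussian envelope width in samples

dictionary = [
    gabor_atom(FRAME, center, freq, SIGMA, FS)
    for center in (32, 64, 96)
    for freq in log_spaced(100.0, 3000.0, 8)
]

# A high-energy burst (one scaled atom) buried in Gaussian noise.
random.seed(0)
burst_index = 12
noisy = [3.0 * a + random.gauss(0.0, 0.1) for a in dictionary[burst_index]]

picks, residual = matching_pursuit(noisy, dictionary, n_atoms=3)
estimate = reconstruct(picks, dictionary, FRAME)
print("selected atoms:", [idx for idx, _ in picks])
print("residual energy:", sum(r * r for r in residual))
```

Because every atom has unit norm, the inner product with the residual is also the projection coefficient, so a peaky, high-energy component yields a much larger coefficient than broadband Gaussian noise does. That property is what lets the greedy rule pick out speech-like components first.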

Keywords: Atomic Decomposition, Gabor, Gammatone, Matching Pursuit, Voice Activity Detection.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1106389
