Sparse Pulsed Auditory Representations For Speech and Audio Coding

Status

Finished

Date

2005-10-03

Student

Christian Feldbauer

Mentor

Gernot Kubin

Research Areas

Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features most relevant to the human listener for coding applications. This thesis deals with the approach of 'coding in the perceptual domain' and is based on an invertible auditory model that provides a pulsed auditory representation of the input speech or audio signal. It is natural for pulsed signal representations to encode only the non-zero samples by specifying their positions as side information. For the considered auditory representation, the number of pulses and, therefore, the amount of side information is too high for an efficient encoding at a relatively low bit rate.
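To illustrate why the pulse count drives the side information, here is a minimal sketch of encoding a pulsed signal by the positions and amplitudes of its non-zero samples. It is not taken from the thesis: the function names (`encode_pulses`, `decode_pulses`, `position_bits`) are hypothetical, and the bit count is a simple combinatorial bound on the cost of the positions, not the coding scheme used in the thesis.

```python
import math
from typing import List, Tuple


def encode_pulses(signal: List[float]) -> Tuple[int, List[int], List[float]]:
    """Keep only the non-zero samples: their positions and their amplitudes."""
    positions = [i for i, x in enumerate(signal) if x != 0]
    amplitudes = [signal[i] for i in positions]
    return len(signal), positions, amplitudes


def decode_pulses(length: int, positions: List[int],
                  amplitudes: List[float]) -> List[float]:
    """Rebuild the pulsed signal from its positions and amplitudes."""
    signal = [0.0] * length
    for i, a in zip(positions, amplitudes):
        signal[i] = a
    return signal


def position_bits(length: int, num_pulses: int) -> int:
    """Bits needed to index one of the C(length, num_pulses) possible
    pulse-position patterns (an illustrative bound, assumed here)."""
    return math.ceil(math.log2(math.comb(length, num_pulses)))


# Hypothetical 8-sample frame with two pulses.
frame = [0.0, 0.5, 0.0, 0.0, -0.25, 0.0, 0.0, 0.0]
length, positions, amplitudes = encode_pulses(frame)
reconstructed = decode_pulses(length, positions, amplitudes)
```

Under this bound, the cost of the positions grows with the number of pulses as long as fewer than half of the samples are non-zero, which is why a representation with fewer pulses needs less side information.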

The focus of this work is to 'sparsify' the pulsed signal representation, i.e., to remove its perceptual irrelevance and its redundancy, in order to obtain a compact signal representation that facilitates efficient encoding and from which the signal can nevertheless be reconstructed with perceptually transparent quality. For this purpose, the 'transmultiplexer view' of perceptual-domain coding is proposed, which leads to a new masking model. This masking model is successfully applied to obtain a sparse pulsed signal representation. Experiments show that the proposed sparse signal representation is able to hide a remarkable amount of reconstruction error. Approaches to efficiently encoding sparse pulsed signal representations are discussed. The thesis also deals with computationally efficient implementation methods for auditory filterbanks, which are key components of virtually all auditory models.