Sparse Pulsed Auditory Representations For Speech and Audio Coding
- Status: Finished
- Student: Christian Feldbauer
- Mentor: Gernot Kubin
- Research Areas
Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features most relevant to the human listener for coding applications. This thesis deals with the approach of 'coding in the perceptual domain' and is based on an invertible auditory model that provides a pulsed auditory representation of the input speech or audio signal. It is natural for pulsed signal representations to encode only the non-zero samples by specifying their positions as side information. For the considered auditory representation, the number of pulses and, therefore, the amount of side information is too high for an efficient encoding at a relatively low bit rate.
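The idea of encoding only the non-zero samples, with their positions as side information, can be illustrated with a minimal Python sketch. It is not the thesis's codec: the function names, the example channel, and the fixed-length position cost are all assumptions chosen for illustration. It shows why the side information grows in proportion to the number of pulses.

```python
import math


def encode_pulses(signal):
    """Keep only the non-zero samples: return (length, positions, amplitudes)."""
    positions = [i for i, x in enumerate(signal) if x != 0]
    amplitudes = [signal[i] for i in positions]
    return len(signal), positions, amplitudes


def decode_pulses(length, positions, amplitudes):
    """Rebuild the full pulse train from positions and amplitudes."""
    signal = [0.0] * length
    for pos, amp in zip(positions, amplitudes):
        signal[pos] = amp
    return signal


def position_bits(length, num_pulses):
    """Naive side-information cost (assumption): one fixed-length index per pulse."""
    if length <= 1:
        return 0
    return num_pulses * math.ceil(math.log2(length))


# Hypothetical single-channel pulse train with two pulses in eight samples.
channel = [0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.3, 0.0]
length, positions, amplitudes = encode_pulses(channel)

# The representation is lossless: decoding restores the original samples.
assert decode_pulses(length, positions, amplitudes) == channel

print(positions, amplitudes, position_bits(length, len(positions)))
```

Under this naive scheme, every additional pulse costs another position index, so reducing the pulse count directly reduces the side information.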
The focus of this work is to 'sparsify' the pulsed signal representation, i.e., to remove its perceptual irrelevance and its redundancy, to obtain a compact signal representation, which facilitates efficient encoding and from which the signal can nevertheless be reconstructed with perceptually transparent quality. For this purpose, the 'transmultiplexer view' of perceptual-domain coding is proposed, which leads to a new masking model. This masking model is successfully applied to obtain a sparse pulsed signal representation. Experiments show that the proposed sparse signal representation is able to hide a remarkable amount of reconstruction errors. We discuss approaches to efficiently encode sparse pulsed signal representations. We also deal with computationally efficient implementation methods for auditory filterbanks, which are key components of virtually all auditory models.