Signal Processing and Speech Communication Laboratory

Acoustic Event Detection of General Sounds

Master Thesis
Announcement date
01 Oct 2016
Michael Peitler

While CCTV systems are widely used for monitoring public spaces, the information provided by the acoustic domain is often neglected. An acoustic event detection system for general sounds can detect security-relevant occurrences such as people screaming for help, a gunshot or an explosion by processing the audio signal captured by microphones. In this thesis, acoustic event detection systems are introduced based on the theory of human auditory perception. The techniques for audio signal pre-processing, feature extraction, machine learning and evaluation of the detection systems are explained. Their combination and parametrization are analyzed with the goal of high event detection rates and a low number of false alarms. Event detection systems found in the literature are reviewed, and a sound library collection providing training and test data is introduced. In the last part, a system is proposed to detect security-relevant events in public spaces. Its objective is to distinguish between a human scream, normal human voice, gunshot, explosion, breaking glass and background sound. The feature set can be assembled from MFCCs, MP7 features and Teager Energy Operator (TEO) based features. Evaluation is done frame-wise with suitable classification measures: precision, recall, accuracy and the F1-score are computed for each class. These measures are averaged over the whole system, and the Acoustic Event Error Rate is computed. In the course of the experiments, the best system configuration in terms of classification performance was found with the combined MP7+TEO feature set, maximum-margin GMMs with 128 components and a frame length of 200 ms. This resulted in an F1-score of 0.74 and an Acoustic Event Error Rate of 0.24.
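The frame-wise evaluation described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `framewise_scores`, the short class names and the use of an unweighted (macro) average over classes are assumptions made for illustration, since the abstract only states that the measures are averaged for the whole system. The Acoustic Event Error Rate is left out because its definition is not given here.

```python
from collections import Counter

# Short class names are hypothetical; the six classes follow the abstract.
CLASSES = ["scream", "voice", "gunshot", "explosion", "glass", "background"]


def framewise_scores(y_true, y_pred, classes=CLASSES):
    """Per-class precision, recall, accuracy and F1 from frame labels.

    y_true and y_pred hold one class label per audio frame.
    Returns (per_class, macro), where macro is the unweighted mean
    of each measure over the classes (an assumed averaging scheme).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one frame is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative

    n_frames = len(y_true)
    per_class = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        # One-vs-rest accuracy: every frame that is neither FP nor FN
        # for class c counts as correct (TP or TN).
        accuracy = (n_frames - fp[c] - fn[c]) / n_frames
        per_class[c] = {"precision": precision, "recall": recall,
                        "accuracy": accuracy, "f1": f1}

    macro = {key: sum(per_class[c][key] for c in classes) / len(classes)
             for key in ("precision", "recall", "accuracy", "f1")}
    return per_class, macro


if __name__ == "__main__":
    truth = ["scream", "scream", "voice", "background"]
    preds = ["scream", "voice", "voice", "background"]
    per_class, macro = framewise_scores(
        truth, preds, classes=["scream", "voice", "background"])
    print(per_class["scream"], macro["f1"])
```

Classes that never occur in either label sequence receive a precision, recall and F1 of zero in this sketch, which pulls the macro average down; a real evaluation would need to decide how to treat such classes.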