Signal Processing and Speech Communication Laboratory

Sound classification of YouTube videos with Recurrent Neural Networks

Bachelor Project
Announcement date
01 Oct 2018
Christian Walter

The availability of large datasets from companies such as Google and YouTube makes it easy to train neural networks (NNs) for classifying images, sounds, and videos. Videos and sounds, however, are sequence data with an additional time dimension. Ordinary feed-forward NNs cannot model the time dimension directly, so their features or outputs must be averaged or pooled over time. Recurrent neural networks (RNNs) handle this sequential nature within their network structure and are able to remember past content, which makes them increasingly important for sequence classification tasks. In this work, experiments are performed with different NN architectures from the YouTube-8M baseline, a starter code for classifying millions of YouTube videos by their sounds and images. Three models are described and evaluated: a frame-level logistic model based on a feed-forward NN, and two RNNs, an LSTM model and a GRU model. In addition, an overview of sound classification approaches is given, and RNNs are described in detail, including their mathematical structure and the training methods used. As a restriction, the classification experiments use only the sounds of the videos, not the images.
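The contrast between pooling over time and a recurrent state can be illustrated with a minimal sketch. It is not taken from the YouTube-8M starter code; the function names, the toy two-dimensional "frames", and the random weights are all assumptions made for illustration, and bias terms are omitted for brevity. The GRU update follows one common convention, h_t = (1 - z) * h_{t-1} + z * h~; some formulations swap the roles of z and 1 - z.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def mean_pool(frames):
    """Average per-frame feature vectors over time; frame order is lost."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def gru_step(h, x, p):
    """One GRU update without biases; p holds hypothetical weight matrices."""
    hx = h + x  # concatenation [h, x]
    z = [sigmoid(a) for a in matvec(p["Wz"], hx)]  # update gate
    r = [sigmoid(a) for a in matvec(p["Wr"], hx)]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h)] + x  # [r * h, x]
    cand = [math.tanh(a) for a in matvec(p["Wh"], gated)]  # candidate state
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def gru_encode(frames, p, hidden_size):
    """Run the GRU over all frames and return the final hidden state."""
    h = [0.0] * hidden_size
    for x in frames:
        h = gru_step(h, x, p)
    return h


def random_params(input_size, hidden_size, seed=0):
    """Untrained random weights, used only to make the sketch runnable."""
    rng = random.Random(seed)

    def mat():
        return [[rng.uniform(-1.0, 1.0) for _ in range(hidden_size + input_size)]
                for _ in range(hidden_size)]

    return {"Wz": mat(), "Wr": mat(), "Wh": mat()}


# Three toy audio feature frames and the same frames in reverse order.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
reversed_frames = frames[::-1]
params = random_params(input_size=2, hidden_size=3)

pooled_fwd = mean_pool(frames)
pooled_bwd = mean_pool(reversed_frames)
h_fwd = gru_encode(frames, params, hidden_size=3)
h_bwd = gru_encode(reversed_frames, params, hidden_size=3)

print("pooled:", pooled_fwd, pooled_bwd)
print("GRU state:", h_fwd, h_bwd)
```

Mean pooling yields the same vector for both orderings, whereas the final GRU state generally depends on the order in which the frames arrive. A frame-level logistic model would apply a logistic classifier to pooled features or average per-frame predictions, while the recurrent models would feed the final hidden state to the classifier.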