Data Augmentation for Time-Domain Music Separation



Status

Finished

Type

Master Thesis

Announcement date

11 Dec 2023

Student

Nico Mittendrein

Mentors

Franz Pernkopf

Research Areas

Intelligent Systems

Source separation has long been a challenging task for classical signal processing, and the introduction of deep learning has yielded promising results. This work focuses on music source separation, which splits an existing piece of music into its underlying instruments. Modern architectures that use multi-domain input, as well as large-scale neural networks, have shown performance matching that of the human ear. One of the most widely used datasets for this task is MUSDB18. It consists of recordings of human musicians, which makes it well suited to real-world use cases, but the amount of data it contains is limited. Many works have shown that supplying additional data improves final scores. The goal of this work is to find data augmentations that obtain maximum performance from this limited dataset, since it is not only of good quality but also a common benchmark. DemucsV2, a well-performing model that operates in the time domain, is chosen as the baseline. To improve its performance, different data augmentations are applied to the dataset. Three candidate data augmentation methods, AugMix, DeepAugment and RandomForestAugment, were chosen, adapted to the task, and evaluated. To determine the performance of the data augmentations, the baseline model is fine-tuned with the additional input produced by each candidate method. Afterwards, the model is applied to the MUSDB18 test set, and the metrics already used to compare methods in the Signal Separation Evaluation Campaign 2018 (SiSEC18) are calculated to identify the best of the three approaches. In addition, the impact of different context lengths on the model is analyzed as a further way to improve separation quality. Varying the input context did not substantially increase the performance of the overall method. In the comparison of the data augmentation approaches, AugMix was found unsuitable for this task. According to the results of this work, the best data augmentation strategy among the candidates is RandomForestAugment. The resulting model outperformed the currently best time-domain method that uses the same dataset. However, the same architecture has been shown to reach even better scores when supplied with additional data.
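
To make the idea of adapting an image augmentation to the time domain concrete, here is a minimal sketch of AugMix-style mixing applied to a mono waveform. It is an illustration under stated assumptions, not the adaptation used in the thesis: the three operations (random gain, polarity flip, circular shift), the function name `augmix_waveform`, and the default parameters are all hypothetical choices, and the consistency loss of the original AugMix method is left out.

```python
import random
from typing import Callable, List, Sequence

Waveform = List[float]
Op = Callable[[Sequence[float], random.Random], Waveform]


# Hypothetical waveform operations, chosen only for illustration.
def random_gain(x: Sequence[float], rng: random.Random) -> Waveform:
    gain = rng.uniform(0.5, 1.5)
    return [gain * v for v in x]


def polarity_flip(x: Sequence[float], rng: random.Random) -> Waveform:
    return [-v for v in x]


def circular_shift(x: Sequence[float], rng: random.Random) -> Waveform:
    if not x:
        return list(x)
    k = rng.randrange(len(x))
    return list(x[k:]) + list(x[:k])


OPS: Sequence[Op] = (random_gain, polarity_flip, circular_shift)


def augmix_waveform(
    x: Sequence[float],
    rng: random.Random,
    width: int = 3,
    max_depth: int = 3,
    alpha: float = 1.0,
) -> Waveform:
    """AugMix-style mixing: blend several random augmentation chains,
    then interpolate the blend with the clean input."""
    # Dirichlet(alpha) weights, drawn as normalized Gamma samples.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(width)]
    total = sum(gammas)
    weights = [g / total for g in gammas]

    # Interpolation weight between the clean and the augmented signal.
    m = rng.betavariate(alpha, alpha)

    mixed = [0.0] * len(x)
    for w in weights:
        chain = list(x)
        # Each chain applies between 1 and max_depth random operations.
        for _ in range(rng.randint(1, max_depth)):
            chain = rng.choice(OPS)(chain, rng)
        mixed = [acc + w * v for acc, v in zip(mixed, chain)]

    return [m * clean + (1.0 - m) * aug for clean, aug in zip(x, mixed)]


if __name__ == "__main__":
    rng = random.Random(0)
    clip = [0.0, 0.5, -0.5, 1.0]
    print(augmix_waveform(clip, rng))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible. In a separation setting, one would presumably need to draw the same random operations for the mixture and for each target stem so that the stems still sum to the mixture; that bookkeeping is omitted here.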