Detection Results of Spectro-temporal Fragment-based Multiband Position-Pitch (MPoPi) Algorithm
With increasingly powerful and affordable computational resources for digital signal processing and the growing use of sensor arrays, acoustic source localization has become an active area of research. Compared with traditional localization applications such as radar and sonar, speech source localization poses additional challenges: speech signals are wideband and non-stationary, the speakers' trajectories are unknown, and enclosed rooms introduce multipath propagation.
In our work, we use the fundamental frequency (pitch) of speech signals in addition to the source location. Our “position-pitch”-based algorithm pre-processes the speech signals with a multiband gammatone filterbank inspired by auditory models of the human inner ear. Moreover, our method draws on studies of how the human auditory system exploits correlations between adjacent frequency sub-bands and groups spectro-temporal regions by their fundamental frequency cues. As shown in the figure, the algorithm can localize multiple concurrent moving speakers. The speakers start at well-separated positions and move towards each other in a cross-over scenario. In these plots, the markers represent direction-of-arrival (DoA) estimates only; they are not assigned to any particular speaker. The scenario was recorded with a 24-channel uniform circular microphone array in our lab’s meeting room.
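To illustrate the multiband pre-processing stage, here is a minimal sketch of a gammatone filterbank in Python with NumPy. It is not the MPoPi implementation, and the later position-pitch steps are not shown. The function names `erb_space`, `gammatone_ir` and `gammatone_filterbank` are hypothetical. The parameter values (32 bands between 80 and 5000 Hz, filter order 4, bandwidth factor 1.019, ERB formulas after Glasberg and Moore) are common textbook choices and are assumed here rather than taken from the project.

```python
import numpy as np

def erb_space(f_low, f_high, num_bands):
    """Center frequencies spaced uniformly on the ERB-rate scale (assumed Glasberg & Moore form)."""
    erb_rate = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    inv_erb_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return inv_erb_rate(np.linspace(erb_rate(f_low), erb_rate(f_high), num_bands))

def gammatone_ir(fc, fs, duration=0.1, order=4, b=1.019):
    """Truncated FIR approximation of a gammatone impulse response centered at fc (Hz)."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth in Hz
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb * t) * np.cos(2.0 * np.pi * fc * t)
    # Normalize so the filter has unit magnitude response at its center frequency.
    gain = np.abs(np.sum(g * np.exp(-2j * np.pi * fc * t)))
    return g / gain

def gammatone_filterbank(x, fs, num_bands=32, f_low=80.0, f_high=5000.0):
    """Split a single-channel signal x into num_bands gammatone sub-band signals."""
    fcs = erb_space(f_low, f_high, num_bands)
    # Causal filtering: keep the first len(x) samples of each full convolution.
    bands = np.stack([np.convolve(x, gammatone_ir(fc, fs))[: len(x)] for fc in fcs])
    return fcs, bands  # bands has shape (num_bands, len(x))
```

For a 1 kHz test tone sampled at 16 kHz, the band whose center frequency lies closest to 1 kHz should carry the most energy. In a microphone-array setting, the same filterbank would be applied to each channel so that cross-channel correlations can be evaluated per sub-band.

```python
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
fcs, bands = gammatone_filterbank(np.sin(2 * np.pi * 1000.0 * t), fs)
energy = np.sum(bands ** 2, axis=1)
print(fcs[np.argmax(energy)])  # a center frequency near 1000 Hz
```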