Dysphonic - Objective differentiation of dysphonic voice quality types (FWF-KLI722-B30)
- Period
- 2018 — 2024
- Funding
- Austrian Science Fund (Österreichischer Wissenschaftsfonds, FWF), Austria
- Partners
- Philipp Aichinger (Department of Otorhinolaryngology, Medical University of Vienna)
Verbal communication is one of the most significant human achievements. It relies on the functioning of the voice box, and particularly on the vibration of the vocal folds (commonly called vocal cords). This vibration gives the voice its tone, much as a vibrating string gives a guitar its sound. Voice disorders can disrupt this normal vibration and make speaking difficult. Clinicians examine patients' vocal folds with cameras and listen to the nuances of the voice. However, the vocal folds vibrate too rapidly to be seen clearly, and both visual and auditory assessments can be subjective. This project aimed to address these challenges with new technology and methods.
The main techniques and findings were as follows. First, researchers used high-speed cameras to record the vocal folds and play their vibration back slowed down by a factor of 160. This revealed minute details that would otherwise be missed, especially irregularities in the vibration. Second, microphone recordings were analyzed to understand how the sound of the voice relates to the vibrations, which improved the understanding of how different vocal fold conditions affect the sound of the voice. Third, the project used advanced simulations of vocal fold vibration and of the auditory process to further pinpoint critical features of phonatory dysfunction. Finally, the project leveraged artificial intelligence and machine learning (AIML). In particular, recent advances in speech technology (cf. Siri, Alexa, etc.) were adapted to create more realistic simulations of pathological voices, and AIML methods that mimic human vision were used to automate video analysis, reducing the need for manual review and making slow-motion video analysis easier to implement in clinical settings.
Specific voice types were investigated. First, diplophonia is a condition in which different regions of the vocal folds vibrate at distinct rates, producing a doubled voice. Software was developed to measure objectively how frequently it occurs in speech. Second, vocal fry and creaky voice are characterized by separated sound pulses, similar to the sounds of a frying pan, a creaky door, or popping popcorn. The research clarified that such voices either have a low vibration rate or have other disturbances that create only the illusion of pulse separation. Third, researchers investigated timing differences between vocal fold regions, as well as extra pulses similar to extrasystoles in heartbeats.
In summary, this research has enhanced the understanding of vocal fold mechanics and voice perception. By combining high-speed video, computer simulations, and AI, the project tackled key challenges in diagnosing and treating voice disorders. The findings have the potential to transform clinical practice by providing more accurate and reliable diagnostics via digital twinning and decision support, ultimately leading to better treatment outcomes and an improved quality of life for people with voice problems.