Development of a Fast and High-Precision Audio Noise Reduction System to Enhance the Accuracy of Emotion Estimation in Practical Applications
Authors: Kanji Okazaki, Keiichi Watanuki, Yusuke Osawa
Abstract: Speech-based emotion estimation has diverse applications, including mental health monitoring, human–computer interaction, and communication enhancement. Accurate estimation of emotion from speech is crucial for detecting psychological stress, a growing concern in today’s high-stress societies. However, environmental noise significantly degrades estimation accuracy, and studies on noise reduction specifically optimized for emotion estimation remain scarce. This study evaluated the impact of noise reduction on emotion estimation by comparing traditional signal-processing methods (spectral subtraction, Wiener filtering) with deep learning-based autoencoder (AE) methods (U-Net autoencoder, convolutional autoencoder). The effectiveness of each method was examined under continuous vehicle-driving noise and transient clapping noise. The results indicate that the traditional techniques effectively suppress continuous noise but struggle with transient noise, whereas the AE-based methods, particularly the U-Net autoencoder, significantly enhance estimation accuracy in complex noise environments. This study underscores the importance of emotion-aware noise reduction and suggests that deep learning-based denoising can substantially improve emotion estimation in real-world applications. Future research will focus on further optimizing the AE architectures and integrating them into real-time systems.
Keywords: Speech Emotion Estimation, Noise Reduction, U-Net, Spectral Subtraction
DOI: 10.54941/ahfe1006058
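To illustrate the kind of traditional signal-processing baseline the abstract refers to, the sketch below shows a minimal spectral-subtraction denoiser. It is not the authors' implementation; the function name, the 512-sample frame length, the assumption that the leading 0.5 s of the recording is noise-only, and the spectral floor are all illustrative choices.

```python
# Minimal spectral-subtraction sketch (illustrative; assumptions noted above).
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, sr, noise_dur=0.5, floor=0.02, nperseg=512):
    """Subtract an estimated noise magnitude spectrum from each STFT frame."""
    # Complex spectrogram of the noisy signal (hop = nperseg // 2 by default).
    _, _, Z = stft(noisy, fs=sr, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)

    # Estimate the noise magnitude profile from the assumed noise-only leading frames.
    n_noise_frames = max(1, int(noise_dur * sr / (nperseg // 2)))
    noise_mag = mag[:, :n_noise_frames].mean(axis=1, keepdims=True)

    # Subtract the noise estimate and clamp to a spectral floor
    # to limit "musical noise" artifacts.
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)

    # Reconstruct the waveform using the original noisy phase.
    _, clean = istft(clean_mag * np.exp(1j * phase), fs=sr, nperseg=nperseg)
    return clean
```

Because the subtraction uses a single time-averaged noise profile, this style of method handles stationary noise (e.g., continuous vehicle driving) far better than transient events such as clapping, which is consistent with the comparison described in the abstract.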