Neural Network Model for Visualization of Conversational Mood with Four Adjective Pairs
Abstract
In recent years, the accuracy of speech recognition has improved remarkably, and speech recognition software can now be used to obtain text from conversational speech data. Although such text is surface-level information, several studies have indicated that speech recognition can also be used to estimate emotions, which represent higher-level information in a conversation, and several recently proposed models use LSTM or GRU networks to estimate emotion in conversations. In ordinary conversation, however, emotions such as anger and sadness are rarely expressed explicitly, for example to avoid getting into an unexpected argument or offending others. Thus, when attempting to monitor, influence, or control the state of a conversation during a meeting or a casual chat, it is often more important to estimate the mood of the conversation than the emotion. Some researchers have examined the role of mood as distinct from emotion; one called diffuse emotional states that persist over a long period of time "mood", and the two are usually distinguished by the duration and intensity of their expression. However, these differences are rarely quantified, and no specific durations have been fixed. Accurately identifying the mood of a conversation is especially important for Japanese people engaged in collaborative and democratic decision making. To construct the teacher data for a model that estimates conversational mood, we first selected representative adjective pairs that could describe the mood of a conversation. We used a system developed by Iiba et al. that estimates, from input text, 21 affective scales, each defined by an adjective pair. The 21 adjective pairs were clustered into 4 groups based on the output scales, and the 4 adjective pairs chosen for annotation were representative of the 4 clusters.
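The abstract does not state which clustering algorithm was applied to the 21 affective scales or how each cluster's representative was picked. As a minimal sketch under those unknowns, the scales can be grouped by average-linkage agglomerative clustering with distance 1 - r, where r is the Pearson correlation between two scales' outputs over the same set of texts, and the medoid of each cluster taken as its representative pair. The functions `cluster_scales` and `representative`, the distance, and the linkage are hypothetical choices, not the authors' method.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_scales(scores, k):
    """Hypothetical grouping of affective scales into k clusters.

    scores maps an adjective-pair name to that scale's outputs over the
    same texts. Uses average-linkage agglomerative clustering with
    distance 1 - r (an assumed choice, not taken from the paper).
    """
    names = list(scores)
    dist = {}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = 1 - pearson(scores[a], scores[b])

    clusters = [[n] for n in names]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            avg = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
            avg /= len(clusters[i]) * len(clusters[j])
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


def representative(cluster, scores):
    """Medoid: the member with the smallest mean distance to the others."""
    if len(cluster) == 1:
        return cluster[0]

    def mean_dist(a):
        others = [b for b in cluster if b != a]
        return sum(1 - pearson(scores[a], scores[b]) for b in others) / len(others)

    return min(cluster, key=mean_dist)
```

With the outputs of the 21 scales computed over a shared text corpus, `cluster_scales(scores, 4)` followed by `representative` on each group would yield four candidate pairs under these assumptions; other linkages or k-means could give different groupings.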
We expected these 4 adjective pairs (gloomy-happy, easy-serious, calm-aggressive, tidy-messy) to capture the mood of a conversation. Based on the four adjective pairs, we constructed a new training data set containing 60 hours of conversations in Japanese, annotated on the four adjective scales so that the mood of the conversations could be learned. In this study, only data obtained by microphones are used to estimate conversational mood. We developed an LSTM deep neural network model that can read the "conversational mood" in real time. Furthermore, our model estimates the amount of laughter, which is generally measured by capturing facial expressions with a camera, together with the conversational mood. Because laughter is considered to play an important role in creating a cheerful atmosphere, it can be used to evaluate the conversational mood. Evaluation results are presented to demonstrate the validity of our model. The model is expected to be applied to a system that influences or controls the mood of conversations, for example by presenting ambient music or aromas, depending on the purpose of the discussion, such as a conference, a chat, or a business meeting.
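The abstract specifies an LSTM that reads conversational mood in real time from microphone data and estimates laughter together with the four adjective scales, but not the input features, layer sizes, or output activations. The following is a minimal, untrained, pure-Python sketch assuming one LSTM layer over per-frame acoustic feature vectors, with a shared hidden state feeding five linear outputs (the four adjective scales plus the amount of laughter). The class `StreamingMoodLSTM`, its parameters, and the head layout are hypothetical; training is omitted.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class StreamingMoodLSTM:
    """Hypothetical single-layer LSTM with five regression heads.

    Weights are random (untrained); the class only shows how a recurrent
    state can yield a mood and laughter estimate after every audio frame.
    """

    HEADS = ("gloomy-happy", "easy-serious", "calm-aggressive", "tidy-messy", "laughter")

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_in, self.n_hidden = n_in, n_hidden
        # One weight matrix per gate (input, forget, cell candidate, output),
        # each acting on the concatenation [x, h].
        self.W = {gate: mat(n_hidden, n_in + n_hidden) for gate in "ifgo"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifgo"}
        self.b["f"] = [1.0] * n_hidden  # common forget-gate bias heuristic
        self.W_out = mat(len(self.HEADS), n_hidden)  # shared state -> 5 outputs
        self.reset()

    def reset(self):
        """Clear the recurrent state, e.g. at the start of a new conversation."""
        self.h = [0.0] * self.n_hidden
        self.c = [0.0] * self.n_hidden

    def step(self, x):
        """Consume one feature frame and return the current estimates."""
        assert len(x) == self.n_in, "feature frame has the wrong dimension"
        z = list(x) + self.h
        pre = {
            gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
            for gate in "ifgo"
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        self.c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, self.c, i, g)]
        self.h = [oi * math.tanh(ci) for oi, ci in zip(o, self.c)]
        return dict(zip(self.HEADS, matvec(self.W_out, self.h)))

    def run(self, frames):
        """Feed frames one at a time, yielding an estimate after each."""
        for x in frames:
            yield self.step(x)
```

Carrying `h` and `c` across calls to `step` is what allows an estimate after every frame rather than after a whole recording; a practical system would use a deep-learning framework, real acoustic features, and weights trained on the annotated data set.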
Keywords: Mood, conversation, deep learning, affective ambient intelligence
DOI: 10.54941/ahfe1004396
AHFE Open Access