Semantic Structure and Importance Extraction from Sequential Conversational Data via Dimensional Reduction

Takeshi Matsuda; Michio Sonoda

doi:10.54941/ahfe1008009

AHFE International

Open Access

Article

Conference Proceedings

Abstract

The analysis of spoken data from panel discussions, policy dialogues, and educational meetings has gained increasing importance in both academic research and professional practice. However, conventional approaches to Japanese conversation analysis have relied heavily on keyword matching or surface‑level text similarity, making it difficult to capture deeper semantic relationships, topic transitions, and latent discourse structures. In addition, Japanese natural language processing pipelines often rely on environment-sensitive morphological analyzers, which hinder reproducibility and large-scale processing. To address these limitations, this study proposes a robust and semantically enriched framework for conversation understanding based on a composite distributed representation. The proposed method integrates three layers of linguistic information: (1) contextual sentence embeddings generated by a multilingual transformer model, (2) word embeddings obtained from fastText, and (3) co‑occurrence vectors that capture lexical association patterns within the conversation. Sudachi is employed for Japanese text preprocessing to ensure stable and reproducible morphological analysis. By combining these components into a unified composite vector, the framework simultaneously represents global sentence‑level meaning and local lexical relationships. Using this representation, a directed graph is constructed that incorporates both temporal adjacency and semantic proximity between utterances, enabling the visualization of key conversational connections. To evaluate the effectiveness of the composite representation, dimensionality‑reduction algorithms are applied to examine whether semantically similar utterances naturally form coherent clusters in low‑dimensional space. The resulting clusters are assessed for consistency and interpretability, demonstrating that the proposed representation successfully captures meaningful conversational structure.
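The composite representation and utterance graph described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three components are combined or how semantic proximity is thresholded, so everything here is an assumption made for illustration: the components are L2-normalized and concatenated, proximity is cosine similarity, and the names `composite_vector`, `build_utterance_graph`, and `sim_threshold` are hypothetical. The toy vectors stand in for the transformer, fastText, and co-occurrence embeddings, which are not computed here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def composite_vector(sentence_emb, word_emb, cooc_vec):
    """Assumed combination rule: normalize each component, then concatenate."""
    parts = (sentence_emb, word_emb, cooc_vec)
    return [x for part in parts for x in l2_normalize(part)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def build_utterance_graph(vectors, sim_threshold=0.8):
    """Directed edges from earlier to later utterances.

    Consecutive utterances always get a "temporal" edge; non-adjacent pairs
    get a "semantic" edge when cosine similarity reaches sim_threshold
    (a hypothetical cutoff, not taken from the paper).
    """
    edges = {}
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vectors[i], vectors[j])
            if j == i + 1:
                edges[(i, j)] = ("temporal", sim)
            elif sim >= sim_threshold:
                edges[(i, j)] = ("semantic", sim)
    return edges


# Toy stand-ins for (sentence embedding, word embedding, co-occurrence vector).
utterances = [
    ([1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0], [0.0, 1.0]),
    ([1.0, 0.1], [0.9, 0.0], [1.0, 0.0]),
]
vectors = [composite_vector(*u) for u in utterances]
edges = build_utterance_graph(vectors)
```

In this toy example, utterances 0 and 2 are not adjacent but point in nearly the same direction, so they receive a semantic edge in addition to the two temporal edges. The same composite vectors could then be passed to a dimensionality-reduction routine to inspect clustering, which this sketch leaves out.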

Keywords: Conversational Semantics, Distributed Representations, Dimensionality Reduction (MAPE), Nonlinear Embedding, Semantic Clustering

Volume: Training, Education, and Learning Sciences