A Method of Structured Standard Terminology Based on Decoupling Approach

Open Access
Article
Conference Proceedings
Authors: Xinyu Cao, Zhengyuan Han, Yi Yang, Liangliang Liu, Pai Peng, Haitao Wang

Abstract: As interdisciplinary collaboration and global technological exchange become more frequent, a terminology database is crucial for keeping terminology consistent and supporting effective communication. However, many existing standard terminologies are stored in unstructured text files without systematic organization, which hinders the efficient construction and maintenance of terminology databases. Tools that can accurately parse and structure standard terminology files are therefore urgently needed. Current research mainly processes these files with rule-based matching and machine learning methods, but these approaches suffer from format sensitivity and high coupling. File formats are inconsistent, and manually written style rules can hardly cover every scenario, so parsing tools have poor robustness. Moreover, rule-based tools rely heavily on if-else logic, which increases the coupling between rules, makes it hard to add new rules without causing conflicts, and complicates maintenance and scalability. To address these issues, we propose a parsing tool tailored to standard terminology files that structures the "terms and definitions" sections of files in multiple formats. The contributions of this paper are: 1) a decoupled file parsing workflow; 2) a set of rule matching and rule processing methods specific to the domain of standard terminology parsing; 3) an online system that has been developed and deployed. In summary, the proposed parsing tool resolves the existing problems of format sensitivity and high coupling and improves the efficiency and accuracy of terminology file parsing through its decoupled design and domain-specific rule sets, providing strong support for the construction of terminology databases.
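
The abstract does not spell out the rule set, so the following is only a minimal sketch of what decoupling rule matching from rule processing could look like, not the authors' implementation. The input layout (an entry number such as 3.1, then a term line, then definition lines) and all names (`Rule`, `ParseState`, `RULES`, `parse`) are assumptions made for illustration. Each rule pairs an independent matcher with an independent processor, and the driver loop contains no format-specific if-else chain.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ParseState:
    """Entries collected so far (hypothetical structure)."""
    entries: list = field(default_factory=list)

    @property
    def current(self):
        return self.entries[-1] if self.entries else None


@dataclass
class Rule:
    """One rule: a matcher decides whether it applies, a processor acts."""
    name: str
    matches: Callable[[str, ParseState], bool]
    process: Callable[[str, ParseState], None]


# Assumed entry-number format, e.g. "3.1" or "3.2.1".
NUMBER = re.compile(r"^\d+(?:\.\d+)+$")


def start_entry(line, state):
    state.entries.append({"number": line, "term": None, "definition": ""})


def set_term(line, state):
    state.current["term"] = line


def add_definition(line, state):
    cur = state.current
    cur["definition"] = (cur["definition"] + " " + line).strip()


# Rules are tried in order; the first one whose matcher accepts the line wins.
# Supporting a new layout means adding a Rule here, not editing the others.
RULES = [
    Rule("entry_number", lambda l, s: bool(NUMBER.match(l)), start_entry),
    Rule("term",
         lambda l, s: s.current is not None and s.current["term"] is None,
         set_term),
    Rule("definition", lambda l, s: s.current is not None, add_definition),
]


def parse(text, rules=RULES):
    """Generic driver: knows nothing about any specific rule."""
    state = ParseState()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        for rule in rules:
            if rule.matches(line, state):
                rule.process(line, state)
                break
    return state.entries


SAMPLE = """3.1
terminology database
collection of terms and their definitions
stored in a structured form
3.2
term
designation of a concept
"""

entries = parse(SAMPLE)
```

In this sketch, lines that appear before the first entry number match no rule and are skipped, and a multi-line definition is joined with single spaces.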

Keywords: Terminology, Text structuring, Decoupling

DOI: 10.54941/ahfe1006032
