OCR-based Quality Assessment and Auxiliary Review System for Semantic Information Extraction from Engineering Drawings

Open Access
Article
Conference Proceedings
Authors: Bo Pang, Jiansong Zhang

Abstract: Optical Character Recognition (OCR) has been widely adopted to extract textual information from legacy engineering drawings, with the aim of transforming image-based PDF documents into semantically enriched digital models. However, drawing quality varies with source and format, which degrades OCR performance and lowers the accuracy of the extraction results. Manual review is therefore needed to correct OCR outputs, requiring additional time and labor. To address this issue, the authors proposed an OCR-based quality assessment method combined with an auxiliary review system to improve both the accuracy and the efficiency of textual information extraction. A set of semantic- and task-driven criteria was designed to evaluate drawing quality. A dataset of 50 bridge plans in PDF format was annotated with “high” or “low” quality labels, and the textual content was manually transcribed for OCR performance evaluation. The proposed method applied Tesseract OCR to extract textual information and automate the quality assessment process. Token-level confidence scores were computed, and drawings with an average score below 80 were classified as low-quality. In the auxiliary review system, detected tables were reconstructed, and cells containing text below this confidence threshold were highlighted so that reviewers could focus on potentially error-prone regions. Experiments on the annotated dataset showed that the proposed method achieved a precision of 97.14% and a recall of 87.18% in quality classification. When low-quality drawings were excluded, the precision of information extraction increased by 17.84% and the recall by 18.96%. Additionally, the auxiliary review system highlighted 36.81% of the cells, indicating a potential reduction of over 60% in manual review time. Overall, the proposed method provides a lightweight approach to improving the accuracy and review efficiency of OCR-based semantic information extraction from engineering drawings.
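
The thresholding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the (text, confidence) token format, the treatment of a page with no recognized tokens as low-quality, and the rule that a cell is highlighted when any of its tokens falls below the threshold are all assumptions made for illustration. Only the threshold of 80 and the use of token-level Tesseract confidence scores come from the abstract.

```python
from statistics import mean

# Threshold stated in the abstract; Tesseract confidences range from 0 to 100.
CONFIDENCE_THRESHOLD = 80.0


def valid_scores(tokens):
    """Keep confidences of real tokens only.

    `tokens` is a list of (text, confidence) pairs (a hypothetical format).
    Tesseract reports -1 for non-text layout elements, so those entries
    and blank strings are skipped.
    """
    return [conf for text, conf in tokens if text.strip() and conf >= 0]


def classify_drawing(tokens, threshold=CONFIDENCE_THRESHOLD):
    """Label a drawing "low" when its average token confidence is below the threshold."""
    scores = valid_scores(tokens)
    # Assumption: a page with no recognized tokens is treated as low-quality.
    average = mean(scores) if scores else 0.0
    return "low" if average < threshold else "high"


def cells_to_review(cells, threshold=CONFIDENCE_THRESHOLD):
    """Return the (row, col) keys of table cells a reviewer should check.

    `cells` maps (row, col) to a list of (text, confidence) pairs.
    Assumption: a cell is highlighted if any of its tokens is below the threshold.
    """
    return sorted(
        key
        for key, tokens in cells.items()
        if any(conf < threshold for conf in valid_scores(tokens))
    )
```

In practice the token pairs could be built from the `text` and `conf` fields of Tesseract's word-level output; the sketch leaves that step out so that it runs without an OCR installation.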

Keywords: Optical Character Recognition, engineering drawings, semantic information extraction, quality filtering, auxiliary review system

DOI: 10.54941/ahfe1007033
