Knowledge Graph-Enhanced Large Language Model Framework for Privacy-Preserving Document Processing in the AEC Domain

Open Access
Article
Conference Proceedings
Authors: Fan Yang, Hazar Nicholas Dib, Jiansong Zhang

Abstract: Data privacy and safety are critical concerns for companies in the Architecture, Engineering, and Construction (AEC) domain, which routinely handle sensitive textual data such as design criteria, project specifications, and compliance records. Protecting this information is vital for maintaining competitive advantage, meeting legal requirements, and ensuring safety and accountability. However, processing such domain-specific data is challenging. Rule-based systems require extensive manually crafted rule sets, while supervised machine learning models need large annotated datasets; both requirements limit scalability and applicability in AEC contexts. Recent advances in large language models (LLMs) offer a promising alternative because they can perform natural language tasks with minimal supervision. Yet general-purpose LLMs raise two major concerns: they may generate inaccurate or irrelevant outputs on technical content, and their reliance on online services introduces significant privacy risks. To address these issues, this paper proposes a knowledge graph-enhanced LLM framework for local, privacy-preserving processing of sensitive AEC documents. Using the 2015 International Building Code (IBC) as an example, the framework operates in two stages. First, an LLM converts selected IBC chapters into a structured knowledge graph with 234 entities, 131 relationships, and 8 communities. Second, another LLM retrieves relevant context from the graph to generate accurate query responses. The system employs open-source models: nomic-embed-text for text embeddings and deepseek-r1 for context retrieval and generation. An evaluation on 661 query-answer-context records yielded an average semantic similarity score of 0.83 and an average answer relevancy score of 0.71, indicating high accuracy and contextual alignment. The system runs entirely on a standalone machine, preserving full data privacy and incurring no usage fees. This work demonstrates a secure and effective approach for using LLMs in privacy-sensitive, domain-specific applications and lays the foundation for broader adoption in similar fields.
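
The two-stage idea in the abstract (a knowledge graph of entities and relationships, graph-grounded retrieval for a query, and evaluation by semantic similarity between generated and reference answers) can be sketched roughly as below. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. The bag-of-words `embed` function stands in for nomic-embed-text. The three entities and two triples are invented toy examples, not content extracted from the IBC. The LLM extraction step, community detection, and deepseek-r1 answer generation are omitted. The helpers `retrieve_context` and `mean_semantic_similarity` and the one-hop graph expansion are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# ASSUMPTION: hypothetical stand-in for an embedding model such as nomic-embed-text.
# A bag-of-words term-count vector keeps the sketch runnable without a model server.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# ASSUMPTION: hypothetical toy graph (entity -> description, plus triples).
# In the framework an LLM extracts these from IBC chapters; here they are hand-written
# and illustrative only, not quoted from the code.
ENTITIES = {
    "Occupancy Group A": "assembly occupancy for gathering of persons",
    "Fire Area": "area bounded by fire walls or fire barriers",
    "Automatic Sprinkler System": "sprinkler system required for fire protection",
}
TRIPLES = [
    ("Occupancy Group A", "REQUIRES", "Automatic Sprinkler System"),
    ("Automatic Sprinkler System", "DEPENDS_ON", "Fire Area"),
]

def retrieve_context(query: str, k: int = 1) -> list[str]:
    """Rank entities by similarity to the query, then add triples touching the top-k seeds."""
    q = embed(query)
    ranked = sorted(
        ENTITIES,
        key=lambda e: cosine(q, embed(e + " " + ENTITIES[e])),
        reverse=True,
    )
    seeds = ranked[:k]
    context = [f"{e}: {ENTITIES[e]}" for e in seeds]
    # One-hop expansion: include every relationship with a seed as head or tail.
    for head, rel, tail in TRIPLES:
        if head in seeds or tail in seeds:
            context.append(f"{head} {rel} {tail}")
    return context

def mean_semantic_similarity(records: list[tuple[str, str]]) -> float:
    """Average cosine similarity between (generated answer, reference answer) pairs."""
    scores = [cosine(embed(gen), embed(ref)) for gen, ref in records]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    for line in retrieve_context("Which occupancy group needs a sprinkler system?"):
        print(line)
```

In a local deployment the retrieved context lines would be placed in the prompt of a locally hosted generator model; keeping both the embedding and generation models on the same machine is what avoids sending documents to an online service.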

Keywords: Knowledge Graph, Building Code Interpretation, Large Language Models, Retrieval Augmented Generation, Data Privacy

DOI: 10.54941/ahfe1006921
