Procedure Parsing: A Method for Parsing Handwritten Documents into Computer-Based Procedures
Authors: Stacey Whitmore
Abstract: The nuclear industry is heavily procedure driven: almost every task has step-by-step instructions that are expected to be followed in detail. Historically, these procedures were printed on paper. Recently, the industry has transitioned toward electronic copies (i.e., PDFs on tablets). One major driver for this transition is the human error and loss of situation awareness introduced when using paper copies. However, electronic copies of documents inherently have the same error traps as their paper cousins. There is therefore increased interest in a way to use the information in the step-by-step guidance while presenting it dynamically, guiding the user and adapting to any encountered conditions. Researchers at Idaho National Laboratory propose a flexible, automated method, based on document parsing and augmented by natural language processing (NLP) techniques, to address these shortcomings and capitalize on recent advancements in machine learning. The proposed method provides a cost-effective solution for computer-assisted parsing of hand-written control room procedures, originally authored in Word or PDF formats, into instructions that can be displayed as computer-based procedures (CBPs) in a modern graphical user interface. The researchers devised, implemented, and demonstrated the Operating Procedure Extender for Novel Systems (OPENS) method in 2020. The key to OPENS is to map the original procedure text into a context-free grammar, tying content to equipment, locations, and other steps, actions, etc. This formal grammar is then used to isolate and define keywords and action verbs, such as “measure” or “evaluate”, and tie them to specific equipment referenced within that step or located in other steps, substeps, actions, subactions, and tables throughout the procedure.
OPENS generates an abstract syntax tree from the document and uses it to store a copy of this information in XML and JSON, two open-standard file formats that are both machine-readable and human-readable. The XML preserves the relational aspects of the procedure, such as table references and branching information, so that the user can be directed to the next appropriate active step based on the values entered for that step and previous steps. The JSON is useful for storing and exchanging the data objects used to track responses to previous steps and state changes in simulated environments. In future iterations, these formats could also be used to store more detailed information about input during plant operation or simulation. The techniques the researchers developed could be further improved by integrating recent advancements in machine learning. NLP methods could standardize documents, correct grammatical errors, and provide automated semantic validation. The researchers expect that self-supervised techniques applied to collections of natural language instructions could strengthen the model with broader context. Together, these methods offer a practical way to automatically extract protocols from documents and user interactions, empowering researchers, procedure writers, and nuclear operators while moving the industry forward.
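As a rough illustration of the pipeline described above, the following minimal sketch parses numbered step lines into a small tree and writes that tree out as XML and JSON. It is not the OPENS implementation. The step-line pattern, the field names (number, verb, target, equipment, substeps), the XML element names, and the sample procedure text are all hypothetical, chosen only for this example. A single line-level pattern also stands in for what would be a much richer grammar in practice.

```python
import json
import re
import xml.etree.ElementTree as ET

# The two action verbs named in the abstract; a real vocabulary would be larger.
ACTION_VERBS = {"measure", "evaluate"}

# Hypothetical step pattern: <number> <verb> <target> [at <equipment>].
STEP_RE = re.compile(
    r"^(?P<number>\d+(?:\.\d+)*)\s+(?P<verb>\w+)\s+(?P<target>.+?)"
    r"(?:\s+at\s+(?P<equipment>.+?))?\.?$"
)


def parse_step(line):
    """Turn one step line into a tree node (a plain dict)."""
    m = STEP_RE.match(line.strip())
    if m is None or m["verb"].lower() not in ACTION_VERBS:
        raise ValueError(f"unrecognized step: {line!r}")
    return {
        "number": m["number"],
        "verb": m["verb"].lower(),
        "target": m["target"],
        "equipment": m["equipment"],  # None when no equipment is named
        "substeps": [],
    }


def parse_procedure(text):
    """Build a tree by attaching step 1.1 under step 1, and so on."""
    roots, by_number = [], {}
    for line in text.strip().splitlines():
        node = parse_step(line)
        by_number[node["number"]] = node
        parent = by_number.get(node["number"].rpartition(".")[0])
        (parent["substeps"] if parent else roots).append(node)
    return roots


def to_xml(node):
    """Serialize a node and its substeps as nested <step> elements."""
    elem = ET.Element("step", number=node["number"], verb=node["verb"])
    ET.SubElement(elem, "target").text = node["target"]
    if node["equipment"]:
        ET.SubElement(elem, "equipment").text = node["equipment"]
    for child in node["substeps"]:
        elem.append(to_xml(child))
    return elem


# Made-up sample procedure text, for illustration only.
SAMPLE = """
1 Evaluate pump status.
1.1 Measure discharge pressure at Pump A.
"""

tree = parse_procedure(SAMPLE)
print(ET.tostring(to_xml(tree[0]), encoding="unicode"))  # relational view
print(json.dumps(tree, indent=2))                        # data-exchange view
```

In this sketch the XML nesting carries the step-to-substep relationships, while the JSON form is the same tree as plain data objects, mirroring the division of roles between the two formats described in the abstract.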
Keywords: Computer-based Procedure Parsing, Context Free Grammar, Dynamic Procedures, Machine Learning, Natural Language Processing, AI, Artificial Intelligence