When Workflows Stay Stable but Meaning Moves in Agentic Analyst Pipelines

Stephen Russell

doi:10.54941/ahfe1007675

AHFE International

Open Access

Article

Conference Proceedings

Abstract

Agentic analyst pipelines can keep a workflow object stable while its operational meaning shifts underneath. A coded event, alert, or dashboard tile may persist unchanged while the reporting frame moves from accident to attack, or from outage to state-linked operation. This semantic movement can reach the analyst undetected, degrading situation awareness while the interface appears organized. This research proposes a measurement and governance layer for detecting semantic frame shifts in agentic analyst workflows, using 1,032 headline and snippet records organized into 37 adjudicated event-window states across 10 event families. Three findings bear directly on human-agentic workflow design. First, human coders outperformed all tested LLM coding agents on a binary frame-shift endpoint: the best LLM condition reached kappa 0.549 against adjudication, compared with kappa 0.784 for human coders, indicating that human semantic judgment is the more reliable reference in analyst triage workflows. Second, the proposed embedding-based metrics outperformed every LLM condition as computational detectors: translational drift reached AUROC 0.894, neighborhood rewiring reached AUROC 0.841, and a combined reframing-pressure score reached AUROC 0.909. These metrics classify workflow states into four categories: stable interpretation, smooth directional drift, semantic churn, and major reframing. Third, LLM coding agents varied materially by model and prompt policy, paralleling the variation observed across human coders. Consensus among human judgment, semantic metrics, and LLM signals therefore cannot be assumed and must be treated as an active governance variable in workflow design. Together, these findings support an orchestration layer that routes high-reframing or high-disagreement cases to human review while allowing semantically stable outputs to pass forward. As a demonstration, the proposed metrics and approach are implemented in an open source intelligence system.

Keywords: human-agentic workflows, agentic AI, semantic drift, LLM-as-judge, situation awareness, analyst pipelines
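
The routing idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: translational drift is approximated as the cosine distance between the embedding centroids of two event windows, neighborhood rewiring as one minus the Jaccard overlap of nearest-neighbor sets, the four workflow states are mapped onto a simple two-by-two grid of those metrics, and the cutoffs `drift_cut` and `rewiring_cut` are hypothetical placeholders. The paper's actual definitions, thresholds, and combined reframing-pressure score may differ.

```python
from math import sqrt


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; raises ValueError for a zero-length vector."""
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


def translational_drift(prev_vectors, curr_vectors):
    """ASSUMPTION: drift = cosine distance between the two window centroids."""
    return cosine_distance(centroid(prev_vectors), centroid(curr_vectors))


def neighborhood_rewiring(prev_neighbors, curr_neighbors):
    """ASSUMPTION: rewiring = 1 - Jaccard overlap of nearest-neighbor ID sets."""
    union = prev_neighbors | curr_neighbors
    if not union:
        return 0.0
    return 1.0 - len(prev_neighbors & curr_neighbors) / len(union)


def classify_state(drift, rewiring, drift_cut=0.3, rewiring_cut=0.5):
    """ASSUMPTION: map the four named states onto a 2x2 grid of the metrics.

    The cutoffs are hypothetical placeholders, not values from the paper.
    """
    high_drift = drift >= drift_cut
    high_rewiring = rewiring >= rewiring_cut
    if high_drift and high_rewiring:
        return "major reframing"
    if high_drift:
        return "smooth directional drift"
    if high_rewiring:
        return "semantic churn"
    return "stable interpretation"


def route(state, signals_agree):
    """Send high-reframing or high-disagreement cases to a human reviewer."""
    if state == "major reframing" or not signals_agree:
        return "human review"
    return "pass forward"
```

For example, `route(classify_state(drift, rewiring), signals_agree)` would send a window to human review whenever both metrics exceed their cutoffs or whenever the human, metric, and LLM signals disagree. How the intermediate states (smooth directional drift and semantic churn) should be routed is left open in the abstract; this sketch passes them forward when the signals agree, which is only one possible policy.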

Volume: Human Factors in Robots, Drones and Unmanned Systems