Assessing the Impact of Automated Document Classification Decisions on Human Decision-Making

Open Access
Conference Proceedings
Authors: Mallory Stites, Breannan Howell, Phillip Baxley

Abstract: As machine learning (ML) algorithms are incorporated into more high-consequence domains, it is important to understand their impact on human decision-making. This need becomes particularly apparent when the goal is to augment performance rather than replace a human analyst. The derivative classification (DC) document review process is an area that is ripe for the application of such ML algorithms. In this process, derivative classifiers (DCs), who are technical experts in specialized topic areas, make decisions about a document’s classification level and category by comparing the document with a classification guide. As the volume of documents to be reviewed continues to increase, and text analytics and other types of models become more accessible, it may be possible to incorporate automated classification suggestions to increase DC efficiency and accuracy. However, care must be taken to ensure that tool-generated suggestions do not introduce errors into the process, which could lead to disastrous impacts for national security. In the current study, we assess the impact of model-generated classification decisions on DC accuracy, response time, and confidence while reviewing document snippets in a controlled environment and compare them to DC performance in the absence of the tool (baseline). Across two assessments, we found that correct tool suggestions improved human accuracy relative to baseline, and decreased response times relative to baseline in one of these assessments. Incorrect tool suggestions produced a higher human error rate but did not impact response times. Interestingly, incorrect tool suggestions also resulted in higher confidence ratings when DCs made errors that aligned with the incorrect suggestion relative to cases in which they correctly disregarded its suggestion. 
These results highlight that while ML tools can enhance performance when their output is accurate, they can also impair analyst decision-making when it is not, with potentially negative impacts on national security. The findings have implications for incorporating ML or other automated suggestions not only in the derivative classification domain, but also in other high-consequence domains that embed automated tools in a human decision-making process. Factors such as tool accuracy, transparency, and DC expertise should all be taken into account when designing such systems to ensure that automated suggestions improve performance without introducing additional errors. SNL is managed and operated by NTESS under DOE NNSA contract DE-NA0003525.

Keywords: human decision-making, machine learning (ML), human error, derivative classification

DOI: 10.54941/ahfe1003946
