Analysis of Large Language and Instance-Based Learning Models in Mimicking Human Cyber-Attack Strategies in HackIT Simulator

Open Access Article (Conference Proceedings)
Authors: Shubham Sharma, Shubham Thakur, Megha Sharma, Ranik Goyal, Shashank Uttrani, Harsh Katakwar, Kuldeep Singh, Palvi Aggarwal, Varun Dutt

Abstract: Understanding human strategies in cyber-attacks is essential for advancing cybersecurity defense mechanisms. However, the ability of computational cognitive and artificial intelligence (AI) models to effectively replicate and predict human decision-making in realistic cyber-attack scenarios remains underexplored. This study addresses this gap by evaluating the performance of two distinct models, Instance-Based Learning (IBL; a cognitive model) and a Large Language Model (LLM, GPT-4o; an AI model), in mimicking human cyber-attack strategies using the HackIT simulation tool.

The experiment employed a 2 × 2 design varying network topology (Bus vs. Hybrid) and network size (Small: 40 nodes vs. Large: 500 nodes), with 84 participants randomly assigned in 42 teams across four conditions: Hybrid 40 (24 participants, 12 teams), Hybrid 500 (22 participants, 11 teams), Bus 40 (18 participants, 9 teams), and Bus 500 (20 participants, 10 teams). Participants collaborated in pairs over 10-minute sessions to attack networks consisting of an equal mix of honeypots (50% fake systems) and real systems (50% regular systems). Attacks proceeded by scanning for vulnerabilities with nmap and then exploiting identified weaknesses through HackIT (a real-world equivalent of the scan step is sketched below). Human and model performance were evaluated on three dependent variables: total systems exploited, total honeypots exploited, and total real systems exploited (the error metric is illustrated below). Eighty percent of the human data was used for model training and 20% for testing.

The IBL model, calibrated with ACT-R cognitive architecture parameters (decay and noise ranging from 0.1 to 3; see the activation sketch below), closely mirrored human behavior across conditions and excelled at distinguishing honeypots from real systems, especially in smaller networks. In the "Bus 40" condition, for instance, the IBL model closely tracked human honeypot exploitation, achieving a low mean squared error (MSE = 0.0576). The IBL model likewise led in honeypot detection across conditions, demonstrating its ability to replicate complex cognitive processes.

The GPT-4o model, tuned over temperature (0.5, 1, 1.5) and top-k sampling (2, 3, 4) settings (see the decoding sketch below), showed exceptional flexibility, especially in smaller networks. In the "Bus 40" condition, for instance, GPT-4o performed on par with humans, exploiting 19 systems against the 20 exploited by human participants (MSE = 1.000). In real-system exploitation, it demonstrated its capacity to scale and to adapt tactics dynamically, consistently achieving high accuracy across configurations.

Model validation using HackIT simulations showed that the IBL model offered deeper insight into cognitive decision-making processes, while GPT-4o was superior at exploiting real systems and adjusting to complicated situations. The two models showed complementary strengths: GPT-4o excelled in total- and real-system exploitation, and IBL provided excellent honeypot detection.

By using cognitive and AI-based models to replicate human attacker behavior across varied network setups, this study closes a significant knowledge gap. The findings highlight the usefulness of IBL in revealing the cognitive foundations of decision-making and the scalability of GPT-4o to complicated scenarios. Together, these models provide a strong basis for simulating adversarial tactics, locating weaknesses, and bolstering defenses in contemporary cybersecurity settings.
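Participants issued nmap-style scans inside the simulator. For reference only, a minimal sketch of an equivalent real-world scan follows; the -sV service/version-detection flag is standard nmap, while the wrapper function and its name are illustrative assumptions, not part of HackIT.

```python
import subprocess

def scan_target(host: str) -> str:
    """Run an nmap service/version-detection scan (-sV) against a host.
    Illustrative wrapper: HackIT participants performed the simulated
    analogue of this step before choosing a system to exploit."""
    result = subprocess.run(
        ["nmap", "-sV", host],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```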
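All three dependent variables were compared against human data via mean squared error. A minimal sketch of that metric follows; the abstract does not state how errors were aggregated across teams and trials, so the single-point example merely reproduces the reported Bus 40 value of 1.000.

```python
def mse(model_values, human_values):
    """Mean squared error between model predictions and human observations."""
    pairs = list(zip(model_values, human_values))
    return sum((m - h) ** 2 for m, h in pairs) / len(pairs)

# The abstract's "Bus 40" total-systems example: GPT-4o exploited 19 systems
# vs. 20 by humans, so MSE = (19 - 20)^2 = 1.000.
print(mse([19], [20]))  # 1.0
```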
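A minimal sketch of the IBL mechanism follows, assuming the standard ACT-R activation and blending equations on which IBL theory is built; decay d and noise s correspond to the parameters the authors vary from 0.1 to 3. The instance encoding and function names are illustrative, not the paper's implementation.

```python
import math
import random

def activation(timestamps, now, d, s):
    """ACT-R activation: base-level learning over an instance's past
    observation times, plus logistic noise with scale parameter s."""
    base = math.log(sum((now - t) ** (-d) for t in timestamps))
    gamma = random.uniform(1e-9, 1 - 1e-9)  # avoid log(0)
    return base + s * math.log((1 - gamma) / gamma)

def blended_value(instances, now, d, s):
    """Blend past outcomes weighted by retrieval probability
    (softmax over activations with temperature tau = s * sqrt(2))."""
    tau = s * math.sqrt(2)
    acts = [activation(ts, now, d, s) for _, ts in instances]
    weights = [math.exp(a / tau) for a in acts]
    total = sum(weights)
    return sum(w / total * outcome
               for (outcome, _), w in zip(instances, weights))

# Illustrative: value of "exploit this node" given two prior experiences
# stored as (outcome, observation times); d and s are the calibrated parameters.
print(blended_value([(1.0, [1, 3]), (0.0, [2])], now=5, d=0.5, s=0.25))
```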
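Temperature and top-k are generic decoding controls: temperature flattens or sharpens the token distribution, and top-k restricts sampling to the k most likely tokens. GPT-4o's decoder is not exposed (the public API offers temperature and top-p, not top-k), so the sketch below is a generic illustration of what the swept values (temperature 0.5/1/1.5, k 2/3/4) control, not the authors' setup.

```python
import math
import random

def sample_top_k(logits, temperature, k):
    """Generic temperature-scaled top-k decoding: keep the k highest-scoring
    tokens, divide scores by the temperature, sample from the softmax."""
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    weights = [math.exp(score / temperature) for _, score in top]
    r = random.random() * sum(weights)
    for (token, _), w in zip(top, weights):
        r -= w
        if r <= 0:
            return token
    return top[-1][0]

# Example with hypothetical action scores; the abstract's grid would sweep
# temperature over {0.5, 1, 1.5} and k over {2, 3, 4}.
print(sample_top_k({"exploit": 2.1, "scan": 1.8, "quit": -0.5},
                   temperature=0.5, k=2))
```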

Keywords: Behavioral Cybersecurity, Instance-Based Learning (IBL), Large Language Models (LLMs), HackIT Simulator, Human Decision-Making, Behavior Modeling

DOI: 10.54941/ahfe1006143

