Investigation of Weaknesses in Typically Anomaly Detection Methods for Software Development
Authors: Hironori Uchida, Keitaro Tominaga, Hideki Itai, Yujie Li, Yoshihisa Nakatoh
Abstract: Software systems are rapidly increasing in number and diversity due to technological innovations such as IoT, artificial intelligence, and blockchain. Accordingly, automatic analysis of software logs has recently attracted particular attention as a research area for ensuring system reliability. In current research, anomaly detection in text logs using DNN models based on CNN, LSTM, and Transformer architectures has shown high accuracy of over 90%. Despite these excellent results, there are reports that such models have not been adopted in the software development field. We hypothesize that the reason lies in the way the models are evaluated and in the datasets used, so we investigate using representative anomaly detection models and the common dataset BGL. First, we investigate the effect of the splitting ratio of the dataset and confirm that accuracy decreases as the number of unknown anomaly logs increases. We also identify features that are over-learned by all supervised learning models. In addition, we examine the generality of the models using validation datasets and learning curves. The results show signs of overfitting in both supervised and unsupervised learning models. These results suggest that the composition of the dataset affects the accuracy of log-text anomaly detection models. Therefore, we plan to create a dataset with multiple anomaly patterns based on logs used in the software development domain, and to build a model that can detect anomalies in that dataset.
Keywords: Anomaly Detection, Software Log, Log Analysis, Deep Learning