Detecting Potential Depressed Users in Twitter Using a Fine-tuned DistilBERT Model

Open Access
Conference Proceedings
Authors: Miguel Antonio AdarloMarlene De Leon

Abstract: With the spread of Major Depressive Disorder, otherwise known simply as depression, around the world, various efforts have been made to combat it and to potentially reach out to those suffering from it. Part of those efforts includes the use of technology, such as machine learning models, to screen a potential person for depression through various means, including social media narratives, such as tweets from Twitter. Hence, this study aims to evaluate how well a pre-trained DistilBERT, a transformer model for natural language processing that was fine-tuned on a set of tweets coming from depressed and non-depressed users, can detect potential users in Twitter as having depression. Two models were built using the same procedure of preprocessing, splitting, tokenizing, training, fine-tuning, and optimizing. Both the Base Model (trained on CLPsych 2015 Dataset) and the Mixed Model (trained on the CLPsych 2015 Dataset and a half of the dataset of scraped tweets) could detect potential users in Twitter for depression more than half of the time by demonstrating an Area under the Receiver Operating Curve (AUC) score of 65% and 63%, respectively, when evaluated using the test dataset. These models performed comparably in identifying potential depressed users in Twitter given that there was no significant difference in their AUC scores when subjected to a z-test at 95% confidence interval and 0.05 level of significance (p = 0.21). These results suggest DistilBERT, when fine-tuned, may be used to detect potential users in Twitter for depression.

Keywords: Twitter, Depression, Machine Learning, Transformers

DOI: 10.54941/ahfe1001458

Cite this paper: