Evaluating Silicon Sampling: LLM Accuracy in Simulating Public Opinion on Facial Recognition Technology
Authors: Charles Ma
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like responses, prompting exploration of their potential for social science research. "Silicon sampling," a method in which LLMs are queried after being prompted with personas, has emerged as a possible alternative to traditional survey methods, especially given declining survey participation rates and rising costs. However, the accuracy of silicon sampling remains a subject of debate.

This study examines the effectiveness of silicon sampling in replicating survey results on public acceptance of facial recognition technology (FRT). The research builds upon the work of Kostka et al. (2021)*, who conducted a multinational survey across Germany, China, the United Kingdom, and the United States, analyzing public opinion on FRT alongside socio-demographic data and key contextual factors, including the perceived consequences, utility, and reliability of the technology.

The study addresses two research questions: (1) Can LLMs simulate an individual's surveyed opinions on FRT when prompted with a persona based only on demographic information? (2) Can LLMs simulate an individual's surveyed opinions on FRT when prompted with a persona based on both demographic and relevant contextual information?

The research employs three LLMs: GPT-4o, Claude 3.5, and the open-source DeepSeek V3. It compares the LLM-generated responses to the original survey data, assessing the degree of alignment under three prompting conditions: demographic-only, contextual-information-only, and demographic-plus-contextual information. As an initial measure of alignment, the differences between the percentages of each level of FRT acceptance were calculated; additional metrics such as accuracy, mean absolute error, and F1 scores are included in the extended paper. Preliminary results from GPT-4o and Claude 3.5 suggest that prompts incorporating both demographic and contextual information yield simulated responses that closely align with the original survey data, whereas, consistent with prior findings, prompts based solely on demographics produce significantly less accurate results. By comparing closed-source models (GPT and Claude) with an open-source alternative (DeepSeek), the study also examines potential differences in reliability between these types of models. Multiple runs are performed for each model to assess output variability and reproducibility within and between models.

By demonstrating the importance of incorporating relevant contextual information into prompts, the study provides valuable insights into optimizing the silicon sampling technique and improving the accuracy of LLM-generated responses in survey simulations. Ultimately, this investigation advances the understanding of the capabilities and limitations of LLMs as tools for studying public opinion, particularly in the context of technology acceptance, and informs the development of best practices for silicon sampling in future research. The results suggest that, with careful prompting, silicon sampling can offer a viable and cost-effective alternative to traditional survey methods, potentially mitigating challenges related to declining response rates and rising costs.

*Kostka, G., Steinacker, L., & Meckel, M. (2021). Between security and convenience: Facial recognition technology in the eyes of citizens in China, Germany, the United Kingdom, and the United States. Public Understanding of Science, 30(6), 671–690.
https://doi.org/10.1177/09636625211001555
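The abstract describes querying LLMs with persona prompts built from demographic and contextual survey variables. The following is a minimal sketch of how such prompts and queries might be assembled under the three conditions, assuming an OpenAI-style chat API for GPT-4o; the field names, prompt wording, and four-point answer scale are illustrative assumptions, not the study's actual instrument.

```python
# Illustrative sketch of silicon sampling with persona prompts.
# Field names, wording, and the answer scale are assumptions for
# demonstration; the study's real prompts and survey items may differ.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def build_persona_prompt(respondent: dict, condition: str) -> str:
    """Assemble a persona prompt under one of the three study conditions."""
    demographic = (
        f"You are a {respondent['age']}-year-old {respondent['gender']} "
        f"living in {respondent['country']} with {respondent['education']}."
    )
    # Contextual factors mirror those named in the abstract: perceived
    # consequences, utility, and reliability of FRT.
    contextual = (
        f"You believe the consequences of facial recognition technology are "
        f"{respondent['perceived_consequences']}, that its utility is "
        f"{respondent['perceived_utility']}, and that its reliability is "
        f"{respondent['perceived_reliability']}."
    )
    parts = {
        "demographic": [demographic],
        "contextual": [contextual],
        "demographic_plus_contextual": [demographic, contextual],
    }[condition]
    question = (
        "On a scale from 1 (fully reject) to 4 (fully accept), how acceptable "
        "do you find facial recognition technology? Answer with one number."
    )
    return " ".join(parts) + " " + question

def simulate_response(respondent: dict, condition: str) -> str:
    """Query the model once for a simulated survey answer."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": build_persona_prompt(respondent, condition)}],
        temperature=1.0,  # nonzero so repeated runs reveal output variability
    )
    return response.choices[0].message.content.strip()
```

Under this setup, the multiple runs mentioned in the abstract would amount to calling simulate_response repeatedly for each respondent and comparing the resulting answer distributions within and between models.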
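For the alignment measures named in the abstract (per-level percentage differences, accuracy, mean absolute error, F1 scores), the following is a hedged sketch of how they could be computed, assuming both survey and simulated answers are coded as integer acceptance levels; the level coding and sample arrays are invented for illustration.

```python
# Illustrative computation of the alignment metrics named in the abstract.
# The 1-4 acceptance coding and the data below are assumptions, not the
# study's actual data.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, mean_absolute_error

def distribution_gap(survey: np.ndarray, simulated: np.ndarray, levels) -> dict:
    """Difference, in percentage points, between the share of each
    FRT-acceptance level in the survey and in the simulated responses."""
    gaps = {}
    for level in levels:
        survey_pct = 100 * np.mean(survey == level)
        simulated_pct = 100 * np.mean(simulated == level)
        gaps[level] = simulated_pct - survey_pct
    return gaps

# Toy data: 1 = fully reject ... 4 = fully accept (coding assumed).
survey = np.array([1, 2, 2, 3, 3, 3, 4, 4])
simulated = np.array([1, 2, 3, 3, 3, 4, 4, 4])

print(distribution_gap(survey, simulated, levels=[1, 2, 3, 4]))
print("accuracy:", accuracy_score(survey, simulated))
print("MAE:", mean_absolute_error(survey, simulated))
print("macro F1:", f1_score(survey, simulated, average="macro"))
```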
Keywords: large language models, silicon sampling, technology acceptance, facial recognition technology
DOI: 10.54941/ahfe1006738