Usability Study of Auditory CAPTCHA

Open Access
Conference Proceedings
Authors: Chia-Hung Lee, Ying-Lien Lee

Abstract: CAPTCHA is a security system that distinguishes whether a user is a human being or an automated program by asking questions that are hard for artificial intelligence yet relatively easy for humans to answer. The two most popular forms of CAPTCHA are text and audio; this study explores the latter, which is common in situations where visual interaction is not applicable, such as voice-based interaction or use by visually challenged users. Because auditory CAPTCHAs can be breached by content analysis and guessing through Automatic Speech Recognition (ASR), a certain level of interference must be blended in as a countermeasure. However, by doing so, auditory CAPTCHAs have become too hard for humans to solve. Solving auditory CAPTCHAs is akin to the cocktail party effect, which refers to our ability to process the main audio signals preferentially and ignore irrelevant ones in noisy environments. This study explores the current designs of auditory CAPTCHAs to see how well our “cocktail party ability” performs when interacting with different CAPTCHA designs. An experiment with a repeated-measures factorial design is conducted with thirty-six participants. The main signals, or the signals to be processed, are spoken by a random male speaker (RMS), a random female speaker (RFS), or mixed speakers (MS), while the interference signals, or the signals to be ignored, are spoken as random male noise (RMN), random female noise (RFN), or mixed noise (MN). Fifty percent of the interference contents sound similar to the main contents, while the other fifty percent are normal conversation noises. Error rates and subjective preferences are collected during the experiments. Results show that sound similarity is problematic: error rates for similar-sounding interference are significantly higher than for normal conversation noise. The combination of RMS and RFN has a significantly lower error rate due to the greatest pitch difference; our participants also prefer this combination for its relative ease. On the other hand, for the combination of RMS and RMN, the error rates are significantly higher and the preference scores lower. The results have important implications for auditory CAPTCHA design.
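
The speaker conditions described in the abstract can be enumerated with a short sketch. It assumes that the three main-signal voices and the three interference voices are fully crossed, which the abstract implies but does not state outright; the function and variable names are illustrative and do not come from the paper.

```python
from itertools import product

# Condition labels taken from the abstract.
MAIN_SPEAKERS = ("RMS", "RFS", "MS")   # voices of the signal to be processed
INTERFERENCE = ("RMN", "RFN", "MN")    # voices of the signal to be ignored


def build_conditions():
    """Return every main/interference pairing (assumed fully crossed)."""
    return [
        {"main": main, "interference": noise}
        for main, noise in product(MAIN_SPEAKERS, INTERFERENCE)
    ]


conditions = build_conditions()
for condition in conditions:
    print(condition["main"], "+", condition["interference"])
```

Under this assumption each participant in a repeated-measures design would hear all nine pairings, including the RMS + RFN and RMS + RMN combinations singled out in the results.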

Keywords: auditory CAPTCHA, cocktail party effect, pitch difference

DOI: 10.54941/ahfe100449
