Human-Centered Design of Voice Communications: Gender Aspects

Open Access
Conference Proceedings
Authors: Jan Holub, Yann Kowalczuk

Abstract: Perceiving transmitted speech puts a certain amount of cognitive load on the human brain. The degree of this load depends on several factors, e.g., the loudness of the perceived speech, the type and intensity of background noise, the quality and accent of the speech, and the listener's familiarity with the topic of the message. The load also differs depending on whether the speech is in the listener's native or a non-native language. Over longer-duration workloads (e.g., during a work shift), different levels of this load manifest as different levels of overall fatigue, which in turn affects the worker's action or decision error rate when performing other concurrent tasks (the so-called parallel-task paradigm). For technologies used in speech transmission or synthesis, e.g., in telecommunications, radio communications, and machine-to-human communications, this implies a strong need to optimize the coding of human (or synthetic) voice so as to minimize listening effort during communication. Listening effort (LE) can be assessed by subjective tests following, e.g., ITU-T Recommendation P.800, along with listening quality (LQ), which is also specified in P.800. A natural (but nowhere explicitly stated) requirement is that male and female voices are transmitted with similar LQ and LE; in other words, the transmission technology, including coding algorithms, frequency filters, and sampling rates, should not privilege one gender over the other, so that working conditions and opportunities remain equal for all. Since 2018, the subjective test laboratory has performed a gender analysis for all of its subjective test projects to determine how balanced (or imbalanced) transmission quality is between male and female speakers.
The identified imbalance can affect many professionals who rely on remote voice communication in their daily duties, such as female airport approach control dispatchers or policewomen, who are inherently disadvantaged by the technological aspects of their job: worse voice transmission quality means that higher listening effort is needed, which may cause (subconscious) discomfort for their communication partners or even intelligibility issues. Of course, this is not surprising for narrow-band or older analog AM transmissions (as still used in AIRCOM); it can only serve as an argument for upgrading the communication means to a suitable digital format. Unfortunately, some contemporary wide-band and even full-band digital communication systems also show statistically significant differences between the quality of transmitted male and female voices. The detailed results will be presented, including interesting systematic language dependencies (English, German, Mandarin). In the conclusions, suggestions are proposed for future codec designs that take human-centric, gender-balanced requirements into account. These include the minimum frequency response of future coders, the granularity of perceptual frequency scaling, etc. Suggestions for the gender neutrality of the original (studio-quality) recordings used to prepare the speech samples for subjective tests are also included.
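The abstract does not state which statistical procedure underlies its claim of significant male/female differences. As a minimal sketch only, assuming that absolute category rating (ACR) scores on the 1 to 5 scale of ITU-T P.800 are pooled by talker gender for one test condition, one could compare the male and female mean opinion scores (MOS) with Welch's unequal-variance t-test. The function name `compare_by_gender`, the `GenderComparison` record, and the data layout are hypothetical illustrations, not the authors' method.

```python
import math
from dataclasses import dataclass
from statistics import mean, variance


@dataclass
class GenderComparison:
    mos_male: float    # mean ACR score over male-talker samples
    mos_female: float  # mean ACR score over female-talker samples
    diff: float        # male MOS minus female MOS
    t: float           # Welch t statistic
    df: float          # Welch-Satterthwaite degrees of freedom


def compare_by_gender(scores_male, scores_female):
    """Hypothetical sketch: Welch two-sample comparison of ACR ratings
    (1..5) given to male vs. female talkers in one test condition."""
    n1, n2 = len(scores_male), len(scores_female)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two ratings per gender")
    m1, m2 = mean(scores_male), mean(scores_female)
    # Sample variances (denominator n - 1), scaled to squared standard errors.
    se1 = variance(scores_male) / n1
    se2 = variance(scores_female) / n2
    se = math.sqrt(se1 + se2)
    if se == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    t = (m1 - m2) / se
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return GenderComparison(m1, m2, m1 - m2, t, df)


if __name__ == "__main__":
    # Toy ratings for illustration only; these are NOT results from the paper.
    r = compare_by_gender([4, 4, 5, 3, 4], [3, 4, 3, 3, 4])
    print(f"MOS male={r.mos_male:.2f} female={r.mos_female:.2f} "
          f"t={r.t:.2f} df={r.df:.1f}")
```

To turn `t` and `df` into a p-value, a Student's t distribution is needed; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the complete Welch test. In a real P.800 test, ratings from the same listener are not independent, so a mixed-effects model with listener and talker as random factors may be the more appropriate choice.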

Keywords: QoE, human voice communication, gender-balanced design, subjective testing

DOI: 10.54941/ahfe1002926
