Parallelising 2D-CNNs and transformers: A Cognitive-based approach for Automatic Recognition of Learners’ English Proficiency

Open Access
Article
Conference Proceedings
Authors: Meishu Song, Emilia Parada-Cabaleiro, Zijiang Yang, Xin Jing, Kazumasa Togami, Kun Qian, Björn Schuller, Yamamoto Yoshiharu
Abstract

Learning English as a foreign language requires extensive use of cognitive capacity, memory, and motor skills in order to express one’s thoughts orally in a clear manner. Current speech recognition intelligence focuses on recognising learners’ oral proficiency from the perspectives of fluency, prosody, pronunciation, and grammar. However, the capacity to express an idea clearly and naturally is a high-level cognitive behaviour that can hardly be represented by these detailed, segmental dimensions, which indeed do not fulfil the requirements of English learners and teachers. This work aims to use state-of-the-art deep learning techniques to recognise English speaking proficiency at a cognitive level, i.e., a learner’s ability to clearly organise their own thoughts when expressing an idea in English as a foreign language. For this, we collected the “Oral English for Japanese Learners” Dataset (OEJL-DB), a corpus of recordings by 82 students of a Japanese high school expressing their ideas in English on 5 different topics. Annotations concerning the clarity of learners’ thoughts are given by 5 English teachers according to 2 classes: clear and unclear. In total, the dataset includes 7.6 hours of audio data, with an average length of 66 seconds per oral English presentation. As an initial cognitive-based method to identify learners’ speaking proficiency, we propose an architecture based on the parallelisation of CNNs and Transformers. Combining the strength of CNNs in spatial feature representation with that of the Transformer in sequence encoding, we achieve 89.4 % accuracy and 87.6 % Unweighted Average Recall (UAR), results which outperform those of the ResNet architectures (89.2 % accuracy and 86.3 % UAR). Our promising outcomes reveal that speech intelligence can be efficiently applied to “grasp” high-level cognitive behaviours, a new area of research which seems to have great potential for further investigation.
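The abstract reports both accuracy and Unweighted Average Recall (UAR). UAR is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has, which matters when the two classes (clear, unclear) are not balanced. The following is a minimal sketch of how the two metrics differ. The function name `accuracy_and_uar` and the toy labels are assumptions made for illustration; they are not taken from the paper or from OEJL-DB.

```python
from collections import defaultdict


def accuracy_and_uar(y_true, y_pred):
    """Return (accuracy, UAR) for two equal-length label sequences.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Accuracy weights every sample equally.
    accuracy = sum(correct.values()) / len(y_true)

    # UAR weights every class equally: the unweighted mean of per-class recall.
    recalls = [correct[c] / total[c] for c in total]
    uar = sum(recalls) / len(recalls)
    return accuracy, uar


# Toy, invented labels (NOT data from the paper): 8 "clear", 2 "unclear".
y_true = ["clear"] * 8 + ["unclear"] * 2
y_pred = ["clear"] * 8 + ["clear", "unclear"]

acc, uar = accuracy_and_uar(y_true, y_pred)
print(f"accuracy={acc:.2f}, UAR={uar:.2f}")  # accuracy=0.90, UAR=0.75
```

In this toy case the minority class is recognised only half of the time, which accuracy largely hides (0.90) and UAR exposes (0.75). This is one plausible reason for reporting both figures side by side.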

Keywords: Speech Intelligence, Transformer, English Proficiency

DOI: 10.54941/ahfe1001000


Volume: Intelligent Human Systems Integration (IHSI 2022): Integrating People and Intelligent Systems