French as a second language (L2) and AI: Deep Learning Models to the Rescue of Object Clitics

AHFE International

Accelerating Open Access Science in Human Factors Engineering and Human-Centered Computing

Open Access

Article

Conference Proceedings

Authors: Adel Jebali

Abstract: Just like many other Romance languages, French includes units known as object clitics, which exhibit characteristics of both affixes and noun phrases (NPs). They resemble affixes in that they need a prosodically strong host to attach to, and they are similar to NPs in that they fulfill a syntactic role in the utterance. These properties, coupled with their unique positioning compared to the phrases they replace, categorize them as special clitics (Zwicky, 1983). All these factors place them at the intersection of phonology, morphology, and syntax. Consequently, it’s not surprising that they pose a challenge for learners whose first language isn’t French.

Learners of French as a second language (L2) often find it difficult to master the use of these units, leading to mistakes and various avoidance strategies. Errors can include incorrect agreements (with the antecedent, as well as with the past participle and adjectives), non-standard placement (such as placing the clitic between the auxiliary and the past participle), resorting to strong pronouns (likely influenced by languages that allow it, such as English), and an incomplete understanding of certain morphosyntactic or semantic properties (such as the distinction between animate and inanimate or verb subcategorization). On the other hand, avoidance strategies include NP repetition and omission (Wust, 2009; Emirkanian et al., 2021).

Could deep learning be the solution to assist these learners? We believe so.

To train a model capable of identifying sentences containing errors in the use of clitic object pronouns, a substantial amount of training data is required. This data should include a significant number of correctly written sentences in French L2, along with sentences containing errors in the use of clitic object pronouns. Once collected, this data needs to be prepared for use in a deep learning model. The data must be cleaned, normalized, and encoded into a format that the model can interpret. The data can also be augmented with variations of similar sentences, allowing the deep learning model to learn to generalize and recognize errors in a wider context.

Our project involves adapting a pre-trained FlauBERT model (Le et al., 2020), based on BERT (Devlin et al., 2019), for a grammaticality judgment task. We fine-tuned this monolingual model on a dataset of 5272 sequences annotated as correct or containing errors. This dataset includes authentic productions from learners of French L2 (Jebali, 2018), along with data collected from the web containing both real productions and modifications introducing non-authentic but plausible errors.

After fine-tuning FlauBERT, we used it to provide grammaticality judgments on a second evaluation corpus containing data the model had never seen before. On this dataset, it achieved an overall F-score of 0.93, which is higher than the scores obtained by GPT 3.5 (ChatGPT) and Antidote 11.

After fine-tuning this initial model, we further fine-tuned it on a corpus of 6936 examples of errors related to the use of these clitics. The task was to discriminate between four types of errors regarding these units: agreement, position, resort to strong pronouns, and syntactic or semantic order errors. This second model achieved an evaluation F-score of 0.95, demonstrating excellent classification capabilities.

Both deep learning models can be seamlessly integrated into an automatic correction system to help French L2 learners avoid errors related to the use of clitic object pronouns. The system pipeline we’ve established using these two models takes a sequence of words (ranging from a sentence to an average-length paragraph), checks for errors in the use of the object clitic, and provides feedback based on the error type. We later added an additional generative module, a model fine-tuned on another corpus and based on mBARThez (Kamal Eddine et al., 2021), which is built on BART (Lewis et al., 2019). This module can suggest a correction for the sequence containing an error in the use of the object clitic.
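
The three-stage flow described in the abstract (detect an error, classify its type, optionally suggest a correction) can be illustrated with a minimal Python sketch. It only shows the control flow. The names `check_clitics`, `Feedback`, and `ERROR_TYPES`, the label strings, and the three `toy_*` functions are hypothetical, introduced here for illustration; the toy functions are hard-coded stand-ins and not the fine-tuned FlauBERT or mBARThez models.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed label names for the four error types listed in the abstract.
ERROR_TYPES = ("agreement", "position", "strong_pronoun", "syntactic_or_semantic")


@dataclass
class Feedback:
    text: str
    has_error: bool
    error_type: Optional[str] = None
    suggestion: Optional[str] = None


def check_clitics(
    text: str,
    detect: Callable[[str], bool],
    classify: Callable[[str], str],
    correct: Optional[Callable[[str], str]] = None,
) -> Feedback:
    """Run detection, then error typing, then (optionally) correction."""
    # Stage 1: binary grammaticality judgment.
    if not detect(text):
        return Feedback(text, has_error=False)
    # Stage 2: classify the error into one of the four types.
    error_type = classify(text)
    if error_type not in ERROR_TYPES:
        raise ValueError(f"unexpected error type: {error_type!r}")
    # Stage 3: the generative module is optional.
    suggestion = correct(text) if correct is not None else None
    return Feedback(text, True, error_type, suggestion)


# Toy stand-ins (assumptions): real use would wrap the trained models.
def toy_detect(text: str) -> bool:
    return "ai le vu" in text


def toy_classify(text: str) -> str:
    return "position"


def toy_correct(text: str) -> str:
    return text.replace("J'ai le vu", "Je l'ai vu")


feedback = check_clitics("J'ai le vu.", toy_detect, toy_classify, toy_correct)
print(feedback)
```

The example sentence places the clitic between the auxiliary and the past participle, the placement error mentioned above. Passing the models in as plain callables keeps the sketch independent of any particular library and reflects that the correction module was added later as an optional stage.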

Keywords: Education, AI, French L2, clitics

DOI: 10.54941/ahfe1005406
