Open-source collaboration to assess the text complexity, helping to read and write in schools
Authors: Diego Palma, Christian Soto
Abstract: In this paper, we describe the role of open-source tools in developing educational systems. In particular, we describe TRUNAJOD, an open-source library for extracting readability indices from texts, and how it was used to develop a tool that analyzes the complexity of school texts, easing teachers' workload when they select a text for a particular reading comprehension task. This educational tool helps the user calibrate or select texts based on the target school level and the desired complexity for the task, and it also offers important development possibilities for writing assessment. To achieve this, the tool uses the TRUNAJOD open-source library to extract multiple proxies for text complexity, such as lexical diversity, coherence, and cohesion-based metrics. To provide feedback to the end user, we also rely on a data-driven approach that summarizes the set of text measurements into five textual dimensions: lexical similarity, referential cohesion, concreteness, connectivity, and narrativity. This set of predictors is obtained through a factor analysis of textbooks from the target school system and is also validated by human experts. The tool also relies on statistical and machine learning techniques to build a model that classifies whether a given text is adequate for a particular school level, given its linguistic content. In addition, the five latent dimensions obtained through factor analysis are used to provide feedback to the end user. These dimensions are transformed into a standardized scale so that the teacher can assess whether the text suits a specific reading comprehension task (e.g., the teacher might want to assess narrativity, or a student's ability to make inferences based on text connectivity). Moreover, these five latent factors are used to build a global complexity index for the text. Based on the statistical distribution of this index, each text is assigned a difficulty level (easy, medium, or hard).
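The feedback step described above (standardized dimension scores, a global complexity index, and easy/medium/hard levels derived from a statistical distribution) can be illustrated with the minimal Python sketch below. It is not TRUNAJOD's API or the authors' implementation. The function names are hypothetical, and several choices are assumptions made only for illustration: z-scores computed against a reference corpus (e.g., textbooks for one school level), equal weighting of the five dimensions, a sign convention in which a higher dimension score means an easier text, and tertile cut points for the three difficulty levels.

```python
from statistics import mean, pstdev, quantiles

# The five latent dimensions named in the abstract.
DIMENSIONS = (
    "lexical_similarity",
    "referential_cohesion",
    "concreteness",
    "connectivity",
    "narrativity",
)


def standardize(scores, reference):
    """Convert raw factor scores to z-scores relative to a reference corpus.

    Hypothetical helper: `reference` is a list of dicts (one per text)
    holding factor scores for the texts of a target school level.
    """
    z = {}
    for dim in DIMENSIONS:
        values = [text[dim] for text in reference]
        mu, sigma = mean(values), pstdev(values)
        # Guard against a dimension with no variance in the reference corpus.
        z[dim] = (scores[dim] - mu) / sigma if sigma > 0 else 0.0
    return z


def complexity_index(z):
    """Assumed aggregation: negated mean z-score.

    Assumes higher cohesion/concreteness/narrativity make a text easier,
    so the sign is flipped to make larger values mean "more complex".
    """
    return -mean(z[dim] for dim in DIMENSIONS)


def difficulty_level(scores, reference):
    """Label a text easy/medium/hard using tertiles of the reference index distribution."""
    ref_indices = [complexity_index(standardize(t, reference)) for t in reference]
    low, high = quantiles(ref_indices, n=3)  # two cut points -> three bands
    index = complexity_index(standardize(scores, reference))
    if index <= low:
        return "easy"
    if index <= high:
        return "medium"
    return "hard"


# Usage with a toy reference corpus (synthetic numbers, not data from the paper).
reference = [{dim: float(v) for dim in DIMENSIONS} for v in range(1, 10)]
print(difficulty_level({dim: 9.0 for dim in DIMENSIONS}, reference))  # easy
print(difficulty_level({dim: 1.0 for dim in DIMENSIONS}, reference))  # hard
```

Defining the cut points from a reference distribution, rather than as fixed constants, keeps the labels relative to the texts of the target school level; a real system would more likely use weights and thresholds estimated from its own data.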
Results showed that the tool classifies whether a text is adequate for a target level with an accuracy close to 80%. Results also showed that the latent factors correlate with school level, and that different school levels have different sets of linguistic features that affect whether a text is adequate for a particular level. The contribution of this work is threefold. First, we present the TRUNAJOD open-source library as a utility for researchers working on reading comprehension, text complexity, writing quality, second-language acquisition, and any other task involving the analysis of language or text. Second, we provide an architectural design based on open-source tools for building an educational system that eases school teachers' workload for specific reading comprehension tasks. Third, the tool could be used to evaluate various aspects of writing, generating appropriate evaluation parameters for students at a given school level. This would enable new developments in educational technology, such as intelligent tutoring systems, and since the architectural design builds on other existing open-source libraries, it could be extended to any other language.
Keywords: NLP, Natural Language Processing, Readability, Lexical Diversity, Text Complexity, Intelligent Tutor System, Education, Distance Learning