Plain Italian and AI: Strengths and weaknesses of automatic linguistic simplification

Fiorentino, Giuliana

The simplification of language, particularly of administrative discourse, has long been a central concern of Italian linguistics. Over the past few decades, significant progress has been made, including the development of consolidated and widely accepted lists of linguistic features, both morphosyntactic and lexical, that affect textual simplicity and accessibility (cf. Fiorentino/Ganfi 2024). These efforts also led to the early creation of a readability index for Italian, the Gulpease index, in the 1980s (cf. Lucisano/Piemontese 1988). Within this framework, the authors have developed SEMPLIT, a software tool for the automatic simplification of administrative texts powered by the large language model (LLM) QWEN3 (cf. Russodivito et al. 2024; Fiorentino/Russodivito 2025; Ganfi/Russodivito 2025; Fiorentino et al. forthcoming; Fiorentino/Russodivito forthcoming). As part of this project, a corpus named ItaIst (Fiorentino et al. 2024b) was compiled and automatically simplified using the BASIC approach, yielding a parallel corpus of simplified texts. This simplified corpus was then compared with the source corpus and evaluated in terms of readability gains and semantic similarity (cf. Chandrasekaran et al. 2021), in order to validate the effectiveness of the simplification process. In this contribution, we introduce and validate a new methodology, the CHAIN approach, applied to a different corpus, ItaRegol (Fiorentino et al. 2024a). Although smaller than ItaIst, ItaRegol consists of rules and regulations, i.e., legally binding texts that create, modify, or extinguish subjective legal positions. Because of the legal nature of these texts, simplification must be carried out with caution to avoid altering their legal effects.
This paper compares the two simplification approaches, BASIC and CHAIN, by examining the parameters each one adopts and assessing the quality of the simplified output, and draws conclusions about their differing effectiveness in improving the readability of administrative versus regulatory texts.
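The Gulpease index cited above is a simple formula over sentence, letter, and word counts, G = 89 + (300 × sentences − 10 × letters) / words, where higher values indicate more readable text. The following is a minimal Python sketch of how a readability gain between a source text and its simplified counterpart might be measured with it. It is not SEMPLIT's implementation: the tokenization rules (letter runs as words; `.`, `!`, `?` as sentence boundaries), the helper name `readability_gain`, and the example sentences are all illustrative assumptions.

```python
import re


def gulpease(text: str) -> float:
    """Gulpease readability index (Lucisano/Piemontese 1988):
    89 + (300 * sentences - 10 * letters) / words.

    Tokenization is a naive assumption made for this sketch:
    - a word is a maximal run of letters (so "dell'atto" counts as two words);
    - a sentence is a non-empty chunk of text ending in '.', '!' or '?'
      (abbreviations such as "art." will be miscounted as sentence ends).
    """
    words = re.findall(r"[^\W\d_]+", text)  # letters only, Unicode-aware
    if not words:
        raise ValueError("text contains no words")
    letters = sum(len(w) for w in words)
    # Split on runs of terminal punctuation and keep only non-empty chunks.
    sentences = len([s for s in re.split(r"[.!?]+", text) if s.strip()])
    return 89 + (300 * sentences - 10 * letters) / len(words)


def readability_gain(source: str, simplified: str) -> float:
    """Hypothetical helper: Gulpease difference; positive means the simplified text reads more easily."""
    return gulpease(simplified) - gulpease(source)


if __name__ == "__main__":
    # Hypothetical example pair, not taken from ItaIst or ItaRegol.
    source = ("Si comunica che la presentazione della domanda deve avvenire "
              "entro il termine perentorio indicato.")
    simplified = "Devi presentare la domanda entro la scadenza. La scadenza non cambia."
    print(f"source:     {gulpease(source):.1f}")
    print(f"simplified: {gulpease(simplified):.1f}")
    print(f"gain:       {readability_gain(source, simplified):+.1f}")
```

On this invented pair, splitting one long sentence into two shorter ones with shorter words raises the score from about 50.4 to about 91.7. Note that the index is not capped at 100, so very short texts can exceed it, and that readability alone says nothing about whether meaning (or, for regulatory texts, legal effect) was preserved, which is why a semantic similarity check is used alongside it.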

IRIS Catalogo Istituzionale della Ricerca dell'Università degli Studi del Molise