English in LLMs: The Role of AI in Avoiding Cultural Homogenization

Andrea Lavazza
2025-01-01

Abstract

As of 2023, English was the official language in 67 countries and spoken by more than 1.5 billion people worldwide. Although it is not the most widely used native tongue, English has become a global lingua franca, particularly in academic and scientific contexts. This dominance, however, restricts the access of non-English speakers to educational and professional opportunities and perpetuates what may be termed “linguistic racism.” At the same time, the rapid rise of large language models (LLMs), capable of instantaneous translation and text reformulation, risks amplifying the hegemony of English, potentially homogenizing communication and overshadowing linguistic diversity. This article examines how LLMs, often trained predominantly on English-language data, may inadvertently marginalize minority languages and cultures. Although these AI tools provide unprecedented convenience for cross-linguistic communication, they also pose ethical, social, and epistemic challenges. It is argued that governments and international bodies, such as UNESCO, should develop regulations to support language pluralism and protect minority cultures in the digital sphere. One possible approach involves fostering the creation and deployment of small language models specifically adapted to local contexts. Unlike larger, English-centric models, small language models can preserve linguistic nuance and reduce reliance on a single global standard. Concrete strategies to mitigate cultural homogenization include community-driven data curation, cultural impact assessments for AI deployment, and policies that promote open-access partnerships and data sovereignty. Ensuring that AI tools reflect the input of native speakers, local anthropologists, and sociolinguists can transform LLMs into instruments for preserving, and even revitalizing, endangered languages. Ultimately, a balanced approach to AI governance, combining technical innovation with cultural sensitivity, is essential. Such an approach can ensure that emerging language technologies enhance rather than erode global linguistic diversity, enriching rather than diluting the broader epistemic landscape.
Year: 2025
ISBN: 9780198945215
Keywords: large language models, cultural homogenization, English, other languages, cultural values, protection

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12607/58629