ChatGPT's Linguistic Bias: A Barrier for Global English Varieties

ChatGPT exhibits bias against non-standard English varieties, reinforcing stereotypes and hindering comprehension. This linguistic bias has broader implications for global AI interactions.
ChatGPT, the AI marvel renowned for its adept communication in English, prompts a key question: Which English does it speak? With only 15% of its users hailing from the US, where Standard American English reigns supreme, the model is widely used across regions where other English dialects prevail.
Over a billion individuals worldwide speak English varieties such as Indian, Nigerian, Irish, and African-American English. Yet these non-standard varieties often face real-world prejudice. They have been labeled unprofessional or incorrect, and even used as proxies for discrimination based on a speaker's race or nationality. The deeper question emerges: Does ChatGPT exacerbate these biases?
The Study
Researchers examined GPT-3.5 Turbo and GPT-4, scrutinizing their responses to ten English dialects: standard American and British English alongside eight non-standard varieties. The aim was to see whether ChatGPT retained the linguistic features of the input dialects and how it responded to them.
By annotating the prompts and model outputs for linguistic features, the study revealed that ChatGPT favors Standard American English, retaining its features over 60% more frequently than those of non-standard dialects. However, it occasionally imitates other varieties, such as Nigerian and Indian English, more readily than varieties with fewer speakers, like Jamaican English. This points to the role of training data composition in shaping the model's responses to non-standard varieties.
Results and Implications
ChatGPT's inclination towards American conventions poses a challenge for non-American users. For instance, responses to British spellings almost always revert to American spellings, frustrating users outside the US. Furthermore, model responses to non-standard varieties consistently display issues like stereotyping, demeaning content, and lack of comprehension, with stereotyping being 19% worse than with standard varieties.
When instructed to imitate input dialects, GPT-3.5 and GPT-4 responses exacerbate negative stereotypes and comprehension issues. Alarmingly, the newer GPT-4 model, despite improved warmth and friendliness, worsens stereotyping against minoritized varieties by 14% compared to GPT-3.5. This raises a pointed question: Are we naively assuming that bigger models inherently resolve linguistic biases?
The implications of these findings are significant. As AI models permeate daily life, linguistic discrimination risks perpetuating stereotypes of non-standard English speakers as incorrect or less deserving of respect. These biases can entrench societal power dynamics and amplify global inequalities.
As users increasingly rely on AI, we must ensure these tools are inclusive and respectful of linguistic diversity. Otherwise, they may reinforce barriers rather than break them down.