Western Ontologically/Epistemically Biased Data
Given the dominant Western, English-language bias of LLM training datasets, LLMs face a classic garbage-in, garbage-out problem. Semantically, ontologically, and epistemically, LLMs are trained to reproduce knowledge through dominant Western norms, and their harms will therefore continue to reify Western norms of oppression.
Read more:
- Dialect prejudice predicts AI decisions about people’s character, employability, and criminality
- Linguistic Justice and GenAI
- There is a blind spot in AI research
- Accountable Algorithms
- At the Tensions of South and North: Critical Roles of Global South Stakeholders in AI Governance
- Algorithmic injustice: a relational ethics approach
- When Hackers Descended to Test A.I., They Found Flaws Aplenty
- Critiquing Big Data: Politics, Ethics, Epistemology
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
- Ethical Considerations in Machine Learning
- Fact-Checkers Are Scrambling to Fight Disinformation With AI
- ChatGPT Is a Blurry JPEG of the Web

Impacts
- Social Norm + Knowledge Reproduction