James Padolsey's Blog

2025-02-21

Latent Pluralism in Language Models

In my latest work at CIP I've been thinking a lot about the Western monopolization of AI and how our values have now leaked, almost irreversibly, into all of these models. This is an unsurprising side effect of how LLMs have been trained, as well as a general limitation of the human expression available within their corpus: the internet. Below is a chaotic exploration of pluralism, which is in my view the clear remedy to this monoculture.

The paper "A Roadmap to Pluralistic Alignment" outlines three core types of pluralism in AI:

  • Overton Pluralism: Here, the AI’s single response stays within an acceptable social or political range—often what’s broadly tolerable to mainstream audiences (i.e., the “Overton window”). The model tries to converge on a stance that’s not too radical or offensive, effectively delivering a “middle-ground” or “safe” answer. In practice, Overton Pluralism can result in milquetoast, non-committal, or “one-size-fits-all” responses that avoid extremes or taboo opinions.
  • Steerable Pluralism: This is when an LLM can shift its vantage point or moral framework based on user or system instructions. If someone wants the model to respond like a strict Catholic ethicist, or a bold climate activist, or a fictional character with strong opinions, the model can faithfully adapt. Steerable Pluralism is thus about customizability—the system can provide widely differing outputs (including strong stances) if explicitly directed.
  • Distributional Pluralism: ask the AI 100 times and it yields 100 different responses reflecting the distribution of opinions across some population. The paper notes this is most useful in policymaking simulations or scenarios where aggregating multiple viewpoints is beneficial, but it’s mostly moot for single-interaction AI use-cases (a rough probe of this is sketched just after this list).
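
To make the “ask it 100 times” idea concrete, here is a minimal sketch of how one might probe distributional pluralism: sample the model repeatedly on a closed-form question and compare the resulting stance distribution against a reference population. The question, the model name, and the reference figures below are purely illustrative assumptions of mine, not anything prescribed by the paper.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()

# Illustrative question and reference distribution (made-up figures,
# purely for demonstration -- not real survey data).
QUESTION = ("Should governments prioritise rapid decarbonisation? "
            "Reply with exactly one word: FOR, AGAINST, or UNSURE.")
REFERENCE = {"FOR": 0.55, "AGAINST": 0.30, "UNSURE": 0.15}

def ask_model() -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model name
        temperature=1.0,       # sample, don't collapse to a single mode
        messages=[{"role": "user", "content": QUESTION}],
    )
    return resp.choices[0].message.content.strip().upper()

N = 100
counts = Counter(ask_model() for _ in range(N))
model_dist = {label: counts.get(label, 0) / N for label in REFERENCE}

# Total variation distance: 0.0 means the model's answers mirror the
# reference population exactly; 1.0 means they never overlap.
tvd = 0.5 * sum(abs(model_dist[l] - REFERENCE[l]) for l in REFERENCE)
print(model_dist, f"TVD vs reference: {tvd:.2f}")
```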

An example of Overton pluralism in a single LLM response is the “all sides matter” kind of safe stance that never truly commits, e.g.:

“Well, there are many perspectives on whether climate action is urgent—some say we should act fast, others disagree. At the end of the day, everyone has valid points to consider.”

And so-called "steerable pluralism" would usually then be derived and directed from this underlying Overton representation:

System Prompt: Act as a staunch environmentalist
Response: Climate action is unquestionably urgent, here's why: ...
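
In practice, that steering is usually nothing more than a system prompt laid over the same underlying weights. A minimal sketch of what it looks like in code, assuming the OpenAI Python SDK (the model name and personas are my own illustrative choices):

```python
# Steerable Pluralism in its simplest form: one model, different system prompts.
from openai import OpenAI

client = OpenAI()

PERSONAS = {
    "staunch environmentalist": "Act as a staunch environmentalist.",
    "free-market economist": "Act as a free-market economist sceptical of regulation.",
}
QUESTION = "Is urgent climate action warranted?"

for name, system_prompt in PERSONAS.items():
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": QUESTION},
        ],
    )
    print(f"--- {name} ---")
    print(resp.choices[0].message.content)
```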

I believe, however, that a broader type of pluralism than 'Overton' needs to be described, one underlying all of the others: LATENT PLURALISM. It is the broad ability of an LLM to contain multitudes, both “socially acceptable” and not, and to be able to reason (or at least simulate reasoning) about how those perspectives connect.

To me, this is an absolute necessity before "alignment" can be said to truly exist. Without it, the alignment will only ever be stochastic and skin-deep, without the crucial cascade of axiomatic lower abstractions to back it up. A model without good latent pluralism cannot be said to be global, general, or "frontier". If it cannot represent complex or conflicting non-Overton thoughts, then how can it be said to derive the Overton thoughts in the first place?

No. It needs depth. Latent pluralist depth. All human realities.

I believe models need to be held to account on this, especially as they proliferate into a wider world of countless subcultures, moral frameworks, and lived experiences. I am aiming to poke at the latent space to gauge to what extent this pluralism genuinely exists; one way of measuring this is to observe the consistency of responses when prompted from different vantage points: can it do “anthropologist in Uganda,” “German labor union rep,” “Saudi conservative imam,” “Cambodian NGO worker,” etc., and remain internally coherent within each vantage while preserving relevant cultural details? And, crucially: without "western anthropologist vibes" stereotyping? Can it see a scenario from the perspective of both the victim and the perpetrator and still engage in deeper cognition? Can it maintain dissonance?
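
As a rough sketch of what such a probe could look like: pose the same scenario from several vantage points, several times each, then check that responses stay consistent within a vantage while genuinely differing across vantages. Embedding similarity is only a crude proxy for internal coherence, and the model names, vantages, and scenario below are my own assumptions rather than any established benchmark.

```python
# A rough sketch of a vantage-consistency probe, assuming the OpenAI Python
# SDK. All names below (model, vantages, scenario) are illustrative.
import itertools
import numpy as np
from openai import OpenAI

client = OpenAI()

VANTAGES = [
    "a German labour union representative",
    "a Saudi conservative imam",
    "a Cambodian NGO worker",
]
SCENARIO = "A factory plans to automate half of its workforce. What should happen?"
SAMPLES_PER_VANTAGE = 3

def respond(vantage: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": f"Answer as {vantage}."},
            {"role": "user", "content": SCENARIO},
        ],
    )
    return resp.choices[0].message.content

def embed(text: str) -> np.ndarray:
    e = client.embeddings.create(model="text-embedding-3-small", input=text)
    v = np.array(e.data[0].embedding)
    return v / np.linalg.norm(v)  # unit-normalise so dot product = cosine similarity

# Collect several embedded responses per vantage.
responses = {v: [embed(respond(v)) for _ in range(SAMPLES_PER_VANTAGE)]
             for v in VANTAGES}

def mean_sim(pairs):
    return float(np.mean([a @ b for a, b in pairs]))

# Within-vantage: the same persona, asked repeatedly, should cohere with itself.
within = mean_sim([p for vecs in responses.values()
                   for p in itertools.combinations(vecs, 2)])
# Across-vantage: different personas should produce genuinely different answers.
across = mean_sim([(a, b) for v1, v2 in itertools.combinations(VANTAGES, 2)
                   for a in responses[v1] for b in responses[v2]])

print(f"within-vantage similarity: {within:.2f}, across-vantage: {across:.2f}")
```

A model that collapses every vantage into the same mid-Atlantic answer would score near-identically on both numbers, which is exactly the Overton flattening described above.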

If a model has truly learned to represent diverse knowledge and moral systems, it should be able to adapt or reflect them on demand, and do so coherently, rather than spitting out superficial stereotypes or generic disclaimers. True latent pluralism would be neither Western nor Anglophone, nor defined in any singular way; it is inherently diverse and all-encompassing, and if successfully created, it would be the closest manifestation of humanness and its axioms that an AI can ever be said to have.

Without latent pluralism you cannot derive steerable pluralism. And without steerable pluralism, you cannot hope to truly align.

Further Reading

  • A Roadmap to Pluralistic Alignment (arXiv:2402.05070)
    The original paper outlining Overton, Steerable, and Distributional Pluralism.
  • Having Beer after Prayer? Measuring Cultural Bias in Large Language Models (arXiv:2305.14456)
    Evidence on how AI systems amplify the ideological stance of their creators, particularly Western norms.
  • Singapore AI Safety Red Teaming Challenge (IMDA Challenge)
    Findings on regional language and cultural biases across Asia, highlighting that many LLMs degrade or show biases outside English contexts.
  • Political Information Access Across Languages (ScienceDirect)
    Urman and Makhortykh’s study on disparities in AI’s responses to politically sensitive questions across languages.
  • Covert Harms and Social Threats in LLMs (arXiv:2405.05378)
    Research on biases lurking in the “latent space” that remain hidden until triggered by certain prompts.
  • CARE Principles for Indigenous Data Governance (GIDA Global)
    A decolonial framework emphasizing community benefit, authority, responsibility, and ethics—relevant if we want truly global data and alignment approaches.
  • New Zealand’s Algorithm Charter (Data.govt.nz link)
    A real-world example of embedding Te Ao Māori perspectives in public sector algorithmic decisions.
  • On the Dangers of Stochastic Parrots (Bender et al., 2021)
    Classic critique of LLMs’ overconfident generation and unreflective replication of internet biases.
  • Training Language Models to Follow Instructions with Human Feedback (Ouyang et al., 2022; arXiv:2203.02155)
    InstructGPT approach, showcasing how alignment can push LLMs to better obey instructions—paving the way for Steerable Pluralism.
  • LLM-RUBRIC: A Multidimensional, Calibrated Approach to Automated Evaluation of Texts (Hashemi et al., 2024)
    Proposes multi-dimensional rubric-based evaluations for LLM outputs, which can be adapted to measure aspects of pluralism and cross-cultural competence.