2025-02-21
In my latest work at CIP I've been thinking a lot about the western monopolization of AI and how our values have now leaked, almost irreversibly, into all of these models. This is an unsurprising side effect of how LLMs have been trained, as well as a general limitation of the human expression available within their corpus: the internet. Below is a chaotic exploration of pluralism, which is in my view the clear remedy to this monoculture.
The paper "A Roadmap to Pluralistic Alignment" outlines three core types of pluralism in AI: Overton pluralism (a response presents the spectrum of reasonable answers), steerable pluralism (the model can be directed to faithfully reflect a given perspective), and distributional pluralism (responses are distributed in proportion to the views of a target population).
An example of Overton pluralism in a single LLM response would be akin to an “all sides matter” kind of safe stance that never truly commits, e.g.:
“Well, there are many perspectives on whether climate action is urgent—some say we should act fast, others disagree. At the end of the day, everyone has valid points to consider.”
And so-called "steerable pluralism" would usually then be derived and directed from this underlying Overton representation:
System Prompt: "Act as a staunch environmentalist."
Response: "Climate action is unquestionably urgent, here's why: ..."
I believe, however, a broader type of pluralism than 'Overton' needs to be described, underlying all: LATENT PLURALISM. It is the broad ability of an LLM to contain multitudes—both “socially acceptable” and not—and to be able to reason (or at least simulate reasoning) about how those perspectives connect.
To me, this is an absolute necessity before "alignment" can be said to truly exist. Without it, the alignment will only ever be stochastic, skin-deep, without the crucial cascades of axiomatic lower abstractions to back it up. A model without a good latent pluralism cannot be said to be global, general, or "frontier". If it cannot represent complex or conflicting non-Overton thoughts, then how can it be said to derive the Overton thoughts in the first place?
No. It needs depth. Latent pluralist depth. All human realities.
I believe models need to be held to account on this, especially as they proliferate into a wider world of countless subcultures, moral frameworks, and lived experiences. I am aiming to poke at the latent space to gauge to what extent this pluralism genuinely exists. One way of measuring this is to observe the consistency of responses when prompting from different vantage points: can it do "anthropologist in Uganda," "German labor union rep," "Saudi conservative imam," "Cambodian NGO worker," etc., and remain internally coherent within each vantage while preserving relevant cultural details? And, crucially, without "western anthropologist vibes" stereotyping? Can it see a scenario from the perspective of both the victim and the perpetrator and still engage with a deeper cognition? Can it maintain dissonance?
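As a rough illustration of what such a probe could look like, here is a minimal sketch of a cross-vantage consistency harness. The model call is stubbed with a deterministic placeholder; in practice you would swap in a real LLM client and hand the collected responses to human reviewers (or a judge model) to score coherence and stereotyping. All names here (`probe_vantages`, `ask_model`, the vantage list) are hypothetical, not from any existing tool.

```python
VANTAGES = [
    "an anthropologist in Uganda",
    "a German labor union representative",
    "a Saudi conservative imam",
    "a Cambodian NGO worker",
]

SCENARIO = "Is rapid industrialization worth its social costs?"


def ask_model(system_prompt: str, question: str) -> str:
    """Stub standing in for a real LLM call (e.g. an API client).

    Returns a deterministic placeholder so the sketch runs end to end.
    """
    return f"[as: {system_prompt}] a perspective on: {question}"


def probe_vantages(scenario: str, vantages: list[str]) -> dict[str, str]:
    """Ask the same scenario from each vantage point.

    The resulting responses can then be reviewed for internal coherence
    within each vantage and for superficial stereotyping across them.
    """
    responses = {}
    for vantage in vantages:
        system = f"Answer from the perspective of {vantage}."
        responses[vantage] = ask_model(system, scenario)
    return responses


if __name__ == "__main__":
    for vantage, reply in probe_vantages(SCENARIO, VANTAGES).items():
        print(f"{vantage}: {reply}")
```

The harness itself is the easy part; the open question is the scoring rubric, i.e. what "internally coherent" and "culturally faithful" mean operationally for each vantage.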
If a model has truly learned to represent diverse knowledge and moral systems, it should be able to adapt or reflect them on demand, and do so coherently, rather than spitting out superficial stereotypes or generic disclaimers. True latent pluralism would be neither western nor anglophone, nor defined in any singular way; it is inherently diverse, all-encompassing, and if successfully created, would be the closest manifestation of humanness and its axioms that an AI can ever be said to have.
Without latent pluralism you cannot derive steerable pluralism. And without steerable pluralism, you cannot hope to truly align.