James Padolsey's Blog

2024-07-11

AIs without levers are inert

I don't know whether this revelation is just really obvious or really dull. To me it's interesting because it flies in the face of lots of rheroric that people use when talking about AI. AI, in the mainstream context, nowadays, is basically synonomous with LLMs, or multimodal agents that have language models as their core. So that's what I'm talking about here when I say AI.

In AI safety presentations (and lamentations) people seem to recount all the times they've made an LLM output 'bad' stuff. Maybe they've encountered prejudiced or harmful outputs that give them cause for concern. They might then use these findings to backup arguments for more regulation.

The thing that bothers me is that these LLM outputs are, even if troubling, a bit moot; the downstream effect of them is what's interesting. LLMs by themselves, however, are inert. We make them harmful by letting them direct reality. We do this by giving them "appendages", "levers", or "interfaces". Consider a superintelligent AI-runtime stuck in a box. It can't do anything until we give it a button. Perhaps the button makes a cog turn or a kettle boil. Who knows. But if it only has one button, and we know that even if it pressed that button a million times, nothing bad would happen, then we can rest assured. It's not different from discussions of any other human created algorithms, deterministic or not. They are as detrimental as we let them be by giving them hooks into reality. So it is the implementers of the hooks – the levers and buttons – who need to be careful.

I guess it's a bit obvious, but also a bit nuanced..?

Fundamentally: I'd rather more AI safety discussions focused on the levers and not the algorithms. Levers move soft reality into hard reality. Hard reality is the one we can be harmed by. Regulating an algorithm is like regulating a thought. It doesn't work. But regulating output, where the thought meets reality? That's meaningful.

Examples of algo vs. lever regulation:

Here's some nuanced differences in various AI domains between regulating the algorithm vs. regulating the lever:

Text Generation Interfaces
- Regulating the algorithm: Requiring the LLM to be trained on "approved" datasets or to have certain biases removed during training.
- Regulating the lever: Implementing content filters that screen generated text for harmful content before displaying it to users.
Code Execution Environments
- Regulating the algorithm: Mandating that the AI be trained to avoid generating certain types of potentially malicious code.
- Regulating the lever: Implementing a sandboxed environment where generated code is executed, limiting its access to system resources.
Decision Support Systems (e.g. reviewing CVs)
- Regulating the algorithm: Mandating that the AI be trained on diverse datasets, or primed on ethical principles, to reduce bias in its recommendations.
- Regulating the lever: Requiring that AI recommendations always be presented alongside human expert opinions, or implementing a system where AI suggestions below a certain confidence threshold trigger additional review.

Closing thoughts.

Let's talk about algorithms and levers as distinct things.
Policing algorithms is probably fruitless.
Where soft reality meets hard reality, however, we need protections.

Thanks for reading! (written by James)