James Padolsey's Blog

2024-12-15

Working with LLMs – not against them.

Learning to talk to an LLM is an odd sort of thing to do. It is to induct oneself into an alien pattern of thought, where you are not asking for things as we do of humans but, with every word or hint, inserting probabilities and weights into a singular brainwave.

To derive true and deterministic value from this is an enchanting art for a hacker, but a painful nuisance for a programmer. But we need to figure something out. LLMs are very useful, but to apply them to non-primitive tasks we need new paradigms and new abstractions.

What can we learn from other software stacks in dealing with this high entropy?

The Entropy Stacks

What comes to mind first is networking: trillions of bits flying overhead and somehow, through the wizardry of packet interleaving, error correction, TCP and tonnes of other protocols, they all end up in the right order, hitting you squarely in the face as you read this. That's pretty incredible. So too is the web platform resting precariously on top, built of things that had to be more resilient by design. So much of a browser's implementation is about being graceful with incorrect inputs: content-type sniffing, malformed URLs, corrupt character encodings, DNS resilience, CSS's quiet handling of unsupported properties and JavaScript's loosely-typed nature. Every layer evolved to handle human messiness and the cog-meets-cog of myriad ugly interfaces, all whilst remaining functional.

HTML especially exemplifies this philosophy. When it was adopted by Tim Berners-Lee et al. for the WWW, it took hold as the obvious format of choice because it was easy enough to write in a text editor. And for us fallible humans, that was vital. We make mistakes, all the time. But HTML, being essentially a progressively-enhanced text file, is accepting of them. Even if something is not renderable in a browser, it is still right there in its text representation. The philosophy of HTML, and of the web generally, can be expressed through Postel's Law, named for Jon Postel, who edited the early TCP specification in which it appears: “Be liberal in what you accept, and conservative in what you send.”

This seems applicable to LLMs. Like humans, they are quite bad at conforming to strict rules and grammars, so it makes sense to build abstractions and protocols that accept that. And what better protocol than one which has been time-tested against decades of human frailty and just so happens to be richly represented in these LLMs' training corpora? Yes, HTML! And XML. Or really any flexibly-parseable method of textual annotation and delineation. JSON, however, does not in my mind fit the bill. It is a brittle grammar. Making LLMs speak in JSON is like asking a poet to write verse in a spreadsheet. It is not in their nature. It limits their expression. It ties them down.

To be fair, there are methods of making LLMs yield only valid JSON, but these are not widespread and often involve provider lock-in, such as OpenAI's function-calling paradigm or Anthropic's tool-use variant. Many of these approaches are brittle, with the only recovery option being to retry again and again. These function-calling abstractions are certainly useful, and perhaps they have their niche, but I've fallen in love with the creative, prose-rich streaming nature of LLMs, and giving that up feels self-limiting, especially with the creative diversity of language models popping up all the time in the wilds of HuggingFace.

XML through HTML's eyes

So, back to HTML, and its “I don't care what tag you use, but you'd better close it!” cousin, XML. I have been using some form of boundary markers and delimiters in my LLM usage since the OpenAI Davinci days. Pre-chat-tuning, arbitrary boundary markers were the simplest way to imply progression from one 'data concern' to the next, or, indeed, a 'chat'. I started using regular expressions to gather and separate the text I cared about, since they allowed for more flexible matching. I also experimented with single-character markers for indicating specific actions in a streaming completion, like rendering a form in a chatbot. It worked well. Eventually, though, it became blindingly obvious to just use XML. It's right there, time-tested, and very forgiving when parsed with an HTML parser accustomed to dodgy human-written markup! If prompted carefully, I found that most LLMs complied well, and when they didn't, it was usually recoverable.
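To illustrate that pre-XML, boundary-marker era (a hypothetical sketch of the general idea; the delimiter and code here are my own invention, not anything from xmllm):

```javascript
// Hypothetical sketch of the boundary-marker approach: split a raw
// Davinci-style completion into separate 'data concerns' on an
// arbitrary delimiter, with a regex that forgives stray whitespace.
const BOUNDARY = '%%%'; // an arbitrary marker we'd ask the model to emit

function splitConcerns(raw) {
  return raw
    .split(new RegExp('\\s*' + BOUNDARY + '\\s*'))
    .map(s => s.trim())
    .filter(Boolean);
}

const completion = 'First a summary... %%% then a title %%%  and a tagline ';
splitConcerns(completion);
// → ['First a summary...', 'then a title', 'and a tagline']
```

Crude, but it meant a malformed marker degraded into slightly messy text rather than a hard parse failure.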

And since I retain access to the raw creative stream, I can decide how to recover from errors like unclosed tags or incorrect attributes. This lies in contrast to “hope and pray” JSON generation and models using constrained decoders, where you're forced into non-ideal trade-offs between creativity and structure.
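One such recovery tactic, sketched here as my own illustration (not necessarily how xmllm does it): if the stream ends with tags still open, append the missing closers before handing the text to a lenient parser.

```javascript
// Illustrative sketch: track open tags in a truncated stream and
// close any that are still dangling at the end, innermost-first.
function closeDangling(raw) {
  const open = [];
  const re = /<(\/?)([a-zA-Z][\w-]*)[^>]*>/g;
  let m;
  while ((m = re.exec(raw)) !== null) {
    if (m[1]) {
      // closing tag: pop the matching open tag, if we saw one
      const i = open.lastIndexOf(m[2]);
      if (i !== -1) open.splice(i, 1);
    } else {
      open.push(m[2]);
    }
  }
  return raw + open.reverse().map(t => '</' + t + '>').join('');
}

closeDangling('<name>King Julian'); // → '<name>King Julian</name>'
```

Because we see the raw text, partial output becomes salvageable data rather than a failed generation.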

I ended up bundling this approach – essentially an XML/HTML streaming methodology with schemas – into a library I’ve been using for around six months. It’s called xmllm. It lets you define a schema, prompt an LLM, and get the output you want. It uses XML as its invisible medium, inserting its own scaffolding into the prompts you use and intercepting the tags downstream before giving you iteratively completed data. It is model-agnostic, stream-friendly and quite easy to link up with reactive UIs, if that’s your thing.

It works like this:

import { simple } from 'xmllm';
const data = await simple('nice pet names?', {
    schema: { name: Array(String) }
});

Giving you back this:

{
  name: ['Charlie', 'Bella', 'King Julian']
}

And thanks to the flexibility of the HTML parser, it even works with a high-temperature, funky, low-parameter model giving us intriguing rubbish like this…

    Hi im a plucky    and annoying
    little llm and sure i can
    help with      your request for 
    PET NAMES, how about <name>
    Charlie</name> or
    maybe <name>Bella </ IM MESSING THINGS UP ></name>
    <name>King
    Julian

See a ‘demo’ of LLM->UI streaming here. And the xmllm github repo here.

To wrap up...

I believe LLMs will ultimately be so fast that streaming becomes moot, and so capable that one won’t need to sacrifice creativity or competency to get fixed grammars like JSON. But we’re not there yet, and in the meantime I want to be able to use a variety of models, from Qwen’s 2.5B to Llama 2 70B. I don’t want to be locked in. And xmllm works quite reliably across the model landscape; even in difficult cases where XML compliance is hard-won, you can still employ old-school prompt-engineering techniques to make a model yield correctly, and you can implement subtle error recovery instead of dealing with wholesale failures.

My main message in this post is not necessarily that you should use xmllm; it’s to consider the merits of pairing markup languages that are well represented in training corpora with flexible parsers to get structured data from LLMs. Loosely interpreted XML just happens to be the best combination I’ve found to date.


Have a look at xmllm here.


Thanks for reading! I’m lately on Bluesky – please follow me for more things like this.