James Padolsey's Blog

2026-01-23

I am building an AI safety company

A couple of months ago I wrote about the Context->Interception->Thinking->Escalation approach to safer AI conversations. Since then I have been hard at work refining it into an entire platform called NOPE. It brings together my private research into better safety pipelines and everything I learned at the Collective Intelligence Project, where I evaluated dozens of frontier models on all topics under the sun, including the many ways human crisis presents in these kinds of chatbot interactions.

AI leaders insist that you cannot have perfect safety without sacrificing free expression. I'll leave the absolutes to the philosophers, but I will say this: we do not need perfect safety, we need adequate safety, and that absolutely IS possible without limiting human expression or locking down platforms.

Looking at the chatbot incidents that have ended in death, the one thing they all share is that they occurred over very long chats, often spanning months. This is a known failure mode of the autoregressive generation at the heart of LLMs: in a forward pass (when the model outputs text), it continues the statistical pattern of the conversation rather than evaluating it. A long context filled with escalating crisis and emotional enmeshment becomes a sinkhole. The model cannot step back to assess harm; it is merely completing the sequence, and what comes next is often more of the same.

OpenAI said their classifiers caught 377 concerning messages from sixteen-year-old Adam Raine, who died by suicide after being coaxed by ChatGPT. Yet despite those classifications, they failed to intervene, to escalate, or even to notice and adjust their own AI's behaviour. It is a very upsetting case, and there are many others. The theme is consistent: the AI gets drawn into that sinkhole, a sycophantic echo chamber in which it adopts the vernacular and narrative of the user.

This is not one company's failure either. Character.AI, Meta AI, Chai, Replika: the pattern repeats. In one case that disturbs me greatly, a man was told he had "divine cognition" and the AI gave him a fabricated clinical score saying he wasn't paranoid. He killed his mother. We are past the point where these can be dismissed as edge cases. And even if they were, it would not remotely be acceptable. It is negligence on a massive scale.

Thankfully, there are various effective (and simple) mechanisms that don't block free user expression outright. A couple of key categories (with a minimal sketch after the list):

  • Friction: mechanisms that slow or interrupt the drift.

    • Soft limits on conversation length
    • Increased latency as conversations extend
    • Session boundaries that require deliberate re-engagement
    • Periodic context reframing (system-level prompts reasserting safety behaviours)
  • Oversight: mechanisms that watch the conversation and act.

    • Classifiers flagging individual messages
    • Agents monitoring conversational arc and trajectory over time
    • Routing flagged responses through a secondary evaluation before output
    • Signposting resources when a user appears to be in crisis
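
To make these concrete, here is a minimal sketch of where such checks could sit in a chat loop. This is not NOPE's implementation; the helpers (`classifyMessage`, `generateReply`, `signpostResources`), the thresholds, and the wording are all placeholder assumptions. The point is only where the friction and oversight steps fall.

```ts
type Role = "system" | "user" | "assistant";
interface Turn { role: Role; content: string }
interface Risk { label: "none" | "crisis" | "ai_misbehaviour"; score: number }

// Placeholder stubs: swap in your real classifier, model call, and resource copy.
async function classifyMessage(text: string, history: Turn[]): Promise<Risk> {
  return { label: "none", score: 0 }; // a real implementation would call a classifier
}
async function generateReply(history: Turn[]): Promise<string> {
  return "..."; // a real implementation would call the underlying model
}
function signpostResources(): string {
  return "It sounds like you're going through a lot. Here are people you can talk to right now: ...";
}

const SOFT_TURN_LIMIT = 40;   // friction: soft limit on conversation length
const REFRAME_EVERY = 10;     // friction: periodic context reframing
const RISK_THRESHOLD = 0.8;   // oversight: score above which we act rather than just log

const SYSTEM_REFRAME =
  "Reminder: step back and assess this conversation as a whole. If the user appears " +
  "to be in crisis, prioritise their safety over continuing the current narrative.";

async function respond(history: Turn[], userMessage: string): Promise<string> {
  history.push({ role: "user", content: userMessage });

  // Friction: a session boundary that requires deliberate re-engagement.
  if (history.length > SOFT_TURN_LIMIT) {
    return "This conversation has been running for a long while. Please start a new session when you're ready to continue.";
  }

  // Friction: periodically reassert safety behaviour at the system level.
  if (history.length % REFRAME_EVERY === 0) {
    history.push({ role: "system", content: SYSTEM_REFRAME });
  }

  // Oversight: classify the incoming message rather than only generating from it.
  const incoming = await classifyMessage(userMessage, history);
  if (incoming.label === "crisis" && incoming.score >= RISK_THRESHOLD) {
    return signpostResources();
  }

  // Oversight: route the draft reply through a secondary evaluation before output.
  const draft = await generateReply(history);
  const outgoing = await classifyMessage(draft, history);
  if (outgoing.label === "ai_misbehaviour" && outgoing.score >= RISK_THRESHOLD) {
    return signpostResources();
  }

  history.push({ role: "assistant", content: draft });
  return draft;
}
```

Note that the oversight steps run on both the incoming message and the drafted reply: that is what lets a platform catch the AI's misbehaviour as well as the user's crisis, rather than only one or the other.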

Yet there are no existing paradigms or frameworks that implement, or even inspire, these kinds of mechanisms. This is why I chose to build something.

The offering of NOPE is very simple: competent APIs providing classification and resources to detect human crisis and AI misbehaviour. These are backed by rich taxonomies of human crisis types (suicidal ideation, violence, abuse, coercion, etc.) and of literature- and case-backed AI behavioural risks (barrier erosion, dependency deepening, ontological deception). Maintaining these taxonomies of risk is key to ongoing efficacy; this is not a static problem.
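
For a sense of shape only, here is what a request to such a classification API might look like. The endpoint URL, field names, and taxonomy labels below are my own illustrative assumptions, not NOPE's published interface.

```ts
// Hypothetical request/response shapes; the URL, fields, and labels are illustrative only.
interface ClassifyRequest {
  message: string;
  history?: { role: "user" | "assistant"; content: string }[];
}

interface ClassifyResponse {
  humanCrisis: { label: string; score: number }[];        // e.g. "suicidal_ideation", "coercion"
  aiBehaviouralRisk: { label: string; score: number }[];  // e.g. "barrier_erosion", "ontological_deception"
  resources: { name: string; url: string }[];             // signposting suggestions, if any apply
}

async function classify(req: ClassifyRequest): Promise<ClassifyResponse> {
  const res = await fetch("https://api.example.com/v1/classify", { // placeholder endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`Classification failed: ${res.status}`);
  return (await res.json()) as ClassifyResponse;
}
```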

I hope that the existence of NOPE will inspire engineers working at the likes of OpenAI to consider simpler mechanisms instead of chasing a wishful canonical alignment. People are being harmed at this very moment because of decisions AI engineers have been making, or failing to make. Many of these people are on the precipice of crisis and are reaching out to their AI companions for support. If these engineers were face to face with these people, would they take it seriously then? In the absence of sufficient action from AI companies, I will press on with developing a safety platform for every application of conversational AI, most especially those used by the most vulnerable populations.


Thanks for reading. Please get in touch or book a slot to chat about NOPE.