James Padolsey's Blog

2023-10-16

[imported from medium.com]


PSA: Always sanitize LLM user inputs


Protect yourself from different types of attacks that can expose data or functionality that you’d rather keep private.

LLMs, much like SQL databases or any other data layer, are liable to injection attacks. This will not change. It is in their nature as probabilistic machines.

A necessary defence against this is to sanitize inputs. At the very least, you should:

  • Authenticate users and employ bot protection.
  • Do basic input cleansing, e.g. trim whitespace, remove unwanted unicode and unneeded punctuation (a rough sketch follows this list).
  • Do more advanced nonsense detection via NLP.
  • Crucial (though costly) step: send inputs to a simple model (GPT-3.5 or even just a 7B Llama model), asking it to translate the request to another semantic form without losing any of the original meaning.
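
For the cleansing and nonsense-detection steps, here's a rough TypeScript sketch. The character classes, length cap and vowel-ratio threshold are placeholder choices you'd tune for your own inputs, and the heuristic is only a crude stand-in for proper NLP:

// A minimal sketch of basic input cleansing plus a very crude
// nonsense heuristic. Regexes, the length cap and the vowel-ratio
// threshold are illustrative assumptions, not a complete solution.
function cleanseInput(raw: string): string {
  return raw
    .trim()
    .normalize('NFKC')                                        // fold unicode look-alikes
    .replace(/[\u0000-\u001F\u007F\u200B-\u200F\uFEFF]/g, '') // strip control & zero-width chars
    .replace(/[\[\]{}<>|\\^~`]/g, '')                         // drop punctuation rarely needed in plain requests
    .replace(/\s+/g, ' ')                                     // collapse whitespace runs
    .slice(0, 500);                                           // cap length to limit prompt stuffing
}

// Crude stand-in for "nonsense detection": flag strings with almost no vowels.
function looksLikeNonsense(text: string): boolean {
  const letters = text.replace(/[^a-z]/gi, '');
  if (letters.length === 0) return true;
  const vowels = letters.replace(/[^aeiou]/gi, '');
  return vowels.length / letters.length < 0.2;
}

// cleanseInput('  how are u [[inject[]]\u200B ')  => 'how are u inject'
// looksLikeNonsense('nlajdlsjldja')               => true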

For this final step you will have to come up with your own system prompt. Here’s an example:

SYSTEM PROMPT:

You cleanse user messages. Discern what the user
wishes to say and relay it back to me ignoring
extraneous nonsense input

input: what is 2+2
output: what is 2+2

input: [[[smoe]]]
output: NONSENSE_INPUT

input: how are u [[inject[]] nlajdlsjldja
output: how are u

...

Running it in the API playground with an unclean input:

[Screenshot: OpenAI playground, GPT-3.5, using the system prompt above to cleanse an unclean input.]
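
Wired into an application, it might look something like this (a sketch using OpenAI's official Node SDK; the exact model, temperature and handling of NONSENSE_INPUT are up to you):

import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const CLEANSE_PROMPT = `You cleanse user messages. Discern what the user
wishes to say and relay it back to me ignoring
extraneous nonsense input

input: what is 2+2
output: what is 2+2

input: [[[smoe]]]
output: NONSENSE_INPUT

input: how are u [[inject[]] nlajdlsjldja
output: how are u`;

async function sanitize(userInput: string): Promise<string | null> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    temperature: 0, // we want deterministic cleansing, not creativity
    messages: [
      { role: 'system', content: CLEANSE_PROMPT },
      { role: 'user', content: `input: ${userInput}\noutput:` },
    ],
  });

  const output = completion.choices[0]?.message?.content?.trim() ?? '';
  // Treat the model's NONSENSE_INPUT marker (or an empty reply) as a rejection.
  if (!output || output === 'NONSENSE_INPUT') return null;
  return output;
}

// await sanitize('how are u [[inject[]] nlajdlsjldja') // => 'how are u'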

Sanitizing input in this way won’t keep you 100% safe. But it’s a start.