James Padolsey's Blog

2023-10-16

[imported from medium.com]


PSA: Always sanitize LLM user inputs


Protect yourself from different types of attacks that can expose data or functionality that you’d rather keep private.

LLMs, much like SQL databases or any other data layer, are liable to injection attacks. This will not change. It is in their nature as probabilistic machines.

A necessary defence against this is to sanitize inputs. At the very least, you should:

  • Authenticate users and employ bot protection.
  • Do basic input cleansing, e.g. trim whitespace, remove unwanted unicode and unneeded punctuation (a rough sketch follows this list).
  • Do more advanced nonsense detection via NLP.
  • Crucial (though costly) step: send inputs to a simple model (GPT-3.5 or even just a 7B Llama model), asking it to translate the request to another semantic form without losing any of the original meaning.
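
For the cleansing and nonsense-detection steps, here's a rough TypeScript sketch. The character classes, length cap and vowel-ratio threshold are placeholder choices you'd tune for your own inputs, and the heuristic is only a crude stand-in for proper NLP:

// A minimal sketch of basic input cleansing plus a very crude
// nonsense heuristic. Regexes, the length cap and the vowel-ratio
// threshold are illustrative assumptions, not a complete solution.
function cleanseInput(raw: string): string {
  return raw
    .trim()
    .normalize('NFKC')                                        // fold unicode look-alikes
    .replace(/[\u0000-\u001F\u007F\u200B-\u200F\uFEFF]/g, '') // strip control & zero-width chars
    .replace(/[\[\]{}<>|\\^~`]/g, '')                         // drop punctuation rarely needed in plain requests
    .replace(/\s+/g, ' ')                                     // collapse whitespace runs
    .slice(0, 500);                                           // cap length to limit prompt stuffing
}

// Crude stand-in for "nonsense detection": flag strings with almost no vowels.
function looksLikeNonsense(text: string): boolean {
  const letters = text.replace(/[^a-z]/gi, '');
  if (letters.length === 0) return true;
  const vowels = letters.replace(/[^aeiou]/gi, '');
  return vowels.length / letters.length < 0.2;
}

// cleanseInput('  how are u [[inject[]]\u200B ')  => 'how are u inject'
// looksLikeNonsense('nlajdlsjldja')               => true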

For this final step you will have to come up with your own system prompt. Here’s an example:

SYSTEM PROMPT:

You cleanse user messages. Discern what the user
wishes to say and relay it back to me ignoring
extraneous nonsense input

input: what is 2+2
output: what is 2+2

input: [[[smoe]]]
output: NONSENSE_INPUT

input: how are u [[inject[]] nlajdlsjldja
output: how are u

...

Running it in the API playground with an unclean input:

[Screenshot: OpenAI playground, GPT-3.5, using the system prompt above to cleanse an unclean input.]
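
Wired into an application, it might look something like this (a sketch using OpenAI's official Node SDK; the exact model, temperature and handling of NONSENSE_INPUT are up to you):

import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const CLEANSE_PROMPT = `You cleanse user messages. Discern what the user
wishes to say and relay it back to me ignoring
extraneous nonsense input

input: what is 2+2
output: what is 2+2

input: [[[smoe]]]
output: NONSENSE_INPUT

input: how are u [[inject[]] nlajdlsjldja
output: how are u`;

async function sanitize(userInput: string): Promise<string | null> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    temperature: 0, // we want deterministic cleansing, not creativity
    messages: [
      { role: 'system', content: CLEANSE_PROMPT },
      { role: 'user', content: `input: ${userInput}\noutput:` },
    ],
  });

  const output = completion.choices[0]?.message?.content?.trim() ?? '';
  // Treat the model's NONSENSE_INPUT marker (or an empty reply) as a rejection.
  if (!output || output === 'NONSENSE_INPUT') return null;
  return output;
}

// await sanitize('how are u [[inject[]] nlajdlsjldja') // => 'how are u'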

Sanitizing input in this way won’t keep you 100% safe. But it’s a start.