Detecting LLM Prompt Injections & Jailbreak Attacks
Hands-on workshop on how to identify and mitigate malicious issues in your LLMs.
BERNEASE HERMAN
Senior Data Scientist
WhyLabs
Language models have fallen victim to numerous attacks that pose serious risks - specifically jailbreak attacks and prompt injections, which have the power to generate hate speech, create misinformation, cause private data leakage, and other restricted behaviors.
This workshop will cover:
- What jailbreaks and prompt injections are and how to identify them in your LLMs
- Using privilege control, robust system prompts, human in the loop and monitoring to prevent the attacks
- Using similarity to known attacks to compare a set of known jailbreak/prompt injection attacks to incoming user prompts
- Proactive prompt injection detection to devise a preflight instruction prompt combined with the target prompt to analyze the response