Jailbroken: How Does LLM Safety Training Fail? - Paper Explained

Published: 17 February 2024
on channel: DataMListic
779
20

In this video we talk about why large language models are susceptible to jailbreak as suggested in the “Jailbroken: How Does LLM Safety Training Fail?” paper. They propose two main causes for this vulnerability: competing objectives and mismatched generalization, and analyze various attacks on SoTA models like GPT4 and Claude v1.3.

References
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
“Jailbroken: How Does LLM Safety Training Fail?” paper: https://arxiv.org/abs/2307.02483

Related Videos
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Why Language Models Hallucinate:    • Why Language Models Hallucinate  
Grounding DINO, Open-Set Object Detection:    • Object Detection Part 8: Grounding DI...  
Detection Transformers (DETR), Object Queries:    • Object Detection Part 7: Detection Tr...  
Wav2vec2 A Framework for Self-Supervised Learning of Speech Representations - Paper Explained:    • Wav2vec2 A Framework for Self-Supervi...  
Transformer Self-Attention Mechanism Explained:    • Transformer Self-Attention Mechanism ...  
How to Fine-tune Large Language Models Like ChatGPT with Low-Rank Adaptation (LoRA):    • How to Fine-tune Large Language Model...  
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained:    • Multi-Head Attention (MHA), Multi-Que...  
LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p:    • LLM Prompt Engineering with Random Sa...  

Contents
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 - Abstract
02:28 - Introduction
08:03 - Background: Safety-Trained Language Models and Jailbreak Attacks
10:45 - Failure Modes: Competing Objectives and Generalization Mismatch
20:05 - Empirical Evaluation of Jailbreak Methods
25:15 - Implications for Defense

Follow Me
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
🐦 Twitter: @datamlistic   / datamlistic  
📸 Instagram: @datamlistic   / datamlistic  
📱 TikTok: @datamlistic   / datamlistic  

Channel Support
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)

If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon:   / datamlistic  
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a

#llm #jailbreak #llmsafety


Watch video Jailbroken: How Does LLM Safety Training Fail? - Paper Explained online without registration, duration hours minute second in high quality. This video was added by user DataMListic 17 February 2024, don't forget to share it with your friends and acquaintances, it has been viewed on our site 779 once and liked it 20 people.