Jailbroken: How Does LLM Safety Training Fail? - Paper Explained

Published: 17 February 2024
Channel: DataMListic


In this video, we discuss why large language models are susceptible to jailbreaks, as argued in the paper “Jailbroken: How Does LLM Safety Training Fail?”. The authors propose two main causes for this vulnerability, competing objectives and mismatched generalization, and analyze various attacks on state-of-the-art (SoTA) models such as GPT-4 and Claude v1.3.

References
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
“Jailbroken: How Does LLM Safety Training Fail?” paper: https://arxiv.org/abs/2307.02483

Related Videos
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Why Language Models Hallucinate
Grounding DINO, Open-Set Object Detection
Detection Transformers (DETR), Object Queries
Wav2vec2 A Framework for Self-Supervised Learning of Speech Representations - Paper Explained
Transformer Self-Attention Mechanism Explained
How to Fine-tune Large Language Models Like ChatGPT with Low-Rank Adaptation (LoRA)
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained
LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p

Contents
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 - Abstract
02:28 - Introduction
08:03 - Background: Safety-Trained Language Models and Jailbreak Attacks
10:45 - Failure Modes: Competing Objectives and Generalization Mismatch
20:05 - Empirical Evaluation of Jailbreak Methods
25:15 - Implications for Defense

Follow Me
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
🐦 Twitter: @datamlistic
📸 Instagram: @datamlistic
📱 TikTok: @datamlistic

Channel Support
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)

If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon: datamlistic
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a

#llm #jailbreak #llmsafety

This video was added by DataMListic on 17 February 2024; it has been viewed 779 times and liked by 20 people.
