Highlights: Accidentally teaching AI models to deceive us | Ajeya Cotra (2023)

Published: 25 October 2024
Channel: 80,000 Hours

Highlights from our episode with AI researcher Ajeya Cotra about the AI deception pitfalls we might be walking into as we train today's models (episode #151 of The 80,000 Hours Podcast, released May 2023).

There's much more to say on this topic, so if you enjoy this, definitely check out the full episode: • Accidentally teaching AI models to de...

Highlights include:
• Rob’s intro (00:00:00)
• How ML models might develop situational awareness (00:00:20)
• Why situational awareness makes safety tests less informative (00:04:21)
• What misalignment doesn't mean (00:08:31)
• Why it's critical to avoid training bigger systems (00:10:31)
• Why it's hard to negatively reinforce deception in ML systems (00:12:58)
• Can we require AI to explain its reasons for its actions? (00:18:44)
• Ways AI is like and unlike the economy (00:22:25)

Learn more and find the full transcript on the 80,000 Hours website:
https://80000hours.org/podcast/episod...