
Lex Fridman Podcast

#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

Mon, 11 Nov 2024

Description

Dario Amodei is the CEO of Anthropic, the company that created Claude. Amanda Askell is an AI researcher working on Claude's character and personality. Chris Olah is an AI researcher working on mechanistic interpretability.

Thank you for listening ❤ Check out our sponsors: https://lexfridman.com/sponsors/ep452-sc

See below for timestamps, transcript, and to give feedback, submit questions, contact Lex, etc.

Transcript: https://lexfridman.com/dario-amodei-transcript

CONTACT LEX:
Feedback - give feedback to Lex: https://lexfridman.com/survey
AMA - submit questions, videos or call-in: https://lexfridman.com/ama
Hiring - join our team: https://lexfridman.com/hiring
Other - other ways to get in touch: https://lexfridman.com/contact

EPISODE LINKS:
Claude: https://claude.ai
Anthropic's X: https://x.com/AnthropicAI
Anthropic's Website: https://anthropic.com
Dario's X: https://x.com/DarioAmodei
Dario's Website: https://darioamodei.com
Machines of Loving Grace (Essay): https://darioamodei.com/machines-of-loving-grace
Chris's X: https://x.com/ch402
Chris's Blog: https://colah.github.io
Amanda's X: https://x.com/AmandaAskell
Amanda's Website: https://askell.io

SPONSORS:
To support this podcast, check out our sponsors & get discounts:
Encord: AI tooling for annotation & data management. Go to https://encord.com/lex
Notion: Note-taking and team collaboration. Go to https://notion.com/lex
Shopify: Sell stuff online. Go to https://shopify.com/lex
BetterHelp: Online therapy and counseling. Go to https://betterhelp.com/lex
LMNT: Zero-sugar electrolyte drink mix. Go to https://drinkLMNT.com/lex

OUTLINE:
(00:00) - Introduction
(10:19) - Scaling laws
(19:25) - Limits of LLM scaling
(27:51) - Competition with OpenAI, Google, xAI, Meta
(33:14) - Claude
(36:50) - Opus 3.5
(41:36) - Sonnet 3.5
(44:56) - Claude 4.0
(49:07) - Criticism of Claude
(1:01:54) - AI Safety Levels
(1:12:42) - ASL-3 and ASL-4
(1:16:46) - Computer use
(1:26:41) - Government regulation of AI
(1:45:30) - Hiring a great team
(1:54:19) - Post-training
(1:59:45) - Constitutional AI
(2:05:11) - Machines of Loving Grace
(2:24:17) - AGI timeline
(2:36:52) - Programming
(2:43:52) - Meaning of life
(2:49:58) - Amanda Askell - Philosophy
(2:52:26) - Programming advice for non-technical people
(2:56:15) - Talking to Claude
(3:12:47) - Prompt engineering
(3:21:21) - Post-training
(3:26:00) - Constitutional AI
(3:30:53) - System prompts
(3:37:00) - Is Claude getting dumber?
(3:49:02) - Character training
(3:50:01) - Nature of truth
(3:54:38) - Optimal rate of failure
(4:01:49) - AI consciousness
(4:16:20) - AGI
(4:24:58) - Chris Olah - Mechanistic Interpretability
(4:29:49) - Features, Circuits, Universality
(4:47:23) - Superposition
(4:58:22) - Monosemanticity
(5:05:14) - Scaling Monosemanticity
(5:14:02) - Macroscopic behavior of neural networks
(5:18:56) - Beauty of neural networks

Transcription

Full Episode

00:00 - 00:27 Lex Fridman

The following is a conversation with Dario Amodei, CEO of Anthropic, the company that created Claude, which is currently and often at the top of most LLM benchmark leaderboards. On top of that, Dario and the Anthropic team have been outspoken advocates for taking the topic of AI safety very seriously, and they have continued to publish a lot of fascinating AI research on this and other topics.


00:28 - 00:49 Lex Fridman

I'm also joined afterwards by two other brilliant people from Anthropic. First, Amanda Askell, who is a researcher working on alignment and fine-tuning of Claude, including the design of Claude's character and personality. A few folks told me she has probably talked with Claude more than any human at Anthropic.


00:50 - 01:02 Lex Fridman

So she was definitely a fascinating person to talk to about prompt engineering and practical advice on how to get the best out of Claude. After that, Chris Olah stopped by for a chat.


01:03 - 01:26 Lex Fridman

He's one of the pioneers of the field of mechanistic interpretability, which is an exciting set of efforts that aims to reverse engineer neural networks to figure out what's going on inside, inferring behaviors from neural activation patterns inside the network. This is a very promising approach for keeping future super-intelligent AI systems safe.


00:00 - 00:00 Lex Fridman

For example, by detecting from the activations when the model is trying to deceive the human it is talking to. And now a quick few second mention of each sponsor. Check them out in the description. It's the best way to support this podcast.
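To make the activation-probing idea just mentioned a bit more concrete, here is a minimal, hypothetical sketch of a linear probe: a small classifier trained on a model's hidden activations to flag a behavior such as deception. The activations, labels, and sizes below are random placeholders, and this is an illustration of the general technique, not Anthropic's actual method.

```python
# Hypothetical linear probe on hidden activations (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n_examples, hidden_dim = 200, 512                          # hypothetical sizes
activations = rng.normal(size=(n_examples, hidden_dim))    # stand-in for real layer activations
labels = rng.integers(0, 2, size=n_examples)               # 1 = "deceptive", 0 = "honest" (hypothetical labels)

# Train a simple logistic-regression probe on the activation vectors.
probe = LogisticRegression(max_iter=1000)
probe.fit(activations, labels)

# Score a new activation vector: a higher probability means the probe
# thinks the flagged behavior is present in that forward pass.
new_activation = rng.normal(size=(1, hidden_dim))
print(probe.predict_proba(new_activation)[0, 1])
```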

00:00 - 00:00 Lex Fridman

We got Encord for machine learning, Notion for machine learning-powered note-taking and team collaboration, Shopify for selling stuff online, BetterHelp for your mind, and LMNT for your health. Choose wisely, my friends. Also, if you want to work with our amazing team, or just want to get in touch with me for whatever reason, go to lexfridman.com slash contact. And now, onto the full ad reads.

00:00 - 00:00 Lex Fridman

I try to make these interesting, but if you skip them, please still check out our sponsors. I enjoy their stuff. Maybe you will too. This episode is brought to you by Encord, a platform that provides data-focused AI tooling for data annotation, curation, and management, and for model evaluation.

00:00 - 00:00 Lex Fridman

We talk a little bit about public benchmarks in this podcast, I think mostly focused on software engineering, like SWE-bench. There are a lot of exciting developments around how to build a benchmark that you can't cheat on.

00:00 - 00:00 Lex Fridman

But if it's not public, then you can use it the right way, which is to evaluate how well the annotation, the data curation, the training, the pre-training, the post-training, all of that, is working. Anyway, a lot of the fascinating conversation with the Anthropic folks was focused on the language side.
