Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity
Dario Amodei
Yeah, I mean, of course you can hook up the mechanistic interpretability to the model itself, but then you've kind of lost it as a reliable indicator of the model's state. There are a bunch of exotic ways you can think of that it might also not be reliable. Like if the model gets smart enough that it can jump between computers and read the code where you're looking at its internal state.