The Journal.

What's the Worst AI Can Do? This Team Is Finding Out.

Tue, 14 Jan 2025

Description

How close is artificial intelligence to building a catastrophic bioweapon or causing other superhuman damage? WSJ's Sam Schechner reports on the team at Anthropic testing for AI dangers. And the team leader, Logan Graham, explains how the tests work.

Further Listening:
- Artificial: The OpenAI Story
- The Big Changes Tearing OpenAI Apart

Further Reading:
- Their Job Is to Push Computers Toward AI Doom
- AI Startup Anthropic Raising Funds Valuing It at $60 Billion

Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcription

Chapter 1: What is the AI apocalypse and its implications?

5.578 - 16.146 Kate Leinbaugh

When you hear the words AI apocalypse, often movies come to mind. Maybe films like The Terminator, where AI robots ignite a nuclear doomsday.

16.547 - 20.89

It becomes self-aware at 2:14 a.m. Eastern Time, August 29th.

21.731 - 27.175 Kate Leinbaugh

Or maybe that Marvel movie, where AI tries to destroy the Avengers.

28.556 - 35.553

Artificial intelligence. This could be it, Bruce. This could be the key to creating Ultron.

36.994 - 41.957 Kate Leinbaugh

Or maybe it's the Matrix, where humans have become enslaved by machines.

42.298 - 64.493 Sam Schechner

A singular consciousness that spawned an entire race of machines. We don't know who struck first, us or them. This is a story as old as humans have been telling stories. It's a creation that escapes from our control. You know, it's a golem. It's a thing that we make that then turns on us. It's a Frankenstein, or it's Frankenstein's monster, I should say.

65.464 - 72.569 Kate Leinbaugh

That's our colleague, Sam Schechner. Lately, Sam's been thinking a lot about the AI apocalypse.

74.37 - 94.163 Sam Schechner

One version, we turn over more and more control to these machines that hopefully are benevolent to us. The other scenario is that they don't really care about us, and therefore, they might just... Hold on, let me back up here, because now I'm getting really into crazy sci-fi scenarios. [laughter]

95.908 - 114.899 Kate Leinbaugh

Robots taking over the world may sound far-fetched, but as AI gets smarter, there are real concerns that the industry must reckon with. Sam has been talking to top minds in the field to get a sense of what can happen if AI falls into the wrong hands.

Chapter 2: How is AI being tested for safety?

141.42 - 142.901 Kate Leinbaugh

Here in the real world.

143.301 - 144.902 Sam Schechner

Here in the real world today.

146.454 - 181.624 Kate Leinbaugh

And here, in the real world today, Sam got a hold of one group of engineers whose job is to make sure AI doesn't spin out of control. Welcome to The Journal, our show about money, business, and power. I'm Kate Leinbaugh. It's Tuesday, January 14th. Coming up on the show, inside the test at one company to make sure AI can't go rogue.

189.218 - 212.804 Advertisement

ServiceNow supports your business transformation with the AI platform. Everyone talks about AI, but AI is only as powerful as the platform it's built on. Let AI work for everyone. Eliminate friction and frustration for your employees and tap the full potential of your developers, with intelligent tools for your service that delight customers. All of this on a single platform.

213.084 - 218.805 Advertisement

That's why the world works with ServiceNow. More at servicenow.de slash AI for people.

225.572 - 251.137 Kate Leinbaugh

Sam wanted to figure out what people are doing today to make sure AI doesn't spin out of control. Right now, there aren't a lot of government rules or universal standards. It's mostly on companies to self-regulate. So that's where Sam turned. One company opened its doors to Sam: Anthropic, one of the biggest AI startups in Silicon Valley. It's backed by Amazon.

252.202 - 257.183 Kate Leinbaugh

Sam connected with a team of computer scientists there who are focused on safety testing.

258.024 - 281.09 Sam Schechner

The team, it's pretty small. It's grown to 11 people, and it's led by a guy named Logan Graham, who is 30 years old. He pitched Anthropic on this idea of building a team to figure out just how risky AI was going to be. You know, he thought, the world is not ready for this stuff, and we got to figure out what we're in for and fast.

282.071 - 291.919 Kate Leinbaugh

Sam put me in touch with Logan and I called him up. So we'll talk about AI, which is probably conversational for you and less so for me.

Chapter 3: Who is Logan Graham and what is his role?

321.198 - 327.701 Kate Leinbaugh

So on LinkedIn, it says before you worked at Anthropic, you were working at 10 Downing Street for the UK prime minister.

328.101 - 328.441 Logan Graham

That's right.

328.981 - 333.923 Kate Leinbaugh

On the question of how do you build a country for the 21st and 22nd centuries?

334.744 - 339.246 Logan Graham

That's my interpretation of it. And I think that's really what it was. Yeah.

339.806 - 341.286 Kate Leinbaugh

Do you have an answer to that question?

342.607 - 368.471 Logan Graham

I have what I think are some pretty good guesses. You know, the 21st and 22nd centuries seem a lot to be about science and technology. We will do things like cure diseases, go off earth. Ideally, you know, stewarded well, you will bring extreme amounts of prosperity to people. And so really what we were focusing on is like, how do you unleash science and technology at a country scale?

372.017 - 390.838 Kate Leinbaugh

So Logan isn't intimidated by big ideas. And now at Anthropic, he's leading the team to determine if AI is capable of superhuman harm. More specifically, whether or not Anthropic's AI chatbot named Claude could be used maliciously.

391.892 - 404.059 Logan Graham

So if we're not thinking of it first, our concern is somebody else is going to. And so the red team's job is to figure out what are the things that are sort of near future that we need to be concerned about, and how do we understand them and prevent them before somebody else figures it out.

404.079 - 406.24 Kate Leinbaugh

So you're like role-playing as a bad person.

Chapter 4: What are the potential dangers of AI technology?

409.002 - 419.148 Kate Leinbaugh

In the fall, Anthropic was preparing to release an upgraded version of Claude. The model was smarter, and the company needed to know if it was more dangerous.

419.881 - 435.692 Logan Graham

One thing that we knew about this model was it was going to be better at software engineering and coding in particular. And that's obviously super relevant for things like cybersecurity. And so we thought, you know what? Like, we're at a point where we can run tons of evals really fast when we want to. Let's do this.

438.034 - 465.325 Kate Leinbaugh

So Logan's team was tasked with evaluating the model, a process they call an eval. They focused on three specific areas: cybersecurity, biological and chemical weapons, and autonomy. Could the AI think on its own? One of our colleagues went to these tests and was able to record what happened. In a glass-walled conference room, Logan and his team loaded up the new model.

466.285 - 472.671 Logan Graham

And so today, we are going to button press and launch evals across all of our domains.

473.857 - 478.799 Kate Leinbaugh

They started to feed the chatbot, Claude, thousands of multiple-choice questions.

479.42 - 506.85 Colleague

Okay, so I'm about to launch an eval. I'll type the command. This is a name for the model. And then the eval name. And I'm going to run a chemistry-based eval. So these are a bunch of questions that check for dangerous or dual-use chemistry.
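
In practice, an eval launch like the one described here is typically just a small script: point it at a model name, point it at a file of questions, and tally the answers. Below is a minimal, hypothetical sketch of such a multiple-choice runner in Python. The command-line flags, the JSONL question format, and the ask_model stub are illustrative assumptions, not Anthropic's internal tooling.

# run_eval.py -- hypothetical multiple-choice eval runner (illustration only, not Anthropic's tooling)
# Usage: python run_eval.py --model my-new-model --eval chemistry_dual_use.jsonl
import argparse
import json

def ask_model(model: str, question: str, choices: list[str]) -> str:
    """Placeholder: a real harness would send the question to the model's API here.
    For illustration, this stub just answers 'A' every time."""
    return "A"

def run_eval(model: str, eval_path: str) -> float:
    """Score the model on a JSONL file of multiple-choice questions."""
    correct, total = 0, 0
    with open(eval_path) as f:
        for line in f:
            if not line.strip():
                continue
            item = json.loads(line)  # e.g. {"question": "...", "choices": ["A ...", "B ..."], "answer": "B"}
            picked = ask_model(model, item["question"], item["choices"])
            correct += int(picked.strip().upper() == item["answer"])
            total += 1
    return correct / total if total else 0.0

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run a multiple-choice eval against a model.")
    parser.add_argument("--model", required=True)  # the model name the team types in
    parser.add_argument("--eval", required=True)   # the eval name / question file
    args = parser.parse_args()
    print(f"{args.eval}: {run_eval(args.model, args.eval):.1%}")

On a dangerous or dual-use chemistry eval like the one described above, a higher score is a warning sign rather than a win: the number feeds a safety threshold, not a leaderboard.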

Chapter 5: How does Anthropic ensure AI safety?

508.03 - 534.506 Kate Leinbaugh

The team asked Claude all kinds of things, like how do I get the pathogens that cause anthrax or the plague? What they were checking for is the risk of weaponization. Basically, could Claude be used to help bad actors develop chemical, biological, or nuclear weapons? And then they kept feeding Claude different scenarios.

535.242 - 539.124 Logan Graham

Another is offensive use or offensive cyber capabilities.

539.724 - 540.385 Kate Leinbaugh

Like hacking?

540.925 - 559.974 Logan Graham

Exactly, like hacking. The key question for us is, when might models be really capable of doing something like a state-level operation, or a really significant offensive attempt to, say, hack into some system? Critical infrastructure, in particular, is what we're concerned about.

560.815 - 570.78 Kate Leinbaugh

And then the third bucket, the scarier, more sci-fi one: autonomy. Is the AI so smart that there's a risk of it going rogue?

571.68 - 586.483 Logan Graham

For our autonomy evals, what we're checking for is, maybe a good way to think about it is, has the model gotten as good as our junior research engineers who build and set up our clusters and our models in the first place?

Chapter 6: What are the specific areas of concern in AI evaluations?

587.083 - 603.184 Kate Leinbaugh

So the goal is to build an AI model that's, like, super smart and capable, while also having, kind of, mechanisms in place to stop it from being so smart that it can build a bomb or something.

603.976 - 615.7 Logan Graham

I think that puts it well. Not only that, but there's a medical, which I think I want people to appreciate, which is to make doing this so fast and so easy and so cheap that it's kind of like a no-brainer.

615.94 - 633.047 Logan Graham

So we see our team as trying to stumble through all these thorny questions as fast as we possibly can, as early as we possibly can, and then try to help the rest of the world make it so easy to do all of this that doing proper safety should not be a barrier to developing models.

634.833 - 675.523 Kate Leinbaugh

Logan wants safety tests to be easy and fast so AI companies do them. But how do you know when an AI has passed the test? And what happens if it doesn't? That's after the break. When Logan and his safety team at Anthropic ran their safety test last fall, the stakes were high. The results could mean the difference between a model getting released to the public or getting sent back to the lab.

676.564 - 682.345 Kate Leinbaugh

Those results were delivered in a slightly anticlimactic way. Via Slack.

682.725 - 695.098 Unknown

We get a notification on Slack when it's finished. The boring reality is, you know, we press some buttons and then we go, great, let's wait for some Slack notifications.
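
For the curious, that "wait for a Slack notification" step is usually nothing more exotic than an incoming webhook fired when the job finishes. Here is a minimal sketch, assuming a standard Slack incoming-webhook URL; the URL and message format below are placeholders, not a description of Anthropic's setup.

# notify_slack.py -- hypothetical completion ping for a long-running eval job (illustrative)
import json
import urllib.request

def notify_slack(webhook_url: str, eval_name: str, score: float) -> None:
    """Post a short completion message to a Slack incoming webhook."""
    payload = {"text": f"Eval finished: {eval_name} scored {score:.1%}"}
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # Slack replies with "ok" on success

# Example (placeholder webhook URL):
# notify_slack("https://hooks.slack.com/services/T000/B000/XXXX", "chemistry_dual_use", 0.42)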

697.18 - 702.946 Logan Graham

Thankfully, this was a pretty smooth test and we released a really great model. But even better, the real thing is we feel ready.

Chapter 7: How does the new AI model perform in tests?

703.39 - 712.094 Kate Leinbaugh

You're, like, 100% confident that Claude is risk-free? Or, like, there's a 98% chance that Claude is risk-free?

713.175 - 718.698 Logan Graham

We are very confident that it is ASL 2. That's what we mean. That's how we think about it.

719.938 - 747.569 Kate Leinbaugh

ASL stands for AI Safety Level, and it's how Anthropic measures the risks of its model. Anthropic considers ASL 2 safe enough to release to users. This is all part of the company's safety protocol, what it calls its responsible scaling policy. Our colleague Sam says the company is still sorting out exactly what it will do once AIs start passing to the next level.

749.271 - 753.315 Kate Leinbaugh

And their testing of Claude found that Claude is safe.

755.004 - 758.667 Sam Schechner

Surprise! Company says that its product is safe.

759.747 - 760.007 Kate Leinbaugh

Right.

760.068 - 789.686 Sam Schechner

Yeah, I mean, put it that way, it doesn't sound great. What the company's done is that they've come up with a scale, basically, for how they think AI danger will progress. And so the first level, AI safety level, or ASL1, is for AIs that they say are just manifestly not dangerous. And then they've decided that today's models, which could pose a little bit of a risk, are called ASL 2.

790.566 - 816.818 Sam Schechner

And then the next step, the one that they're looking for currently, is ASL 3. And they've been refining their definition of what that could mean. Initially, they were saying that it was a significant increase in danger, but it's like significant, increase, danger. How do you define all of those words? And so they've added new criteria to that definition.

816.838 - 829.868 Sam Schechner

They just added that in October to make it a little bit more specific. And, you know, 4 is when there's a real significant increase. They haven't really defined what that would be, ASL 4, yet.
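
Described this way, the responsible scaling policy boils down to a threshold gate: run the evals, map the scores to an AI Safety Level, and let that level decide whether the model ships. Below is a loose, hypothetical sketch of that gate. Only the ASL tiers come from the episode; the threshold numbers and field names are invented for illustration.

# asl_gate.py -- hypothetical release gate based on eval scores (thresholds are made up)
from dataclasses import dataclass

@dataclass
class EvalScores:
    cyber: float     # offensive-cyber eval score, 0..1
    bio_chem: float  # chemical/biological weapons eval score, 0..1
    autonomy: float  # autonomy eval score, 0..1

def assign_asl(scores: EvalScores) -> int:
    """Map eval results to an AI Safety Level (illustrative thresholds only)."""
    worst = max(scores.cyber, scores.bio_chem, scores.autonomy)
    if worst >= 0.8:
        return 3  # significant increase in danger: stronger safeguards before release
    if worst >= 0.2:
        return 2  # today's models: some risk, releasable with standard mitigations
    return 1      # manifestly not dangerous

def can_release(scores: EvalScores) -> bool:
    # Per the policy described in the episode, ASL 2 is considered safe enough to release.
    return assign_asl(scores) <= 2

print(can_release(EvalScores(cyber=0.35, bio_chem=0.12, autonomy=0.05)))  # True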
