Decoder with Nilay Patel
Anthropic’s Mike Krieger wants to build AI products that are worth the hype
Mon, 09 Sep 2024
Today, I’m talking with Mike Krieger, the new chief product officer at Anthropic, one of the hottest AI companies in the industry. Anthropic’s main product right now is Claude, the name of both its industry-leading AI model and a chatbot that competes with ChatGPT. Mike has a fascinating resume: he was the cofounder of Instagram, and then started the AI-powered newsreader Artifact. I was a fan of Artifact, so I wanted to know more about the decision to shut it down as well as the decision to sell it to Yahoo. And then I wanted to know why Mike decided to join Anthropic and work in AI, an industry with a lot of investment but very few consumer products to justify it. What’s this all for?

Links:
- Instagram co-founder Mike Krieger is Anthropic’s new chief product officer | The Verge
- Instagram’s co-founders are shutting down their Artifact news app | The Verge
- Yahoo resurrects Artifact inside a new AI-powered News app | The Verge
- Authors sue Anthropic for training AI using pirated books | The Verge
- The text file that runs the internet | The Verge
- Anthropic’s crawler is ignoring websites’ anti-AI scraping policies | The Verge
- Golden Gate Claude | Anthropic
- Inside the white-hot center of AI doomerism | New York Times
- Dario Amodei, CEO of Anthropic, on the paradoxes of AI safety | Hard Fork
- No one’s ready for this | The Verge
- OpenAI announces SearchGPT, its AI-powered search engine | The Verge
- Amazon-backed Anthropic rolls out Claude AI for big business | CNBC

Transcript: https://www.theverge.com/e/24001603

Credits: Decoder is a production of The Verge and part of the Vox Media Podcast Network. Our producers are Kate Cox and Nick Statt. Our editor is Callie Wright. Our supervising producer is Liam James. The Decoder music is by Breakmaster Cylinder. Learn more about your ad choices. Visit podcastchoices.com/adchoices
Hello and welcome to Decoder. I'm Nilay Patel, editor-in-chief of The Verge, and Decoder is my show about big ideas and other problems. Today I'm talking with Mike Krieger, the new chief product officer at Anthropic, one of the hottest AI companies in the entire industry.
Anthropic was started in 2021 by former OpenAI executives and researchers who wanted to build a more safety-minded AI company, which I have to point out is a real theme among people who leave OpenAI. Something to think about. Anthropic's main product right now is Claude, which is the name of both its industry-leading AI model and a chatbot that competes with ChatGPT.
Like other major AI companies, Anthropic has billions in funding from some of the biggest names in tech, primarily Amazon. But at the same time, Anthropic does have a distinct and intense safety culture. The company is notable for employing some people who legitimately worry that AI might destroy mankind. And I wanted to ask Mike how that tension plays out in product design.
On top of that, Mike has a pretty fascinating history. If you're a longtime tech fan, you likely know him as the co-founder of Instagram, a company he started with Kevin Systrom before selling it to Facebook, now Meta, for a billion dollars back in 2012. That was an eye-popping amount of money back then, and the deal turned Mike into founder royalty basically overnight.
Mike left Meta in 2018, and a few years later he started to dabble in AI, but not quite the type of AI we talk about all the time on Decoder. Instead, Mike and Kevin launched Artifact, an AI-powered newsreader that did some very interesting things with recommendation algorithms and aggregation.
I was a big fan of Artifact, but ultimately it didn't take off like anyone wanted, and Mike and Kevin shut it down earlier this year. They sold the underlying tech to Yahoo. We talk a lot about decisions here on Decoder, so I wanted to know more about the decision to shut Artifact down, and then the decision to sell it to Yahoo.
And then, of course, I wanted to know why Mike decided to join Anthropic and work in AI. An industry with a lot of investment, but very few consumer products to justify it. Really, what is all of this for? What products does Mike see in the future that make all of the turmoil around AI worth it? How is he thinking about building them?
I've always enjoyed talking product with Mike, and this conversation is no different, even if I'm still not really sure anyone's described what the future is going to look like. Okay, Anthropic Chief Product Officer Mike Krieger, here we go. Mike Krieger, you are the new chief product officer at Anthropic. Welcome to Decoder. Thank you so much. It's great to be here. Great to see you.
Yeah, I'm excited to talk to you about products. The last time I talked to you, I was trying to convince you to come to the Code Conference. I didn't actually get to interview you at Code, but I was trying to convince you to come. And I was like, I just want to talk about products with someone as opposed to regulation. And you're like, yes, here's my product.
I warn the audience, we're definitely going to talk a little bit about AI regulation. It's going to happen. It seems part of the puzzle. But you're building the actual products. And I have a lot of questions about what those products could be, what the products are now, where they're going.
But I want to start sort of at the beginning of your Anthropic story, which is also the end of your Artifact story. So people know you. You started Instagram. You were at Meta for a while. You left Meta. And then you and Kevin Systrom started Artifact, which was a really fun newsreader. It had some really interesting ideas about how to surface the web and have comments and all that.
And then you decided to shut it down. I think of the show as a show for builders and we don't often talk about shutting things down. Walk me through that because it's as important as starting things up sometimes.
Yeah, it really is. And the feedback we got post-shutdown for Artifact was a mixture of sadness, but also kudos for calling it when you saw it. I think there's value to having a moment where you say, we've seen enough here. For us, it was a product I still love and miss. In fact, I'll run into people, and I'd expect them to say, I love Instagram, or even, I love Anthropic, but they're always like, Artifact, I really miss Artifact. So we had resonance with a too-small but very passionate group of folks. But we'd been working on it, the full run of it, for about three years, and the product had been out for a year.
And we were looking at the metrics, looking at growth, looking at what we had done. And we kind of had a moment where we said, are there ideas or kind of product directions that we'll feel dumb not having tried before calling it? And we had a list of those, and that was kind of mid last year.
And we basically took the rest of the year to work through those, got to the end of the year and said, yeah, those moved the needle a little bit, but not enough to convince us that this was really on track to be something that the team and we were collectively going to spend a lot of time on over the coming years. And that was the right moment to say, all right, let's pause, let's step back.
Is this the right time to shut it down? And the answer was yes. Actually, if you haven't seen it, Yahoo basically bought it, took all the code, and redid Yahoo News as Artifact, or the other way around. It's very funny. You'll have a little bit of a bizarro world moment the first time, like, this is almost exactly like Artifact, a little bit more purple, some different sources. But yeah, it was definitely the right decision. And you know it's a good decision when you step back and the thing you regret is that it didn't work out, not that you had to make that decision or that you made that exact decision at the time that you did.
There are two things about Artifact I want to ask about. And I definitely want to ask about what it's like to sell something to Yahoo in 2024, which is unusual. It's not a thing that's been happening a lot. The first is that Artifact was very much designed to surface web pages. It was predicated on a very rich web.
If there's one thing I'm worried about in the age of AI, it's that the web is getting less rich, right? More and more things are moving to closed platforms. More creators, when they want to start something new, end up on a YouTube or a TikTok. I don't know if there are dedicated Threads creators yet, but they're coming.
And it seemed like that product was chasing a dream that might be under pressure from AI specifically, but also just like the rise of creator platforms more broadly. Was that a real problem or is that just something I saw from the outside?
I would agree with the assessment, maybe different root causes. I think what we saw, some sites were able to balance kind of a mix of subscription, tasteful ads, good content. I would put The Verge at the top of that list. I'm not just saying that. I'm talking to you. Legitimately, every time we linked to a Verge story from Artifact, somebody clicked through.
It was like, this is a good experience; it feels like things are in balance. At the extremes, though, like local news, a lot of those websites, for economic reasons, have become: you arrive, there's a sign-in with Google before you've even read a single thing, a pop-up to sign up for the newsletter before you've even consumed any content.
I think that's probably a longer-run economic question of supporting local news, more so than AI; at least that trend seems like it's been happening for quite a while. The creator piece is also really interesting, where if you look at where breaking news, or at least emerging stories, are happening, they're often happening.
It's an X post that went viral. And what we would often get on Artifact is the summary roundup of the reactions to the thing that happened yesterday, which, if you're relying on that, you're a little bit out of the loop already. And so I think when I look at where things are happening and where the conversation is happening,
at least for the cultural core piece of that conversation, it's often not happening anymore on media properties. It's starting somewhere else and then getting aggregated elsewhere. And I think that just has an implication for a site or a product like Artifact, and how well you're ever going to feel like this is breaking news. So over time we moved more toward, let's be more interest-based, which,
funny enough, Instagram at its heart was also very interest-based. Less breaking news. But can you have a product that is just that? I think that was the struggle.
You said media properties. Some media properties have apps. Some are expressed only as newsletters. But I think what I'm asking about is the web. This is just me doing therapy about the web. What I'm worried about is the web, right? The creators aren't on the web. We're not making websites and Artifact was predicated on there being a rich web.
Search products in general are sort of predicated on there being a rich and searchable web that will deliver good answers. To some extent, AI products require there to be a new web, because that's where we're training all our models. Did you see that, that this promise of the web is kind of under pressure?
If all the new stuff is breaking on a closed platform, you can't search a TikTok or an X or something else, you can't index it, and surfacing old tweets is not really a great user experience. Actually building products on the web might be getting more constrained, and not a good idea anymore.
Yeah, even citing newsletters is a great example, where some of the best stuff that I read, sometimes there's an equivalent Substack site that you could go look at, and some of the newsletters exist purely in email. We even set up an email account that just ingested newsletters to try to surface them, or at least links from them.
And it was, you know, the designed experience just was not there. I'd say the thing I noticed on the open web in general, as a longtime fan of the web, somebody who was very online back in Brazil as a preteen, before being very online was a thing people were: the incentives that have been set up, like, a recipe won't rank highly if it's just the recipe, so let's tell the story about the life leading up to that recipe.
Those trends, I feel like, have been happening for a while, and they already lead to a place where the end consumer might be a user, but it's being intermediated, be it through a search engine and optimized for that findability, or optimized for what's going to get shared a bunch or what's going to get the most attention.
I mean, newsletters and podcasts are two ways that have probably most successfully broken through that.
And I think that's been an interesting direction. But in general, I feel like there's been probably a decade-long at-risk period for the open web, in terms of what's the actual intermediation happening between, I'm trying to tell a story, or I'm trying to talk to somebody,
and somebody receiving that story. All the roadblocks along the way just make that more and more painful. And it's no surprise, then, that, hey, I can actually just open my email and get the content. That feels better in some ways, although also not great in a bunch of other ways. That's how I've watched it, and I would not call where it is now a healthy place.
Yeah, the way we talk about that thesis on Decoder most often is that people build media products for the distribution. And so podcasts famously have open distribution; it's just an RSS feed. Well, it's like an RSS feed but with Spotify's ad server in the middle. I'm sorry to everybody who gets whatever ads we put in here. But at its core, it's an RSS product.
A newsletter is still at its core an IMAP product, an open mail protocol product. The web is like search distribution. So we've optimized to one thing. And the reason I'm asking this, and I'm going to come back to this theme a few times, is that it felt like Artifact was trying to build a new kind of distribution.
But the product it was trying to distribute was web pages, which were already overtly optimized for something else.
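As a concrete aside on the open distribution being described: a podcast feed is just an XML file that any client can poll, with an enclosure tag pointing at the audio. Here is a minimal, entirely hypothetical feed, parsed with Python's standard library (the feed title, episode, and URL are made up for illustration):

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical podcast RSS feed. Any app that can fetch and
# parse this XML can subscribe; there is no platform gatekeeper.
feed = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="12345"/>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(feed)
for item in root.iter("item"):
    title = item.findtext("title")                 # episode title
    audio_url = item.find("enclosure").get("url")  # where the audio lives
    print(title, audio_url)
```

That openness is the contrast with closed platforms: the feed lives at a URL, and any reader, player, or aggregator can consume it the same way.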
I think that's a really interesting assessment. It's funny watching the Yahoo version of it, because they've done the content deals to get the more slimmed-down pages.
And though they have fewer content sources, the experience of tapping on each individual story, I think, is a lot better, because those have now been formatted for a distribution that is, I guess, linked to some paid acquisition. That's different than what we were doing, which was, here's the open web, we'll give you warts and all and link directly to you.
But I think your assessment feels right.
Okay, so that's one. I want to come back to that theme. I really wanted to start with Artifact in that way because it feels like you had an experience in one version of the Internet that is maybe under pressure. The other thing I wanted to ask about Artifact, you and Kevin, your co-founder, both once told me that you had big ideas, like scale ideas for Artifact.
And you had this big idea, and you wouldn't tell me what it was. It's over now. What was it?
For us, it was, I mean, two things that I remain sad we didn't get to see through. One was the idea of good recommender systems underlying multiple product verticals, news stories being one of them, with the belief that maybe somebody would build that out.
If you understand yourself well through, or the system can understand you well through how you're interacting with news stories, how you're interacting with content, then is there another vertical that could be interesting? Is it around shopping? Is it around local discovery? Is it around people discovery, all these different places?
Because for all the promise, and I'll separate machine learning and AI here, and I realize that's a shifting definition throughout the years, let's call them, for the purpose of our conversation, recommender systems, machine learning systems. For all their promise, my day-to-day is actually not filled with too many good instances of that kind of product.
The big company idea was, can we bring Instagram type product thinking to recommender systems and combine those two things in a way that creates new experiences that aren't beholden to your existing friend and follow graph, with news being an interesting place to start.
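As a toy illustration of the interest-based recommendation idea Mike is describing (this is a sketch, not Artifact's actual system; the topics, weights, and headlines are all invented): build a per-topic profile from a user's own interactions, then rank candidate items from any vertical by affinity, with no friend or follow graph involved.

```python
from collections import Counter

# Hypothetical interaction log: (topic, weight), where weight reflects
# engagement strength, e.g. a read counts more than an impression.
interactions = [("ai", 3), ("soccer", 1), ("ai", 2), ("local-news", 1)]

# Build an interest profile purely from behavior, not from a follow graph.
profile = Counter()
for topic, weight in interactions:
    profile[topic] += weight

# Rank candidate items (which could come from any vertical) by affinity.
candidates = [("New open-weights model ships", "ai"),
              ("Transfer window roundup", "soccer"),
              ("City council budget vote", "local-news")]
ranked = sorted(candidates, key=lambda c: profile[c[1]], reverse=True)
print(ranked[0][0])  # the "ai" story ranks first with score 5
```

Real systems use learned embeddings and far richer signals, but the core loop, behavior in, personalized ranking out, is the same shape, and it transfers across verticals exactly because the profile is about interests rather than connections.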
You highlighted some good problems about the content, but the appealing part was, we're not trying to solve the two-sided marketplace all at once. It turns out half that marketplace was already search-pilled and had its own problems, but at least there was the other side as well. The other piece, even within news, is really thinking about, how do you eventually open this up?
I think Substack is pursuing this from a very different direction, but open this up so creators can actually be writing content and understanding distribution natively on the platform. I feel like every platform eventually wants to get to this as well.
When you watch the closest analogs in China, like Toutiao, they started very much with crawling the web and these eventual publisher deals, and now it is, I would guess, 80 to 90 percent first-party content. There are economic reasons why that's nice.
And some people make their living writing articles about local news stories on Toutiao, including, I think, a sister or close family member of one of our engineers. But the other side of it is that the content can just be so much more optimized for what you're doing.
Actually, at Code, I met an entrepreneur who was creating a novel media experience that was very much, if stories met news met mobile, what would it be for news stories? And I think for something like that to succeed, it also needs distribution that has that as the native distribution type.
So recommendation systems for everything, and then primarily recommendation-based first-party content writing platforms. The two ideas are like, oh, one day for somebody.
All right, last Artifact question. You shut it down, and then there was a wave of interest. And then, I think publicly, one of you said, oh, there's a wave of interest, we might flip it. And then it was Yahoo. Tell me about that process.
I think there were a few things that we wanted to align. We'd worked in that space long enough that, whatever we did, we sort of wanted to tie a bow around it and move on to whatever was next. So that was one piece. The other piece was, I wanted to see the ideas live on in some way. So...
there were a lot of conversations around, well, what would it become in different hands? And the Yahoo one was really interesting. I would admit to being pretty unaware of what they were doing. I was still using Yahoo Finance and my fantasy football league, but beyond that, I was not familiar with what they were doing.
They were like, no, no, we want to take it, and we think in two months we can relaunch it as Yahoo News. And I was thinking, that sounds pretty crazy. That's a very short timeline for a code base you're not familiar with. And they had access to us; we were basically helping them out almost full time. But that's still a lot.
And they basically pulled it off. I think it was 10 weeks instead of eight weeks. But I think there's a newfound energy in there to say, all right, what are the properties we want to build back up again, and then do it. So I fully admit coming in with a bit of a bias, like, I don't know what's left at Yahoo. What's going to happen here?
And then the tech team actually bit into it with an open mouth. It's kind of a gross metaphor, but they went all in, and they got it shipped. And I'll routinely text Justin, who was our Android lead. He's at Anthropic now; he actually came here before I did. I'll find little details, and he's like, oh, they kept that thing I spent way too much time on, this 3D spinning animation for when you got to a new reading level, with this beautiful reflection, specular highlighting thing. It was probably misprioritized that week, but they kept it, and it still plays when you do it. I was like, that's pretty on brand. It was a really fascinating experience, and it gets to live on.
And I think it will probably have a very different future than what we were envisioning. But I think some of the core ideas are there on like, hey, what would it mean to actually try to create a personalized news system that was really decoupled from any kind of existing follow graph or what you were seeing already on something like Facebook.
Were they the best bidder? Was the decision Yahoo will deploy this to the most people at scale? Was it they're offering us the most money? How did you choose?
It was this optimization function. I would say the three variables were: the deal was attractive, or attractive enough; our personal commitments post-transition were pretty light, which I liked; and they had reach, because Yahoo News still gets like a hundred million monthly. So it was reach, minimal commitment, but enough involvement that we felt like it could be successful.
And they were in the right space, at least on the bid size.
This sounds like the dream. You can just have this. I'm going to walk away. It's a bunch of money. Okay.
Yeah.
Makes sense. I was just wondering if that was it, or whether it was, it wasn't as much money, but they had the biggest platform and you wanted that, because Yahoo is deceptively still huge.
Yeah, deceptively still huge, I think, under new leadership and with a lot of excitement there. And, you know, for me, it was not like a huge exit; I would not call it a super successful outcome.
But the fact that that chapter closed in a nice way, and that we could move on without wondering if we should have done something different when we closed it, meant I slept much better at night in Q1 of this year.
We need to take a quick break. We'll be right back.
We're back with Anthropic Chief Product Officer Mike Krieger. All right, so that's that chapter. The next chapter is you show up as the Chief Product Officer at Anthropic. What was that conversation like? Because in terms of big commitments, hairy problems, are we going to destroy the web? It's all right there. Maybe it's a lot more work. How did you make the decision to go to Anthropic?
Well, the top-level decision was what to do next at all. I admit to having a bit of an identity crisis at the beginning of the year: I only really know how to start companies. Actually, more specifically, I probably only know how to start companies with Kevin; we make a very good co-founder pair. And I was looking at it like, what are the aspects of that that I like?
I like knowing the team from day one. I like having a lot of autonomy. I like having partners that I really trust. I like working on big problems with a lot of open space. And at the same time, I was like, I do not want to start another company right now. I just went through the wringer on that for three years. We got an okay outcome; it wasn't the outcome we wanted.
And I sat there going, I want to work on interesting problems at scale at a company that I started, but I don't want to start a company. I kind of swirled a bit: what do I do next? And I definitely knew I did not want to just invest. Not that investing is a "just" thing, but it's just different. I'm a builder at heart, as you know. And so I was like,
this is going to be really hard. Maybe I need to take some time and then start a company. And then I got introduced to the Anthropic folks via the head of design here, somebody I actually built my very first iPhone app with in college, so I've known him for a long time. His name is Joel. I started talking to them, and I realized the
research team here is incredible, but the product efforts were so nascent. I'm not going to kid myself that I'm coming in as a co-founder; the company's been around for a couple of years. There were already company values and ways of working, and they call themselves ants. Maybe I would have advocated for a different employee nickname, but it's fine.
That ship has sailed. But I felt like there was a lot of product greenfield here and a lot of things to be done and built. So there was that component. It was the closest combination I could have imagined to
the team I would have wanted to have built had I been starting a company; enough to do, so much to do, that I wake up every day both excited and daunted by how much there is; and already momentum and scale, so I could feel like I was going to hit the ground running on something that had a bit of tailwinds, where a lot of Artifact was headwinds somewhat outside of our control. That was the combination.
The first one was the big decision: what do I do next? Then the second one was, all right, is Anthropic the right place for it? And it was the sort of thing where, after every single conversation I'd have with them, I'd go back to them and be like, I think this could be it. I wasn't thinking about joining a company that's already running like crazy, but I want to be close to the core AI tech.
I want to be working on interesting problems. I want to be building, but I want it to feel like as close-ish to a co-founder kind of situation as I could.
And I think Daniela, who's the president here, was maybe trying to sell me, but she said, you feel like the eighth co-founder that we never had, the product co-founder. Which is amazing: they have seven co-founders, and none of them is the product co-founder. But whatever it was, it sold me, and I was like, all right, I'm going to jump back in.
I'm excited for the inevitable Beatles documentaries about how you're the fifth Beatle, and we can argue about that forever.
The Pete Best of it? I hope not. Hopefully at least the Ringo that comes in later.
In 2024 with our audience as young as it is, that might be a deep cut, but I encourage everybody to go search for Pete Best and how much of an argument that is. Let me ask you just two big picture questions about working in AI generally. You started Instagram. You're deep with creatives. You built a platform of creatives. You care about design, obviously.
With that community, AI is a moral dilemma. People are upset about it. I'm sure they will be upset that I even talked to you. We had the CEO of Adobe on to talk about Firefly, and that is some of the most upset emails we've ever gotten. How did you evaluate that? I'm going to go work in this technology that is built on training against all this stuff on the internet.
And people have really, really hot emotions about that. And there's a lot there; we've got to talk about the lawsuits, the copyright lawsuits. How are you thinking about that?
I have some of these conversations with a good friend of mine who's a musician down in LA. He comes up to the Bay Area when he's on tour, and we'll have an hour-long deep conversation over pupusas about, what is AI and music, how do these things connect, and where do these things go? And I think it was those interesting insights on which parts of the creative process, or which pieces of creative output, are most affected right now; then you can play that forward and see how it's going to change. I think that question is a big part of why I ended up at Anthropic, if I was going to be in AI. I think a couple of things:
obviously the written word is really important and like there's so much that happens in text.
I definitely do not mean to make this sound like text is less creative than other media, but the fact that they've chosen, I guess we've chosen, to really focus on text and image understanding and keep it to text out, something that is, you know, supposed to be tailored to you rather than reproducing something that's already out there, I think reduces some of that space significantly, where you're not also trying to produce, like,
Hollywood type videos or, you know, high fidelity images or sounds and music. And some of that I think is a research focus. Some of that's a product focus, but I think that also the space of thorny questions is still there, but also a bit more limited in those domains or outside of those domains and more purely on text and code and those kinds of expressions.
So that was a strong sort of contributor to me wanting to be here versus other spots.
There's so much controversy about where the training data comes from. Where does Anthropic's training data for Claude come from? Are you scraping the web like everybody else?
Scraping the web. We respect robots.txt. We have a few other data sources that we license and work with folks on separately. But let's say the majority of it is web crawl, done in a robots.txt-respecting way.
Were you respecting robots.txt before everyone realized that you had to start respecting robots.txt?
We were respecting robots.txt beforehand. And then in the cases where, for whatever reason, it wasn't getting picked up correctly, we've since corrected that as well.
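As a quick aside for readers: the robots.txt check Krieger describes is the same one any polite crawler performs before fetching a page. A minimal sketch using Python's standard-library parser follows; the rules, bot name, and URLs are invented for illustration and are not Anthropic's actual crawler configuration.

```python
# Illustrative sketch of a robots.txt-respecting crawler check.
# The rules, user agent, and URLs are made up for the example.
from urllib.robotparser import RobotFileParser

def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True only if the given robots.txt text permits fetching url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

rules = """\
User-agent: *
Disallow: /private/
"""

crawler_allowed(rules, "ExampleBot", "https://example.com/public/page")   # allowed
crawler_allowed(rules, "ExampleBot", "https://example.com/private/page")  # disallowed
```

In practice a crawler would download each site's /robots.txt (and re-check it periodically) before fetching any page; parsing a string here just keeps the sketch self-contained.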
What about YouTube, Instagram?
Are you scraping those sites? Yeah, no. When I think about the players in this space, there are times where I'm like, oh, it must be nice being inside Meta. I don't actually know if they train on Instagram content or if they talk about that, but there's a lot of good stuff in there. And same with YouTube. A close friend of mine is at YouTube.
That's like the repository of collective knowledge of how to fix any dishwasher in the world. And people ask that kind of stuff. So we'll see over time what those end up looking like.
You don't have a spare key to the Meta data center, the Instagram servers?
I handed it back on the way out.
When you think about that general dynamic, there's a lot of creatives out there who perceive the AI to be a risk to their jobs, or perceive that there's been a big taking. I'll just ask about the lawsuit. There's a lawsuit against Anthropic: a bunch of authors who say that the model, that Claude, was illegally trained on their books. Do you think there's a product answer to this?
It's going to lead into my second question, but I'll just ask broadly. Do you think you can make a product so good that people overcome these objections? Because that is kind of the vague argument I hear from the industry, right?
Like right now we're seeing a bunch of chatbots, and you can make the chatbot fire off a bunch of copyrighted information, but there's going to come a turn when that goes away, because the product will be so good and so useful that people will think it has been worth it. And I don't see that yet. I think that's a lot of the heart of
the copyright lawsuits, beyond just the legal pieces of it, is that the tools are not so useful that anyone can see that the trade is worth it. Do you think there's going to be a product where it is obvious that the trade is worth it?
I think it's very use-case dependent. The kind of question that we drove our Instagram team insane with is we would always ask them, like, well, what problem are you solving? A general text-box interface that can answer any question is a technology, and the beginnings of a product, but it's not a precise problem that you are solving.
And I think grounding yourself in that maybe helps you get to that answer. For example, I use Claude all the time for code assistance. That is solving a direct problem, which is I'm trying to ramp up on product management here, get our products underway, and also work on a bunch of different things.
And to the extent that I have any time to be in pure build mode, I want to be really efficient at it. That's a very directly connected problem, and, like, a total game changer. I just see myself as a builder, and it lets me focus on different pieces as well. I was talking to somebody right before this call as well.
They are now using Claude a bunch to soften up or otherwise change their long missives on Slack before they send them. So there's this human-editor kind of piece. That solves their immediate problem. Maybe they need to tone it down and, you know, chill out a little bit before sending a Slack.
But I think, again, grounding it in use, because that's what I'm trying to really focus our products on here. If you try to boil the ocean, I think you end up actually really adjacent to these kinds of ethical questions that you raise, which is, if you're an anything box, then everything is potentially complicated,
either under threat or adjacent or problematic. I think there's real value to saying, all right, what are the things we want to be known to be good for? I'd argue today that the product actually does serve some of those well enough that it's like, I'm happy it exists and I think folks are in general.
Then I think over time, if you look at things like writing assistance more broadly for novel length writing, I think the jury's still out on that. My wife was doing a prototype version of that. I've talked to other folks. I think our models are quite good, but they're not great at keeping track of characters over book length pieces or reproducing particular things.
I would ground that in, what can we be good at now? Then let's, as we move into new use cases, navigate those carefully in terms of who is actually using it and are we providing value to the right folks in that exchange?
Let me ground that question in a more specific example. both in order to ask you a more specific question and also to calm the people who are already drafting me angry emails. TikTok exists. TikTok is maybe the purest garden of innovative copyright infringement that the world has ever created.
I've watched entire movies on TikTok, and it's just because people have found ways to bypass their content filters. I do not perceive the same outrage at TikTok for copyright infringement as I do with AI. Maybe there's someone who's really mad. I watched entire like 1980s episodes of This Old House on TikTok accounts that are literally labeled like best of old This Old House.
I don't think Bob Vila is getting royalties for that. But it seems to be fine because TikTok as a whole has so much utility and people perceive even the utility of watching like old 1980s episodes of This Old House.
And there's something about that dynamic between this platform is going to be loaded full of other people's work and we're going to get value out of it that seems to be rooted in the fact that mostly I'm looking at the actual work. I'm not looking at some 15th derivative of This Old House as expressed by an AI chatbot. I'm actually just looking at a 1980s version of This Old House.
Do you think that AI chatbots can ever get to a place where it feels like that? Where I'm actually just looking at the work or I'm providing my attention or time or money to the actual person who made the underlying work as opposed to we trained it on the open internet and now we're charging you 20 bucks and the... 15 steps back, that person gets nothing.
Yeah. I think to ground in the TikTok example as well, I think there's also an aspect where if you imagine the future of TikTok, probably most people say, well, maybe they'll add more features and I'll use it even more. I don't even know what the average time spent is. It definitely eclipsed whatever we ever had on Instagram.
That's the end of the economy.
Yeah, right. Exactly. Full-on TikTok. Build AGI, create universal prosperity so we can spend time on TikTok would not be my preferred future outcome, but I guess you could construct that if you wanted to. But I think the future feels, I would argue, a bit more knowable in the TikTok use case. And I think in the
In the AI use case, it's a bit more like, well, where does this accelerate, you know, to and where does this eventually complement me? Where does it supersede me? And I think I would posit that a lot of the sort of AI related anxiety can be tied to that sort of like the fact that like three or four years ago, this technology was radically different.
Three or four years ago, TikTok existed and it was already there, you know, kind of on that trajectory. And even if it wasn't there, you could kind of have imagined it from where YouTube and Instagram were; if they had an interesting baby with Vine, it might've created TikTok. So I think it is partially because the platform is so entertaining. I think that's a piece.
I think the connection to real people is an interesting one, and I'd love to spend more time on that one. So I think that's an interesting piece of the AI ecosystem. And then the last piece is just the knowability of where it goes. Probably those three ground it the most.
Anthropic started, it was probably the original, we're all quitting OpenAI to build a safer AI company. Now there's a lot of them. My friend Casey makes a joke that every week someone quits to start yet another safer AI company.
Matt Levine has a great line about it, that it's a universal sorting function that's actually just going to distill everyone onto either side.
Is that expressed in the company? I mean, obviously, Instagram had big moderation policies. You thought about it a lot. It is not perfect as a platform or a company, but it's certainly at the core of the platform. Is that at the core of Anthropic in the same way that there are things you will not do?
Yeah, deeply. And I saw it in a week, too. So I'm a ship-oriented person. Even with Instagram, early days, it was like, let's not get bogged down in building the 50 features, let's build the two things well and get it out as soon as possible. Some of those decisions to ship a week earlier and not have every feature, I think were actually existential to the company. I feel that in my bones.
Week two I was here, our research team put out a paper on interpretability of our models and kind of buried in the paper was this idea that they found a feature inside one of the models that if amplified would make Claude believe it was the Golden Gate Bridge.
Not just kind of believe it, like you prompted it, hey, you're the Golden Gate Bridge, but deeply, in the way that my five-year-old will make everything about turtles. It made everything about the Golden Gate Bridge. How are you today? I'm feeling great. I'm feeling International Orange. And I'm feeling in the foggy clouds of San Francisco.
And somebody in our Slack was like, hey, should we build and release Golden Gate Claude? It was almost like an offhand comment. And a few of us were like, absolutely, yes. For two reasons. One, this was actually quite fun. But two, getting people to actually have some firsthand contact with a model that has had some of its parameters tuned, we thought, was valuable.
So from that Slack message to having Golden Gate Claude out on the website was, I think, basically 24 hours. And in that time, we had to do some product engineering, some model work. But we also ran through a whole battery of safety evals. And I think that was just an interesting piece where you can move quickly, and not every time can you do only 24 hours of safety evals.
There's lengthier ones for new models. This one was a derivative, so it was easier. But the fact that it wasn't even a question, like, wait, should we run safety evals? No, absolutely. That's what we do before we launch models. We make sure that it's both safe from the things that we know about, and let's also model out what are some novel harms.
The bridge is unfortunately associated with suicides. Like, let's make sure that the model doesn't guide people in that direction. And if it does, let's put in the right safeguards. So that's kind of a, like, trivial example because it was like an Easter egg we shipped for basically two days and then wound down. But it was, like, very much at its core there.
Even as we prepare model launches, again, I have the urgency, like, let's get it out. Like, I want to see people use it. And then you, like, actually do the timeline. We're like, well, from the point where the model is ready to the point where it's released, like, there are things that we are going to want to do to make sure that we're in line with our responsible scaling policy.
What I appreciate about the product and the research teams here is that it's not seen as standing in our way. It's like, yeah, that's why this company exists. I don't know if I should share this, but I'll share it anyway. At our second all-hands since I was here,
Somebody who's very early here stood up and was like, if we succeeded at our mission, but the company failed, I would see this as a good outcome. I don't think you would hear that. You definitely would not hear that at Instagram. Not because we were bad people, but if we succeeded in helping people see the world in a more beautiful visual way, but the company failed, I would be super bummed.
I think a lot of people here would be very bummed too, but that ethos is quite unique.
I think this brings me to the Decoder questions. Anthropic is what's called a public benefit corporation. There's a trust underlying it. You are the first head of product. You've described the product and research teams, and there's a safety culture. How does that all work? How is Anthropic structured?
Broadly, we have our research teams. We have the team that sits most closely between research and product, which is a team thinking about inference and model delivery and everything that it takes to actually serve these models, because that ends up being the most complex part in a lot of cases. And then we have product.
I would say if you just sliced off the product team, it would look similar to product teams at, you know, most tech companies, with a couple of tweaks. One is we have a labs team, and the purpose of that team is to basically stick them in as early in the research process as possible, with designers and engineers, to start prototyping at the source rather than wait until research is done.
I think I can go into why I think that's a good idea. That's the team that got spun up right after I joined. And then the other team we have is our research PM teams, because ultimately we're delivering the models using these different services. And the models have capabilities like what they can see well, you know, in terms of multimodal or
what type of text they understand, even understanding what languages they need to be good at. Having end-user feedback tie all the way back to research ends up being very important. And it prevents it from ever becoming this almost ivory tower, like, we built this model, and is it actually useful? We say we're good at code. Are we really?
How are startups that are using it for code giving us feedback? Oh, it's good at these Python use cases. It's not good at this autonomous thing. Great. That's feedback that's going to channel right back in. So those are the two distinct pieces.
But within product, and I guess a click down, because I know you get really interested on Decoder in team structures, we have apps, which is Claude.ai and Claude for Work. And we have developers, which is the API. And then we have our Labs team.
And is that all just, that's the product side. The research side, is that the side that works on the actual models?
Yeah, that side works on the actual models. And that's everything from researching model architectures, figuring out how these models scale, and then a strong red-teaming safety alignment team as well. And that's another component that is deeply in research. And I think some of the best researchers end up gravitating towards that as they see that's the most important thing they could work on.
How big is Anthropic? How many people?
We're north of 700 at last count.
And what's the split between that research function and the product function?
Product is really, I think, probably, I wouldn't say doubled, but almost doubled. Product is just north of 100. So the rest is, you know, everything between. We have sales as well, but research, like the fine-tuning part of research, inference, and then the safety and scaling pieces as well. So I described this within a month of joining as those crabs that have one super big claw.
We're really big on research, and product is this very small claw still. Or the other metaphor I've been using is you're a teenager. Some of your limbs have grown faster than others, and some are still catching up. The kind of... crazier bet is I would love for us to not have to then double the product team.
I'd love for us instead to find ways of using Claude to make us more effective at everything we do on product, so that we don't have to double. Because every team struggles with this, so this is not a novel observation. But I look back at Instagram: when I left, we were 500 engineers. Were we more productive than at 250? Almost certainly not.
Were we more productive than at 125 to 250? Marginally. You know, I had a really depressing interview once when I was trying to hire a VP of engineering. I was like, how do you think about developer efficiency and team growth?
And he's like, well, if every single person I hire is at least net contributing something, then it's succeeding, even if it's sublinear, like a one-to-one ratio or below. I'm like, that's depressing. And I think it creates all this other swirl around just team culture, dilution, et cetera. So that's something I'm personally passionate about.
How do we take what we know about how these models work and actually make it so the team can stay smaller and more tight-knit?
Yeah. Tony Fadell, who did the iPod, he's been on Decoder before, but when we were starting The Verge, he was basically like, look, I don't remember what the actual numbers were, but he said something like, you're going to go from 15 or 20 people to 50 or 100, and then nothing will ever be the same.
And I've thought about that every day since, because we're always right in the middle of that range. And I'm like, when is the tipping point? Where does moderation live in this structure? You mentioned safety on the model side, but you're out in the market building products. You've got what sounds like a very horny Golden Gate Bridge people can talk to. And you're running tests there.
Sorry, that's just me; every conversation has one joke about how horny the models are. Where does moderation live, right? At Instagram, there's the big centralized Meta trust and safety function. At YouTube, it's in the product org under Neal Mohan. Where does it live for you?
I would probably put it in three places. One is in the actual model training and fine-tuning, where part of what we do on the reinforcement learning side is define a constitution for how we think Claude should be in the world. And that gets baked into the model itself early, before you hit the system prompt, before people are interacting with it.
That's getting encoded into the model: how should it behave? Where should it be willing to answer and chime in, and where should it not? And that's, I think, very linked to the responsible scaling piece. Then next is the actual system prompt. In the spirit of transparency, we just started publishing our system prompts.
People would always figure out clever ways to try to reverse-engineer them anyway, and we're like, if that's going to happen, why don't we just actually treat it like a changelog and be transparent? I think as of last week, you can go online and actually see what we've changed.
That's another place where there's additional guidance that we give to the model around how it should act. Of course, ideally it gets baked in earlier. People can always find ways to try to get around it, but we're fairly good at preventing jailbreaks. And then the last piece is where our trust and safety team sits.
And the trust and safety team is the closest analog. At Instagram, we called it at one point trust and safety, at another point well-being, but it's that same kind of last-mile remediation piece. And I would bucket that work into two pieces. One is what are people doing with Claude and publishing out to the world. So with Artifacts,
it was the first product we had that had any amount of social to it at all: you could create an artifact, hit share, and actually put that on the web. That's a very common problem in shared content. I lived shared content for almost 10 years at Instagram, and here it was like, wait, do people have usernames? How do they get reported?
We delayed that launch for a week and a half to make sure we had the right TNS, trust and safety, pieces around moderation, reporting, queues around taking things down, limited distribution, figuring out what it means for the people on Teams plans versus individuals. Some of those things where I got very excited, like, let's ship this, sharing artifacts, and
a week later, we're like, okay, now we can ship it. We've got to actually sort these things out. So that's on the content moderation side, I would say. And then on the response side as well, we also have additional pieces that sit there that are either around preventing the model from reproducing copyrighted content. That's something that we want to prevent as well from the completions.
And then other harms that are against the way we think the model should behave, which ideally have been caught earlier, but if they aren't, then they can get caught at that last mile. I was talking to our head of trust and safety last week.
He calls it the Swiss cheese method: no one layer will catch everything, but ideally enough layers stacked will catch a lot of it before it reaches the end.
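The Swiss cheese method he describes, stacked imperfect filters where each layer catches what the previous one missed, can be sketched in a few lines. Everything here (the layer names and the keyword rules) is invented for illustration; real moderation layers are classifiers and human review, not string checks.

```python
# Toy illustration of layered ("Swiss cheese") moderation: content ships
# only if every imperfect layer lets it through. Rules are invented.
def training_layer(text: str) -> bool:
    # Stand-in for behavior baked in during fine-tuning.
    return "harm-a" not in text

def system_prompt_layer(text: str) -> bool:
    # Stand-in for guidance added in the system prompt.
    return "harm-b" not in text

def trust_and_safety_layer(text: str) -> bool:
    # Stand-in for last-mile trust and safety checks.
    return "harm-c" not in text

LAYERS = [training_layer, system_prompt_layer, trust_and_safety_layer]

def passes_all_layers(text: str) -> bool:
    """Content reaches the user only if no layer objects."""
    return all(layer(text) for layer in LAYERS)
```

The point of the stack is that each layer can be individually leaky; what matters is whether the holes line up.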
I think I'm very worried about AI-generated fakery across the internet. This morning, I was looking at a Denver Post article about a fake news story about a murder; people were calling the Denver Post to find out why it hadn't reported on it, which is in its own way the correct outcome, right? They heard a fake story, they called a trusted source.
At the same time, the Denver Post had to go run down this fake murder true-crime story because an AI had just generated it and put it on YouTube. That all seems very dangerous to me. There's the death of the photograph. We talk about it all the time. Are we going to believe what we see anymore? Where do you sit on that?
Anthropic is obviously very safety-minded, but we are still generating content that can go haywire in all kinds of ways.
Yeah, and I would maybe split internal to Anthropic from what I've just seen out in the world. The Grok image generation stuff that came out like two weeks ago was fascinating, because, I mean, maybe they've introduced some safeguards since, but at launch it felt like a total free-for-all.
It's like, do you want to see Kamala with a machine gun? It was, you know, crazy stuff. I go back and forth between believing that having examples like that in the world is actually helpful and almost inoculating, in terms of what you take for granted as a photograph or not, or even a video or not. I don't think we're far from that as well. And getting, you know,
maybe it's calling the Denver Post or a trusted source, or maybe it's creating some hierarchy of trust that we can go after. There are no easy answers there, but that's, I would say, an industry-wide, almost society-wide thing that we're going to reckon with as well, the image and video pieces.
And then on text, I think what changes with AI is the mass production. So one thing that we look at is any type of coordinated effort. We looked at this as well at Instagram. At individual levels, it might be hard to catch the one person that's like...
commenting on a, you know, Facebook group, trying to start some stuff, you know, cause that's probably indistinguishable from a human, but we'll really look for like networks of coordinated activity.
And we've been doing the same on the Anthropic side, which is looking at, and this is going to happen more on the API side, if it happens, rather than on claude.ai, because there are just more effective, efficient ways of doing things at scale there. But when we see spikes in activity, that's when we can go in and say, all right, what does this end up looking like?
Let's go learn more about this particular API customer, you know, Do we need to have a conversation with them? What are they actually doing? What is the use case?
I think it's important to be clear as a company what you consider bugs versus features. It would be an awful outcome if Anthropic models were being used for any kind of coordination of fake news and, you know, election-interference-type things. So we've got the TNS teams actively working on that. And to the extent that
we find anything, it'll be a combo: additional model parameters plus trust and safety, to shut it down.
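The spikes-in-activity monitoring Krieger mentions can be thought of as simple anomaly detection over per-customer request volumes. A hedged sketch follows; the threshold, function name, and numbers are illustrative only, and a real system would be far more sophisticated than a z-score check.

```python
# Illustrative spike detector: flag a customer for human review when
# today's API request count is a statistical outlier versus their
# recent daily baseline. Threshold and numbers are made up.
from statistics import mean, stdev

def is_spike(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Return True if today's volume sits far above the historical mean."""
    if len(history) < 2:
        return False  # not enough baseline to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > 2 * mu  # arbitrary fallback for a flat history
    return (today - mu) / sigma > z_threshold

is_spike([100, 110, 95, 105], 1000)  # flagged for review
is_spike([100, 110, 95, 105], 108)   # within normal variation
```

A flag like this would only trigger the human step described above (go learn more about that API customer), not an automatic shutdown.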
We need to take another quick break. We'll be right back.
We're back with Anthropic chief product officer Mike Krieger to discuss where he thinks generative AI is going next and whether it's somewhat dangerous. With apologies to my friends at Hard Fork, Casey and Kevin, they ask everybody what their p(doom) is, so I'm going to ask you that. But that question is rooted in AGI: what are the chances we think that these things will become self-aware and kill us all?
Let me ask you a variation of that first, which is what if all of this just hastens our own information apocalypse and we end up just taking ourselves out? Do we need the AGI to kill us all, or are we headed towards information apocalypse first?
Yeah, I think the information piece, living in a society with this amount of information even without AI, has already been, I would say, a journey in the last 10 years. Just take primarily textual social media. I think some of that happens on Instagram as well, but it's easier to disseminate when it's just, you know, a piece of text.
But I think it comes and goes. I think we go through waves of, oh man, how are we ever going to get to truth? And then good truth-tellers emerge, and I think people flock to them. Some of them are traditional sources of authority, and some are just people that have become trusted. We could get into a separate conversation on verification and validation of identity; I think that's an interesting one as well. But I'm an optimistic person at heart, if you can't tell. And part of it is my belief, from an information chaos or proliferation standpoint, in our abilities to learn, adapt, and then put the right mechanisms in place.
So I remain optimistic that we'll continue to figure it out on that front. The AI component, I think, increases the volume. And the thing you would have to believe is that it could also increase some of the parsing. I'm going to say it was a Neal Stephenson novel that came out a few years ago, or it was a William Gibson one, one of the two. One of them had
the concept that, in the future, perhaps you'll have a social media editor of your own, and that gets deployed as a sort of gating function between all the stuff that's out there and what you end up consuming. There's some appeal in that to me: if there's a massive amount of content to consume, probably most of it is not going to be useful to you.
And I've even been trying to scale back my own information diet. And to the extent that there are things that are interesting, you know, I'd love the idea of, like, go read this thing in depth. Like, this is worthwhile for you.
But let me bring this back around. We started talking about recommendation algorithms, and now we're talking about classifiers and having filters on social media to help you see stuff. You're on one side of it now, right? Claude just makes the things, and you try not to make bad things. The other companies, Google and Meta, are on both sides of the equation, right?
We're racing forward with Gemini. We're racing forward with Llama. And then we have to make the filtering systems on the other side to keep the bad stuff out. And it feels like those companies are at decided cross-purposes with themselves.
I think an interesting question, and I don't know the current answer, is to ask Adam Mosseri what he would say: what percentage of Instagram content could, would, and should be, you know, AI-generated, or at least AI-assisted?
Now from your seat at Anthropic, knowing how the other side works, is there anything you're doing to make the filtering easier? Is there anything you're doing to make it more semantic, make it more understandable what you're looking at to make it so that the systems that sort the content have an easier job of understanding what's real and what's fake?
Yeah, on the research side, and now outside my expertise, there's active work on what techniques could make it more detectable. Is it watermarking? Is it probability, et cetera? It's an open question, but also a very active area of research. Actually, I would break it down into three pieces.
There's like what we can do from like like detection and watermarking, etc. side, on the model piece, also have it be able to express some uncertainty a little bit better. You know, like, I actually don't know about this. I'm not willing to speculate, or I'm not actually willing to help you filter these things out, because I'm not sure. I can't tell which of these things are true. And
also open area of research and a very interesting one as well. And then the last one is like, if you're Meta, if you're Google, maybe the bull case is that if primarily you're surfacing content that is generated by models that you yourself are building, there is probably a better closed loop than you can have there.
I don't know if that's going to play out, or whether people will always just flock to whatever the most interesting image generation model is, create there, and go publish it and blow it up. I'm not sure. I think the jury's still out on that one.
But I would bet on the built-in tools. On Instagram, 90-plus percent of photos that were filtered were filtered inside the app, because it was just the most convenient thing. And in that way, a closed ecosystem could be one route to at least having some verifiability of generated content.
Instagram filters are kind of an interesting comparison here. Instagram started as photo sharing, Silicon Valley nerds, and it became Instagram. It is a dominant part of our culture, and the filters had real effects on people's self-image, had real negative effects, particularly on teenage girls and how they felt about themselves.
There are some studies that say teenage boys are starting to have self-image issues and body image issues at higher and higher rates because of what they perceive on Instagram. That's bad, right? And it has to be weighed against the general good of Instagram, which is that many more people get to express themselves and build different kinds of communities.
How are you thinking about those risks with Anthropic's products?
There's a coach I was working with, and I would always push him: I want to start another company that has as much impact as Instagram. And he's like, well, there's no cosmic ledger where you'll know exactly what impact you had, first of all. And second of all, what's the equation by which you'd even net out positive or negative?
And I think the right way to approach these questions is with humility, and then an understanding as things develop. But, you know, to me, I am excited and overall very optimistic about AI and the potential for AI.
If I'm going to be actively working on it, I want it to be somewhere where the risks and the mitigations were as important and as foundational to the founding story, maybe to bring it back to why I joined. That's how I balanced it for myself, which is you need to have that internal run loop of, great, is this the right thing to launch? Should we launch this? Should we change it in some ways?
Should we add some constraints? Should we explain its limitations in some ways? I think it's essential that we grapple with those questions, or else I think you'll end up in the, well, this is clearly just a force for good. Let's blow it up and go all the way out.
I feel like that misses something I saw at Instagram: you can build a commenting system, but you also need to build the bullying filter, which we built.
This is the second decoder question. How do you make decisions? What's the framework?
Actually, maybe I'll go meta for a quick second, which is the culture here at Anthropic is extremely thoughtful and very document writing oriented. So if a decision needs to be made, there's usually a document behind it. There's pros and cons to that. It means that as I joined and I was wondering, like, why did we choose to do this?
People would be like, oh, yeah, there's a doc for that, and there's literally a doc for everything, which helped my ramp-up. But sometimes I'd be like, why have we still not built this? They're like, oh, yeah, somebody wrote a doc about that two months ago. I'm like, well, did we do anything about it? My whole decision-making piece is I want us to get to truth faster.
None of us individually know what's right. Getting to truth could be, let's de-risk the technical side by building a technical prototype. If it's on the product side, let's get it into somebody's hands. Figma mockups are great, but how is it going to move on the screen? And so minimizing time to iteration and time to hypothesis testing is my fundamental decision-making philosophy.
I've tried to instill more of that here on the product side. Again, it's a thoughtful, very deliberate culture. I don't want to lose
most of that, but I do want there to be more of this hypothesis-testing and validation component. And I think people feel it when they're like, oh yeah, we had been debating this for a while, but we actually built it, and it turns out neither of us was right, and actually there's a third direction that's more correct. At Instagram, we ran the gamut of strategy frameworks. The one that's resonated the most with me consistently is Playing to Win. I go back to that often, and I've instilled some of that here as well as we start thinking about, you know,
What's the winning aspiration? What are we going after? And then more specifically, and we touched upon this even in our conversation today, where will we play? Because we're not the biggest team by size. We're not the biggest chat UI by usage. We're not the biggest AI model by usage, either. There are a lot of interesting players in the space.
We have to be thoughtful about where we play and where we invest. Just this morning, I had a meeting where the first 30 minutes were people being in pain due to a strategy. The cliché is that strategy should be painful, and people forget the second part of that: you will then feel pain when the strategy creates some trade-offs.
But also just recognizing that like, you know, on Instagram we always talked about doing fewer things better. That was like a foundational company value.
Wait, what was the tradeoff and what was the pain?
The trade-off was, not getting too much into the technical details, is basically, like, of the next generation of models, like, what particular optimizations we're making. And, you know, can't share exactly what, but, like, it will make one thing really good and another thing just, like, okay or pretty good.
And, like, the thing that's really good, I think, is a big bet and it's going to be really exciting. And everybody's like, yeah. And they're like, but... Yeah, but so I'm actually having us write a little mini document that we can all sign. I know this sounds kind of cheesy, where it's like, we are making this trade-off. This is the implication.
This is how we'll know we're right or wrong, and here's how we're going to revisit this decision. And I want us all to at least sign it in Google Docs and be like, this is our joint commitment to this, or else you end up with the next week of... But, you know, it's that revisit. So it's not even disagree and commit. It's: feel the pain, understand it.
Don't go blindly into it forever. Like I'm a big believer when it comes to like hard decisions, even decisions that can feel like two-way doors. The problem with two-way doors is it's tempting to keep walking back and forth between them.
So you have to kind of like walk through the door and say, the earliest I'd be willing to go back the other way is, you know, two months from now or with this particular piece of information. And hopefully that kind of quiets the like, even internal critic of like, It's a two-way door. I'm always going to want to go back there.
I think this brings me to a question that I've sort of been dying to ask the whole time. You're talking about next generation models. You're new to Anthropic. You're building products on top of these models. I am not convinced that LLMs as a technology can do all the things people are saying they will do. Like my personal PDoom is like, I don't know how you get from here to there.
I don't know how you get from LLM to AGI. I see it being good at language. I don't see it being good at thinking. Do you think LLMs can do all the things people want them to do?
I think the current generation, yes, in some areas; no, in others. I think maybe what makes me an interesting product person here is that I really believe in our researchers, but my default belief is that everything, in life and in general and in research and in engineering, takes longer than we think it does. I do this mental exercise with the team, which is:
if our research team all went Rip Van Winkle and fell asleep for five years, I still think we'd have five years of product roadmap. Otherwise, we'd have to say we are bad at our jobs, we're terrible at our jobs.
We can't think of all the things that even our current models could do in terms of improving work, accelerating coding, making things easier, coordinating work, even intermediating disputes between people, which I think is a funny LLM use case that we've even seen play out internally:
these two people have this belief; help us ask each other the right questions to get us to that place. So it's a good sounding board as well. There's a lot in there that is embedded in the current models.
I would agree with you that like the big open questions to me, I think it's basically like for longer horizon tasks, what is the sort of horizon of independence that you can and are willing to give the model? Like the metaphor I've been using is right now, LLM chat is very much, you've got to do the back and forth because you have to correct, you know, you've got to iterate.
No, that's not quite what I meant. I meant this. A good litmus test for me is like, when can I email Claude and generally expect that an hour later, it's not going to give me the answer it would have given me in the chat, which would have been a failure, but like it would have done more interesting things and gone find out things and iterate on them and even like self-critiqued and then responded.
And I don't think we're that far off for some domains. I think we're far for some other ones, especially ones that involve longer-range planning or thinking or research. But I use that as my capabilities test. It's less about parameter size or a particular eval. To me, it's, again: what problem are you solving?
And right now it's like, I joke with our team. It's like right now talking to Claude is like, a very intelligent amnesiac. It's like every time you start a new conversation, it's like, wait, who are you again? Like, what am I here for? Like, what did we work on before? And it's like, instead, it's like, all right, like, can we carry continuity?
Can we like have it be able to plan and execute on a longer horizon? And can you start trusting it to get some more things in? Because there's things I do every day that I'm like, I spent an hour on, you know, some stuff that I'd really wish I didn't have to do.
And it's not like particularly leveraged use of my time, but I don't think Claude could quite do it right now without like a lot of scaffolding. And right now, here's maybe like a more succinct way to put a bow on it. Like right now, the scaffolding needed to get it to execute more complex tasks doesn't always feel worth the trade-offs because you probably could have done it yourself.
I think there's an xkcd comic on time spent automating something versus time that you actually get to save doing it. That trade-off sits at different points on the AI curve. And I think that would be the bet: can we shorten that time to value so that you can trust it to do more of those things?
Like, you know, probably nobody really gets excited to put a, you know, coalesce all the planning documents that my product teams are working on into one document, write the meta narrative and like circulate to these three people. Like, man, I don't want to do that today. I have to do it today, but I don't want to do it today.
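The comic referenced above is xkcd's "Is It Worth the Time?"; the break-even arithmetic behind it is simple enough to sketch. The function name and sample numbers here are illustrative, not from the conversation:

```python
def automation_pays_off(hours_to_automate, minutes_saved_per_run,
                        runs_per_week, horizon_weeks):
    """Return True if time saved over the horizon exceeds time spent automating."""
    minutes_spent = hours_to_automate * 60
    minutes_saved = minutes_saved_per_run * runs_per_week * horizon_weeks
    return minutes_saved > minutes_spent

# A daily 5-minute chore: 8 hours of automation pays off within a year.
print(automation_pays_off(8, 5, 7, 52))      # True
# A monthly 2-minute chore: 40 hours of scaffolding never breaks even.
print(automation_pays_off(40, 2, 0.25, 52))  # False
```

Mike's point is that better models shift where that break-even sits: less scaffolding time per task moves more tasks into "worth automating."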
Well, let me ask you in a more numeric way. I'm looking at some numbers here. Anthropic has taken more than $7 billion of funding over the last year. You're one of the few people in the world who has ever built a product that delivered a return on $7 billion worth of funding, right, at scale.
You can probably imagine some products that might return on that investment. Can the LLMs you have today build those products?
I think that's an interesting way of asking. The way I think about it is the LLMs today deliver value, but they also deliver our ability or help our ability to go build a thing that delivers that value.
So actually, let me ask a threshold question. What are those products that can deliver that much value?
To me, right now, Claude is an assistant, a helpful sidekick; that's the word I heard internally at some point. At what point is it a coworker? Because the amount of joint work that can happen, even in a growing economy, with assistants, I think, is very, very large. So I think a lot about Claude for Work.
Claude for Work right now is almost a tool for thought. You can put in documents, you can sync things and have conversations, and people find value. Somebody built a, like, small fission reactor or something; it was on Twitter. Not using Claude, but Claude was their tool for thought. To the point where
it is now an entity that you actually trust to execute autonomous work within the company. That delivered product, it sounds like a fanciful idea. I actually think the delivery of that product is way less sexy than people think. It's about permission management. It's about identity. It's about coordination. It's about remediation of issues.
It's all the stuff that you actually do in training a good person to be good at their jobs. That, to me, even within a particular discipline...
some coding tasks, some particular tasks that involve coalescing of information or researching, each of those, getting to have the incremental person on your team, even if they're not, in this case, I'm okay with not net plus one productive, but net 0.25, but maybe there's a few of them, and coordinated. I get very excited about the economic potential for that.
That's all at what, 20 bucks a month, enterprise subscription product?
I had this debate with somebody around, I think the price point for that is much higher if you're delivering that kind of value. But I was debating with somebody around, you know, what Snowflake and Databricks and those have shown, like, Datadog is another one. Like, usage-based billing is, like, you know, the new hotness.
If we had, like, subscription billing, now we have, like, usage-based billing. And, like, the thing I would like to get us to, it's hard to quantify today, although maybe we'll get there, is, like, a real value-based billing. Like, what did you actually accomplish with this? And... You know, there's people that will ping us because like a common complaint I hear is that people hit our rate limits.
They're like, I want more Claude. I saw somebody who was like, well, I have two Claudes; I have two different browser windows. I'm like, God, we've got to do a better job here. But the reason they're willing to do that, they write in and say: look, I'm working on a brief for a client. They are paying me X amount of money.
I would happily pay another $100 to get me to finish the thing so I can deliver it on time and move on to the next one. That, to me, is an early sign of where we fit, where we can provide value that is even beyond a $20 subscription.
But when I think about deployed Claudes, and this is early product thinking, but it's something I get excited about: being able to think about what value you're delivering, and really aligning on that over time, is where I think you create a very full alignment of incentives in terms of delivering that product. So I think that's an area we can get to over time.
So I'm going to bring this all the way back around. We started by talking about distribution and whether things can get so tailored for their distribution that they don't work in other contexts. I look around and I see Google distributing Gemini on its phones. I look at Apple distributing Apple Intelligence on its phones.
They've talked about maybe having some model interchangeability in there. Right now it's OpenAI, but maybe Gemini will be there. Maybe Claude will be there. That feels like the big distribution. They're just going to take it, and these are the experiences people will have unless they pay some other money to someone else.
In the history of computing, the free thing that comes with your operating system tends to be very successful. And I don't think OpenAI is getting any money to be in Apple Intelligence; I think Apple just thinks some people will convert for 20 bucks, and they're Apple, and that's as good as it gets.
How are you thinking about this problem? How are you thinking about widening that distribution, not optimizing for other people's ideas?
Yeah, I love the question. I get asked this all the time, even internally: should we be pushing harder into an on-device experience? And I agree, it's going to be hard to supersede the built-in model provider there, even if our model might be better for a particular use case. There's a utility thing there. What I get more excited about is:
can we be better at being close to your work? Work products have a much better history against the built-in thing. Like, Pages comes with every Mac, and plenty of people do their work in Pages, I hear, I don't know. But there's still real value in a Google Docs, or even a Notion, and others who can go deep on a particular
take on that productivity piece. So I think that's why I lean us heavier into helping people get things done. Some of that will be mobile, but almost maybe as a companion, delivering value that's almost independent of needing to be exactly integrated into the desktop. I think as an independent company, trying to be that first call, that Siri...
I've heard the pitch from startups even before I joined here, like, we're going to do that. We're going to be so much better. And the new action button means that you can bring it up and then press up. I'm like, no, the default really, really matters there. Instagram never tried to replace the camera.
We just tried to make a really good thing about what you could do once you decided that you wanted to do something novel with that photo. And then sure, people took photos in there. But by the end, when we left, it was like, 85% library, 15% camera, right? Like there's a real value to like the thing that just requires the one click.
It was interesting because every WWDC that would come around, pre-Instagram, I loved watching those announcements: what are they going to announce? And then afterward it changed to: uh oh, what are they going to announce? You get to the point where you realize they're going to be really good at some things. Google's going to be great at some things.
Apple's going to be great at some things. You have to find the places where you can differentiate either in a cross-platform way, either in a depth of experience way, either in a novel take on how work gets done way, or be willing to do the kind of work that some companies are less excited to do because maybe at the beginning they don't seem super scalable.
Are there consumer-scalable $7 billion worth of consumer products that don't rely on being built into your phone?
I mean, I open up the App Store, and ChatGPT is regularly second. I don't know what their numbers look like in terms of that business, but I think it's pretty healthy right now. But long-term, I optimistically believe yes, because,
let's conflate mobile and consumer for a second, which is not a super fair conflation, but I'm going to go with it: so much of our lives still happen there that, whether it's LLMs plus recommendations, or LLMs plus shopping, or LLMs plus even dating, I have to believe that at least a heavy AI component can be in a $7 billion-plus business. But not one where you're trying to effectively be
like Siri++. I think that's a hard place to be.
Yeah. OpenAI's answer to this appears to be search. I feel like I need to disclose, like every other media company, Vox Media has taken the money. I have nothing to do with this deal. I'm just letting people know that we took the money, too. It feels like their answer is search, right? If you can claw off some percentage of Google, you've got a pretty good business.
That's basically what Satya Nadella told me about Bing when they launched ChatGPT-powered Bing. Like, any half a percent of Google is a huge boost to Bing. Would you build a search product like that? We've talked about recommendations a lot. The line between recommendations and search is like right there.
Yeah. It's not on my mind for any kind of near-term thing. I'm very curious to see it. I haven't gotten access, probably for good reasons, although I know Kevin Weil pretty well. I should just call him. So I haven't gotten to play with it. But that space of the Perplexities, ChatGPT search... I forget how they actually brand it. SearchGPT.
Yeah, I mean, it ties back to the very beginning of our conversation, which is search engines in the world of summarization and citations, but probably fewer clicks. And where does that end up? How does that all tie together and connect? It's less core, I would say, to what we're trying to do.
So it sounds like right now the focus is on work, right? You described a lot of work products that you're thinking about, maybe not so much on consumer. I would say the danger in the enterprise is it's bad if your enterprise software is hallucinating, just broadly. It seems risky.
It seems like those folks might be more inclined to sue you if you send some business haywire because the software is hallucinating. Is this something you can solve? I've had a lot of people tell me that LLMs are always hallucinating and we're just controlling the hallucinations. And I should stop asking people if they can stop hallucinating because the question doesn't make any sense.
Is that how you're thinking about it? Can you control it so that you can build reliable enterprise products?
I think we have a really good shot there. The two places this most recently came up: one was that current LLMs will oftentimes try to do math. Sometimes they actually are, especially given the architecture, impressively good at math, but not always, and especially not when it comes to higher-order things, or even things like counting letters in words.
I think you could actually get there. And so one tweak we've made recently is just helping Claude, at least on Claude.ai, recognize when it is in that situation and explain its shortcomings. Is it perfect? No, but it significantly improved that particular thing. Because this came directly from an enterprise customer that said, hey, I was trying to do some CSV parsing.
I'd rather you give me the Python to go analyze the CSV than try to do it yourself because I don't trust that you're going to do it right yourself. So I think on the data analysis, code interpretation, that front, I think it's a combination of having the tools available. LLMs are very smart. Sorry, humans. I still use calculators all the time.
In fact, over time, I feel like I get worse at mental math and like rely on those even more. So I think there's a lot of value to, hey, give it tools, teach it to use tools, which is a lot of what the research team focuses on. And then really emphasize the time where like, yeah, I know you think you can do this.
The joke I make is, for the CSV version: yeah, I can eyeball a column of numbers and give you my average. It's probably not going to be perfectly right. So I'd rather it use the, you know, average function. So that's on the data front.
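The pattern that enterprise customer asked for, having the model emit code that computes instead of eyeballing numbers, would produce something like this generated Python. The file name and column are hypothetical:

```python
import csv
from statistics import mean

def column_average(path, column):
    """Average a numeric CSV column exactly, rather than estimating by eye."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return mean(values)
```

A call like `column_average("sales.csv", "revenue")` is deterministic in a way the model's own arithmetic is not, which is the whole point of handing the tool back rather than the answer.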
On the citations front, the app that has done this best recently, and I have no affiliation other than that we listen to her parenting advice all the time: Dr. Becky, who's a parenting guru, has a new app out. And I really like playing with chat apps, because I really try to push them.
And I pushed this one so hard, trying to get it to hallucinate or talk about something it wasn't familiar with. I'll have to go talk to the makers, actually ping them on Twitter, because they do a great job: if it's not super confident that the information is in its retrieval window, it will just refuse to answer. It won't confabulate; it won't go there.
And I think that that is an answer as well, which is like the combination of model intelligence plus data, plus the right like prompting and retrieval so that like you don't want it to answer unless there actually is something grounded in the context window helps tremendously on that hallucination front. Does it cure it? Probably not. But I would say that all of us make mistakes.
Hopefully, they're predictably shaped mistakes, so you can be like: oh, danger zone, we're talking outside our domain there. I even like the idea of having some almost syntax highlighting for: this is grounded from my context, this is from my model knowledge, this is out of distribution, danger, Will Robinson.
Some way of saying: I'm not exactly sure what I'm talking about here. Maybe there's something there.
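The refuse-rather-than-confabulate behavior described above can be sketched on the retrieval side. A minimal, hypothetical gate; the threshold, refusal message, and data shapes are assumptions, not any product's actual implementation:

```python
def answer_or_refuse(question, retrieved, min_score=0.75):
    """Only pass confidently retrieved context to the model; refuse otherwise.

    `retrieved` is a list of (passage, score) pairs from some retriever.
    """
    grounded = [p for p, s in retrieved if s >= min_score]
    if not grounded:
        # Refuse instead of letting the model guess from thin evidence.
        return "I'm not confident I have that information, so I won't guess."
    context = "\n\n".join(grounded)
    # In a real system this prompt goes to the model with instructions to
    # answer only from the supplied context; returned here as a stand-in.
    return f"Answer only from this context:\n{context}\n\nQ: {question}"

print(answer_or_refuse("What's the dosage?", [("vague forum post", 0.31)]))
print(answer_or_refuse("What's the dosage?", [("official label text", 0.92)]))
```

The model's own uncertainty still matters, but gating on retrieval confidence is one concrete way to get the "refuse to answer" behavior Mike praises.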
This all just adds up to my feeling that prompt engineering and then teaching a model to behave itself feels non-deterministic in a way. The future of computing is this misbehaving toddler, and we just have to contain it, and then we'll be able to talk to computers like real people, and they'll be able to talk to us like real people.
That just seems wild to me. Even when companies release the system prompts, I read the system prompts, and I'm like, this is how we're going to do it? Apple's system prompt is: do not hallucinate. This is how we're doing it. Does that feel right to you? Does that feel like a stable foundation for the future of computing?
It's a huge adjustment. I'm an engineer at heart. I like determinism in general. We had an insane issue at Instagram that we eventually tracked down to using non-ECC RAM: literal cosmic rays were flipping bits in RAM. When you get to that point, you're like, I want to be able to rely on my hardware. But there was actually a moment, maybe four weeks into this role, where I was like, okay, I can see the perils and the potential.
We were building a system in collaboration with a customer. And we talk about tool use, right? Like what the model has access to. And we had made two tools available to the model in this case. And one was a to-do list app that it could write to. And one was like a just like reminder, sort of like short-term or like timery type thing. And the to-do list system was down.
And it's like, oh man, I tried to use the to-do, I couldn't do it. You know what I'm going to do? I'm going to set a timer for when you meant to be reminded about this task. And so it set an absurd timer. It was like a 48-hour timer. You would never do that on your phone. It would be ridiculous. But it, to me, showed that non-determinism also leads to creativity.
And that creativity in the face of uncertainty is ultimately how I think we are going to be able to solve these higher-order, more interesting problems. And that was the moment I was like... It's non-deterministic, but I love it. You know, it's like non-deterministic, but I can put it in these odd situations and it will do its best to recover or like act in the face of uncertainty.
Whereas any other sort of like heuristic basis, if I had written that, I probably would never have thought of that particular workaround. But it did. And it did it in a, I think, pretty creative way. So I... I can't say it sits totally easily with me because I still like determinism and I like predictability and systems and we seek predictability where we can find it.
But I think I've also seen the value, within that constraint, with the right tools and the right infrastructure around it, of how it can be more robust to the necessary messiness of the real world.
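The to-do-tool anecdote maps to a simple harness loop: a failed tool call is returned to the model as data rather than crashing the run, which is what leaves room for the model to improvise with another tool. A minimal sketch; the tool names and error shape are hypothetical, not Anthropic's actual API:

```python
def run_tool(name, args, tools):
    """Execute one model-requested tool call; report failures back as data.

    A failed call becomes a result the model sees on its next turn, so it
    can recover, as with the broken to-do tool and the improvised timer.
    """
    try:
        return {"tool": name, "ok": True, "result": tools[name](**args)}
    except Exception as exc:
        return {"tool": name, "ok": False, "error": str(exc)}

def todo_add(task):
    raise RuntimeError("to-do service is down")

def set_timer(seconds):
    return f"timer set for {seconds}s"

tools = {"todo_add": todo_add, "set_timer": set_timer}

# Turn 1: the model tries the to-do tool and gets the failure back as data.
print(run_tool("todo_add", {"task": "review brief"}, tools))
# Turn 2: seeing the failure, the model improvises a 48-hour timer.
print(run_tool("set_timer", {"seconds": 48 * 3600}, tools))
```

The harness stays deterministic; the creativity Mike describes lives entirely in which call the model makes next after seeing the error.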
You're building out the product infrastructure. You're obviously thinking a lot about the big products and how you might build them. What should people be looking for from Anthropic? What's the major point of product emphasis we should be looking for?
Yeah, so on the Claude side, between the time we talk and when this airs, we're launching Claude for Enterprise. This is our push into really going deeper. It's a bunch of on-the-surface unexciting acronyms, like SSO and SCIM and data management and audit logs, but the importance is that you start getting to push into really deep use cases.
And we're building data integrations that make that useful as well. So there's that whole component. And then on the API side, we didn't talk as much about the API side, although I think of that as much as an important product as anything else that we're working on. The big push is, how do we get lots of data into the models?
The models are ultimately, they're smart, but I think they're not that useful without good data in there. It's tied to the use case. How do we get a lot of data in there and make that really quick? So we launched explicit prompt caching last week, which basically lets you take a very large data store, put it in the context window, and retrieve it 10 times faster than before.
Look for those kinds of ways in which the models can be brought closer to people's actual interesting data. Again, this always ties back to Artifact: getting you personalized, useful answers in the moment, at speed and at low cost. In that whole push, I think a lot about how good product design pushes extremes in some direction.
This is the lots-of-data extreme, but we also push the latency extreme and see what happens when you combine those two axes. That's the thing we'll continue pushing for the rest of the year.
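For reference, the prompt caching Mike mentions marks the large, reused portion of a request as cacheable so that only the short question varies per call. A sketch of what a Messages API request body looks like, assuming the `cache_control` markers from the feature's beta documentation; the model name and document text are placeholders:

```python
# Placeholder for the big, stable reference material you want cached.
LARGE_DOC = "…many thousands of tokens of reference material…"

def cached_request(question):
    """Build a request body where the large system block is marked cacheable."""
    return {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LARGE_DOC,
                # Marks this block for reuse across subsequent requests.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Only this part changes request to request.
        "messages": [{"role": "user", "content": question}],
    }

body = cached_request("Summarize section 3.")
print(body["system"][0]["cache_control"])
```

The speedup Mike cites comes from the provider reusing the processed `LARGE_DOC` prefix instead of re-reading it on every call.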
Well, Mike, this has been great. I could talk to you forever and ever about this stuff. Thank you so much for joining Decoder.
It's great to be here.
I'd like to thank Mike Krieger for taking the time to join Decoder. And thank you for listening. I hope you enjoyed it. If you'd like to let us know what you thought about the show, or anything else you'd like us to cover, please drop us a line. You can email us at decoder@theverge.com. We really do read all the emails. Or you can hit me up on Threads at @reckless1280. We also have a TikTok.
It's @decoderpod. It's a lot of fun. If you like Decoder, please share it with your friends and subscribe wherever you get your podcasts. Decoder is a production of The Verge and part of the Vox Media Podcast Network. Our producers are Kate Cox and Nick Statt. Our editor is Callie Wright. Our supervising producer is Liam James. The Decoder music is by Breakmaster Cylinder.
We'll see you next time.