Nathan Lambert
Appearances
Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
And then maybe OpenAI does less and Anthropic does less. And then on the other end of the spectrum is xAI. But they all have different forms of RLHF trying to make them a certain way.
I think it's actually probably simpler than that. It's probably something related to computer use or robotics rather than science discovery. Because the important aspect here is that models take so much data to learn; they're not sample efficient. They take the entire web, over 10 trillion tokens, to train on. This would take a human thousands of years to read.
And humans don't know most of that stuff; models know a lot of it better than we do. But humans are way, way more sample efficient. That is because of self-play. How does a baby learn what its body is? It sticks its foot in its mouth and says, oh, this is my body.
It sticks its hand in its mouth and calibrates the touch on its fingers against the most sensitive touch sensor it has, its tongue. This is how babies learn: self-play, over and over and over again. And now we have something similar to that with these verifiable proofs, whether it's a unit test in code or a
mathematically verifiable task: generate many traces of reasoning, keep branching them out, and then check at the end which ones actually have the right answer. Most of them are wrong. Great, these are the few that are right. Maybe we use some sort of reward model outside of this to select the best one to preference as well.
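The loop described here can be sketched in a few lines. Everything below is a toy stand-in, not anyone's actual pipeline: `generate_trace` mocks sampling from a model, and the verifier is an exact-answer check, where a real system would run a unit test or math checker.

```python
import random

random.seed(0)  # reproducibility for this toy example

# Toy sketch of the verifiable-rewards loop: sample many reasoning
# traces, keep only those whose final answer passes a programmatic check.
# `generate_trace` is a hypothetical stand-in for sampling from a model.

def generate_trace(question: str) -> dict:
    # Stand-in: most sampled traces end in a wrong answer.
    answer = random.choice([40, 41, 42, 42])
    return {"reasoning": f"<chain of thought for: {question}>", "answer": answer}

def verify(answer: int, expected: int) -> bool:
    return answer == expected  # a unit test or math checker in practice

def collect_good_traces(question: str, expected: int, n: int = 64) -> list:
    traces = [generate_trace(question) for _ in range(n)]
    return [t for t in traces if verify(t["answer"], expected)]

good = collect_good_traces("What is 6 * 7?", expected=42)
# `good` holds only the verified traces; these become training data, and
# a separate reward model could rerank them further.
```
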
But now you've started to get better and better at these benchmarks. And so you've seen, over the last six months, a skyrocketing in a lot of different benchmarks.
So the thing here is that these are only verifiable tasks. We earlier showed an example of what happens when chain of thought is applied to a non-verifiable thing. It's just like a human chatting, thinking about what's novel for humans, a unique thought.
But this task and form of training only works when it's verifiable. And from here, the thought is: okay, we can continue to scale this current training method by increasing the number of verifiable tasks. In math and coding, coding probably has a lot more to go; math has a lot less to go in terms of what's verifiable.
Can I create a solver that I then generate trajectories or reasoning traces toward, and then prune the ones that don't work and keep the ones that do? Well, those are going to be solved pretty quickly. But even if you've solved math, you have not actually created intelligence, right?
And so this is where I think the aha moment of computer use or robotics will come in, because now you have a sandbox or a playground that is infinitely verifiable. Messing around on the internet, there are so many actions you can do that are verifiable. It'll start off with: log into a website, create an account, click a button here, and so on.
But it'll then get to the point where it's, hey, go do a task on Tasker or whatever these various task websites are. Hey, go get hundreds of likes. And it's going to fail. It's going to spawn hundreds of accounts. It's going to fail on most of them. But this one got to a thousand. Great, now you've reached the verifiable thing.
And you just keep iterating this loop over and over. And same with robotics. That's where you have an infinite playground of tasks, from "hey, did I put the ball in the bucket?" all the way to "did I build a car?" There's a whole
trajectory to speedrun what models can do. But at some point, I truly think we'll spawn models, and initially all the training will be in sandboxes, but at some point the language model pre-training is going to be dwarfed by this reinforcement learning. You'll pre-train a multimodal model that can see, that can read, that can write, vision, audio, et cetera, but then you'll have it play in a sandbox
infinitely, and figure out math, figure out code, figure out navigating the web, figure out operating a robot arm. And then it'll learn so much. And the aha moment, I think, will be when this is available to then create something that's not good, right? Like, oh cool, part of it was figuring out how to use the web; now all of a sudden it's figured out really well how to get hundreds of thousands of real followers, and real engagement, on Twitter, because all of a sudden this is one of the things that's verifiable.
And it's verifiable, right?
I think one of the things people are ignoring is that Google's Gemini Flash Thinking is both cheaper than R1 and better, and they released it at the beginning of December. And nobody's talking about it. No one cares.
Let's give it the same question from earlier, the one about human nature.
Oh, and it latched onto "human," and then it went into organisms. Oh, wow. Yeah.
I think, to Nathan's point, when you look at the reasoning models, even when I used R1 versus o1, there was that sort of rough-around-the-edges feeling. And Flash Thinking, I didn't use this version, but the one from December definitely had that rough-around-the-edges feeling.
It's just not fleshed out in as many ways. Sure, they added math and coding capabilities via these verifiers in RL, but it feels like they lost something in certain areas. And o1 is worse-performing than the chat models in many areas as well, to be clear. Not by a lot, though.
And R1 definitely felt to me like it was worse than V3 in certain areas. Doing this RL expressed and learned a lot, but then it weakened in other areas. And so I think that's one of the big differences between these models
and what o1 offers. And then OpenAI has o1 Pro, and what they did with o3, which is also very unique, is that they stacked search on top of chain of thought. Chain of thought is one thing: it's one chain that backtracks, goes back and forth. But how they solved the ARC-AGI challenge was not just the chain of thought.
It was also sampling many times, i.e. running them in parallel, and then selecting.
And then what?
Another form of search is just asking five different people and then taking the majority answer. Yes.
Right, there's a variety; it could be complicated, it could be simple. We don't know what it is, just that they're not issuing one chain of thought in sequence; they're launching many in parallel. And in the ARC-AGI run, the one that really shocked everyone and beat the benchmark, they launched a thousand in parallel, and then they would get the right answer like 80 percent of the time, or 70, maybe even 90.
Whereas if they just launched one, it was like 30%.
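Parallel sampling plus selection, in its simplest form, is majority voting over final answers (often called self-consistency). A toy sketch, where `sample_answer` is a hypothetical stand-in for one full chain-of-thought rollout:

```python
import random
from collections import Counter

random.seed(0)  # reproducibility for this toy example

# Toy sketch of parallel sampling + selection: launch N independent
# rollouts and take the most common final answer (majority voting).
# `sample_answer` is a hypothetical stand-in for one model rollout.

def sample_answer(question: str) -> str:
    # Imagine each single rollout is right only ~40% of the time.
    return random.choices(["42", "40", "41", "43"], weights=[4, 2, 2, 2])[0]

def majority_vote(question: str, n: int = 1000) -> str:
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))
```

Even though any one rollout is usually wrong here, the plurality answer across a thousand rollouts is almost always correct, which is the shape of the jump from roughly 30% single-shot accuracy to much higher.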
So the fantastic thing is, and it's in the thing I pulled up earlier, that the cost for GPT-3 has plummeted. If you scroll up just a few images, I think.
The important thing is: hey, is cost a limiting factor here? My view is that we'll have really awesome intelligence before we have AGI, before we have it permeate throughout the economy. And this is why. GPT-3 was trained in, what, 2020, 2021? And the cost of running inference on it was $60 or $70 per million tokens. The cost per unit of intelligence was ridiculous.
Now, as we scaled forward two years, we've had a 1200x reduction in cost to achieve the same level of intelligence as GPT-3.
It's like 5 cents or something like that now, versus $60: 1200x. Those aren't the exact numbers, but it's roughly 1200x; I remember that number. That was a humongous cost per unit of intelligence. Now, the freak-out over DeepSeek is: oh my god, they made it so cheap.
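A quick sanity check on those figures (using the approximate numbers quoted here, not exact prices):

```python
# Rough check of the quoted cost drop for GPT-3-level intelligence.
# Figures are the approximate ones from the conversation.
launch_cost = 60.0    # $ per million tokens, circa 2020-2021
current_cost = 0.05   # $ per million tokens, i.e. about 5 cents
years = 3

total_reduction = launch_cost / current_cost       # 1200x overall
annual_reduction = total_reduction ** (1 / years)  # roughly 10-11x per year

print(f"{total_reduction:.0f}x total, about {annual_reduction:.1f}x per year")
```
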
Actually, if you look at this trend line, they're not below the trend line, first of all, at least for GPT-3. They are the first to hit it, which is a big deal, but they're not below the trend line for GPT-3. Now we have GPT-4. What's going to happen with these reasoning capabilities? It's a mix of architectural innovations.
It's a mix of better data, better training techniques, better inference systems, better hardware: going from each generation of GPU to the next, or to ASICs. Everything is going to take this cost curve down and down and down.
And then: can I just spawn a thousand different LLMs to work on a task and then pick from one of them? Or whatever search technique I want: tree search, Monte Carlo tree search. Maybe it gets that complicated; maybe it doesn't, because it's too complicated to actually scale. Who knows? The bitter lesson, right? The question is:
I think it's when, not if, because the rate of progress is so fast. Dario said nine months ago the cost to train and inference was this, and now we're much better than that, and DeepSeek is much better than that.
And that cost curve for GPT-4, which was also roughly $60 per million tokens when it launched, has already fallen to $2 or so. And we're going to get it down to cents, probably, for GPT-4 quality. And then that's the base for the reasoning models like o1 that we have today, and o1 Pro spawning multiple, and o3, and so on and so forth.
These search techniques are too expensive today, but they will get cheaper. And that's what's going to unlock the intelligence.
And there are a lot of false narratives, like: hey, these guys are spending billions on models. They're not spending billions on models. No one has spent more than a billion dollars on a model that's released publicly. GPT-4 was a couple hundred million, and then they've reduced the cost with GPT-4 Turbo and 4o.
But billion-dollar model runs are coming, and that includes pre-training and post-training. And then the other number is: hey, DeepSeek didn't include everything. A lot of the cost goes to research and all this sort of stuff, a lot goes to inference, a lot goes to post-training. None of these things were factored in.
It's research salaries. All these things are counted in the billions of dollars that OpenAI is spending, but they weren't counted in the, hey, $5 or $6 million that DeepSeek spent. So there's a bit of misunderstanding of what these numbers are. And then there's also an element of: NVIDIA has just been a straight line up, right?
And there have been so many different narratives trying to push down NVIDIA, or I should say, push down NVIDIA's stock. Everyone is looking for a reason to sell or to be worried. It was Blackwell delays, their GPU; every two weeks there's a new report about their GPUs being delayed.
There's the whole thing about scaling laws ending. It's so ironic; it lasted a month. It was literally just: hey, models aren't getting better, there's no reason to spend more, pre-training scaling is dead. And then it's o1, o3, R1.
And now it's like: wait, models are progressing too fast. Slow down the progress, stop spending on GPUs. But the funniest thing that comes out of this, I think, is that Jevons paradox is true. AWS pricing for H100s has gone up over the last couple of weeks. Since a little after Christmas, since V3 was launched, AWS H100 pricing has gone up.
H200s are almost out of stock everywhere, because the H200 has more memory, and therefore R1 wants that chip over the H100.
And semiconductors, we're at 50 years of Moore's law: every two years, half the cost, double the transistors, just like clockwork. It's slowed down, obviously, but the semiconductor industry has gone up the whole time. It's been wavy; there are obviously cycles. And I don't expect AI to be any different. There are going to be ebbs and flows.
But in AI, it's just playing out at an insane timescale. Moore's law was 2x every two years; this is 1200x in like three years. It's a scale of improvement that is hard to wrap your head around.
And they had press releases cheering about being China's biggest NVIDIA customer. Obviously, they've quieted down, but I think that's another element of it: they don't want to say how many GPUs they have. Because, hey, yes, they have H800s; yes, they have H20s; they also have some H100s, which were smuggled in.
I think there are a few angles of smuggling here. One: ByteDance is arguably the largest smuggler of GPUs for China. China's not supposed to have GPUs; ByteDance has over 500,000 GPUs. Why? Because they're all rented from companies around the world. They rent from Oracle, they rent from Google, they rent from all these major clouds and a bunch of smaller cloud companies too.
All the neoclouds of the world. They rent so, so many GPUs. They also buy a bunch. And they do this mostly for what Meta does: serving TikTok, next-best recommendations. And the Trump administration looks like they're going to keep these rules, which limits even allies like Singapore, which is 20 to 30 percent of NVIDIA's revenue.
But Singapore has had a moratorium on building data centers for like 15 years, because they don't have enough power. So where are those GPUs going? I'm not claiming they're all going to China, but a portion are. Many are going to Malaysia, including Microsoft and Oracle, who have big data centers in Malaysia.
They're going all over Southeast Asia, probably India as well. There's stuff routing around. But the diffusion rules are very de facto: you can only buy this many GPUs from this country, and you can only rent a cluster this large to companies that are Chinese. They're very explicit about trying to stop smuggling.
And a big chunk of it was: hey, have a random company buy 16 servers and ship them to China. There's actually, I saw a photo from someone in the semiconductor industry who leads a
team for networking chips that competes with NVIDIA, and he sent a photo of a guy checking into a first-class United flight from San Francisco to Shanghai or Shenzhen with a Supermicro box that was this big, which can only contain GPUs. And he was booking first class because, think about it: $3,000 to $5,000 for your first-class ticket. The server costs $240,000 to $250,000 in the US, and you sell it for $300,000 in China. Wait, you just got a free first-class ticket and a
lot more money. And that's small-scale smuggling. Most of the large-scale smuggling is companies in Singapore and Malaysia routing them around, or renting GPUs completely legally.
Yeah. So my belief is that last year, roughly, NVIDIA made a million H20s, which are legally allowed to be shipped to China, and which, as we talked about, are better for reasoning inference at least, maybe not training, but reasoning inference and inference generally.
Then there were also a couple hundred thousand, we think 200,000 to 300,000, GPUs routed to China from Singapore, Malaysia, the US, wherever. Companies spin up, buy 16 GPUs or 64 GPUs, whatever it is, and route them. And Huawei is known for having spun up a massive network of companies to get the materials they need after they were banned in 2018.
So it's not otherworldly. But I agree with Nathan's point: hey, you can't smuggle $10 billion of GPUs. And then the third source, which was just now banned and wasn't considered smuggling, is China renting. I believe, from our research, Oracle's biggest GPU customer is ByteDance. And for Google, I think it's their second-biggest customer.
And you go down the list of clouds, especially these smaller cloud companies that aren't the hyperscalers. Think CoreWeave, even Lambda. There's a whole sea; there are 60 different new cloud companies serving NVIDIA GPUs. I think ByteDance is renting a lot of these, all over. And so these companies are renting GPUs to Chinese companies.
And that was completely legal up until the diffusion rules, which happened just a few weeks ago. And even now, you can rent GPU clusters of fewer than 2,000 GPUs, or you can buy and ship GPUs wherever you want if it's fewer than 1,500 GPUs. So there are still some ways to smuggle. But as the numbers grow, right?
A hundred-something billion dollars of revenue for NVIDIA last year, two-hundred-something billion this year. And next year it could nearly double again, or more than double, based on what we see with data center footprints being built out all across the US and the rest of the world. It's going to be really hard for China to keep up under these rules.
Yes, there will always be smuggling, and DeepSeek-level models, GPT-4-level models, o1-level models will be capable of being trained on what China can get, even the next tier above that. But if we speed-run a couple more jumps, to billion-dollar models, $10 billion models, then it becomes: hey, there is a compute disadvantage for China for training models and serving them.
And the serving part is really critical. DeepSeek cannot serve their model today; it's completely out of inventory. It's actually already started falling in App Store downloads, because you download it, you try to sign up, and they say, we're not taking registrations, because they have no capacity.
You open it up, and you get less than five tokens per second, if you even get your request approved, because there's just no capacity; they just don't have enough GPUs to serve the model, even though it's incredibly efficient.
Yeah, I mean, that's incredibly easy, right? OpenAI publicly stated DeepSeek uses their API, and they say they have evidence. And this is another element of the training regime: people at OpenAI have claimed that it's a distilled model, i.e. you take OpenAI's model, you generate a lot of output, and then you train on that output in your own model.
And even if that's the case, what DeepSeek did is still amazing efficiency-wise, by the way.
There are also public examples. Meta explicitly stated, not necessarily distilling, but that they used the 405B as a reward model for the 70B in Llama 3.2 and 3.3. This is all the same topic.
And then the ethical aspect of it is like, why is it unethical for me to train on your model when you can train on the Internet's text?
This is why a lot of models today, even if they trained on zero OpenAI data, if you ask them who trained you, will say, "I am ChatGPT, trained by OpenAI." There's so much copy-paste of OpenAI outputs on the internet that you just weren't able to filter it out. And there was nothing in the RL they implemented, or post-training, or SFT, whatever, that says,
"hey, I'm actually a model by the Allen Institute instead of OpenAI."
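The filtering problem described here can be sketched as a simple pattern filter over scraped documents. This is purely illustrative, assuming hypothetical patterns; real pipelines use trained classifiers, deduplication, and much more.

```python
import re

# Hypothetical sketch of a contamination filter for pre-training data:
# drop documents that look like pasted ChatGPT transcripts. The patterns
# here are illustrative, not anyone's actual filter list.
CONTAMINATION_PATTERNS = [
    re.compile(r"as an ai language model", re.IGNORECASE),
    re.compile(r"i am chatgpt", re.IGNORECASE),
    re.compile(r"trained by openai", re.IGNORECASE),
]

def looks_contaminated(doc: str) -> bool:
    return any(p.search(doc) for p in CONTAMINATION_PATTERNS)

docs = [
    "The mitochondria is the powerhouse of the cell.",
    "Sure! As an AI language model, I can help with that.",
]
clean = [d for d in docs if not looks_contaminated(d)]
print(len(clean))  # 1: the pasted ChatGPT-style reply is dropped
```

The point being made above is that even filters like this inevitably miss some of the copy-pasted outputs scattered across the web.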
I think everyone has benefited regardless, because the data's on the internet, and therefore it's in your pre-training now. There are subreddits where people share the best ChatGPT outputs, and those are in your model.
Actually, over the last couple of days, we've seen a lot of people distill DeepSeek's model into Llama models, because the DeepSeek models are kind of complicated to run inference on: they're a mixture of experts, and they're 600-plus billion parameters, and all this. And people distill them into the Llama models because
the Llama models are so easy to serve, and everyone's built the pipelines and tooling for inference with the Llama models, because they're the open standard. So we've seen a sort of roundabout. Is it bad? Is it illegal? Maybe it's illegal, whatever. I don't know about that.
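Distillation in this black-box sense is conceptually simple: sample outputs from the teacher, then do ordinary supervised fine-tuning of the student on those pairs. A toy sketch, where `teacher_generate` is a hypothetical stand-in for calling the large teacher model:

```python
# Toy sketch of black-box distillation: collect (prompt, response) pairs
# from a teacher model, then fine-tune a student on them as ordinary
# supervised data. `teacher_generate` is a hypothetical stand-in for
# sampling from the big teacher (e.g. over an API).

def teacher_generate(prompt: str) -> str:
    # Stand-in: a real pipeline samples from the teacher model here.
    return f"<teacher's answer to: {prompt}>"

def build_distillation_set(prompts: list) -> list:
    return [{"prompt": p, "response": teacher_generate(p)} for p in prompts]

prompts = [
    "Prove that the square root of 2 is irrational.",
    "Write a function that reverses a linked list.",
]
sft_data = build_distillation_set(prompts)
# A trainer would now minimize the student's cross-entropy on `sft_data`,
# which is why an easy-to-serve student like Llama is convenient.
print(len(sft_data))  # 2
```
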
I agree. I have a schizo take on how you can solve this, because it already works; I have a reasonable take on it. A: Japan has a law under which you're allowed to train on any data, and copyright doesn't apply if you're training a model. B: Japan has 9 gigawatts of curtailed nuclear power. C: Japan is allowed under the AI diffusion rule to import as many GPUs as they'd like.
So all we have to do, and we have a market here to make: we build massive data centers, we rent them to the labs, and then we train models in a legally permissible way, and there are no ifs, ands, or buts. The models have no potential copyright lawsuit from The New York Times or anything like that. It's just completely legal.
Yeah. As far as industrial espionage goes, that has been greatly successful in the past. The Americans did it to the Brits, the Chinese have done it to the Americans, and so on and so forth. It is a fact of life. So arguing that industrial espionage can be stopped is probably unrealistic.
You can make it difficult, but even then, there are all these stories about, hey, the F-35 and F-22 design plans have already been given to China. Stopping code and stuff from moving between, I'd say, companies rather than nation-states is probably very difficult. But ideas are discussed a lot.
Whether it's a house party in San Francisco, or a company changing employees, or the always-mythical honeypot that gets talked about. Someone gets honeypotted, because everyone working on AI is a single dude in his 20s or 30s. Not everyone, but an insane percentage.
Yeah, or male. It's San Francisco. But as a single dude in his late 20s, I will say, we are very easily corrupted. Not corrupted myself, but, you know, we are.
Yeah. So I think the thing that's really important about these mega-cluster build-outs is that they're completely unprecedented in scale. US data center power consumption has been slowly on the rise, up to 2 or 3 percent, even through the cloud computing revolution. Data center consumption as a percentage of total US power,
and that's been over decades of data centers, et cetera, climbing slowly to that 2 to 3 percent. Now, by the end of this decade, by like 2028 or 2030, when I say something like 10 percent, a lot of traditional, non-data-center people say that's nuts.
But then, like, people who are in, like, AI who have, like, really looked at this at, like, the anthropics and open AIs are, like, that's not enough. And I'm, like, okay. But, like... you know, this is, this is both through, uh, globally distributed, uh, or distributed throughout the U S as well as like centralized clusters, right?
The part distributed throughout the U.S. is exciting, and it's the bulk of it, right? Say OpenAI, or say Meta, is adding a gigawatt, but most of it is distributed through the U.S. for inference and all these other things, right?
I thought I was about to do the Apple ad, right? What's a computer? So traditionally, data centers and data center tasks have been a distributed systems problem that is capable of being spread very far and wide. I.e., I send a request to Google, it gets routed to a data center somewhat close to me, that data center does whatever search, ranking, recommendation, and sends a result back. Yeah.
The nature of the task is changing rapidly, in that there are now two tasks people are really focused on. It's not database access, not serve me the right page or the right ad. It's now, A, inference. And inference is dramatically different from traditional distributed systems, though it still looks a lot more similar to them than training does. And then there's training, right?
The inference side is still: hey, I'm going to put thousands of GPUs in blocks all around these data centers and run models on them. A user submits a request and it gets kicked off. Or they submit a request to my service, right? They're in Word and they're like, help me, Copilot, and it kicks it off.
I'm on my Windows machine, Copilot, Apple Intelligence, whatever it is, and it gets kicked off to a data center. That data center does some work and sends it back. That's inference, and that is going to be the bulk of compute. And there are thousands of such data centers that we're tracking with satellites and all these other things.
And those are the bulk of what's being built. That's what's really reshaping things, and that's what's getting millions of GPUs. But the scale of the largest single cluster is also really important, right? When we look back at history...
Through the age of AI, it was a really big deal when they trained AlexNet on, I think, two GPUs, or four GPUs, I don't remember. It was a big deal because they used GPUs, and because they used multiple. But over time the scale has just been compounding, right? And so when you skip forward to GPT-3, then GPT-4: GPT-4 was 20,000
A100 GPUs, an unprecedented run in terms of size and cost, right? A couple hundred million dollars on a YOLO run for GPT-4. And it yielded this magical improvement that was perfectly in line with what had been experimented, just on a log scale. Oh yeah, they have that plot from the paper, the scaling of technical performance.
The scaling laws were perfect, right? But that's not a crazy number. 20,000 A100s, with each GPU consuming roughly 400 watts; then, when you add in the whole server, everything, it's like 15 to 20 megawatts of power. Maybe you could look up the power consumption of a human, because the numbers are going to get silly.
But 15 to 20 megawatts was a standard data center size. What was unprecedented was that it was all GPUs running one task. How many watts is a toaster? A toaster is similar in power consumption to an A100, right? Then the H100 comes around, and they increase the power from about 400 to 700 watts, and that's just per GPU; then there's all the associated stuff around it.
So once you count all that, it's roughly like 1200 to 1400 watts for everything, networking, CPUs, memory, blah, blah, blah.
Yeah, so, sorry for skipping past that. And the data center itself is complicated, right? But these were still standardized data centers at GPT-4 scale. Now we step forward to the scale of clusters that people built last year, and it ranges widely.
It ranges from: hey, these are standard data centers and we're just using multiple of them and connecting them together with a ton of fiber between them, a lot of networking, et cetera. That's what OpenAI and Microsoft did in Arizona, and they have 100,000 GPUs. Meta did a similar thing: they took their standard existing data center design,
which looks like an H, and they connected multiple of them together. They first did 16,000 GPUs; well, 24,000 GPUs total, but only 16,000 of them were running the training run, because GPUs are very unreliable.
So they need spares to swap in and out, all the way to the roughly 100,000 GPUs that they're currently training Llama 4 on, 128,000 or so, right? Think about it: 100,000 GPUs at roughly 1,400 watts apiece is 140 megawatts, 150 megawatts for the 128,000, right?
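The megawatt figures being quoted are simple multiplication. A quick sketch, where the all-in wattages are the rough numbers from the conversation, not vendor specs:

```python
# Back-of-envelope cluster power from per-GPU all-in wattage.
# All numbers are rough illustrations, not measured specs.

def cluster_power_mw(num_gpus: int, watts_per_gpu_all_in: float) -> float:
    """Total cluster power in megawatts, with networking/CPU/memory overhead
    folded into the per-GPU all-in wattage."""
    return num_gpus * watts_per_gpu_all_in / 1e6

# GPT-4 era: ~20,000 A100s at roughly 900 W all-in per GPU -> mid-teens of MW
gpt4_era = cluster_power_mw(20_000, 900)

# 2024 era: ~100,000 H100s at ~1,400 W all-in -> 140 MW
h100_era = cluster_power_mw(100_000, 1400)

print(f"GPT-4 era: ~{gpt4_era:.0f} MW, H100 era: ~{h100_era:.0f} MW")
```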
So you've jumped from 15 to 20 megawatts to almost 10x that number, 150 megawatts, in two years, from 2022 to 2024. And some people like Elon, as he admits himself, got into the game a little bit late for pre-training large language models. xAI was started later, right?
But then he bent heaven and hell to get his data center up and get the largest cluster in the world, 200,000 GPUs. And he did it. He bought a factory in Memphis. He's upgrading the substation, but at the same time he's got a bunch of mobile power generation, a bunch of single-cycle gas generators.
He tapped the natural gas line that's right next to the factory, and he's just pulling a ton of gas and burning it to generate all this power. He's in an old appliance factory that shut down and moved to China long ago, and he's got 200,000 GPUs in it. And now, what's the next scale? All the hyperscalers have done this.
Now the next scale is something even bigger. So, to stick with Elon: he's building his own natural gas plant, a proper one, right next door. He's deploying tons of Tesla Megapack batteries to smooth out the power, and all sorts of other things. He's got industrial chillers to cool the water down, because he's water-cooling the chips.
So, all these crazy things to get the clusters bigger and bigger. But then look at what OpenAI did with Stargate in Abilene, Texas. That's what they've announced, at least; it's not built. Elon says they don't have the money, and there's some debate about this.
But at full scale, at least the first section is definitely accounted for, money-wise, and there are multiple sections. At full scale, that data center is going to be 2.2 gigawatts, 2,200 megawatts of power in, and roughly 1.8 gigawatts, 1,800 megawatts, of power delivered to chips. Now, this is an absurd scale.
2.2 gigawatts is more than most cities, right? And that's delivered, to be clear, to a single cluster that's connected to do training: to train these models, to do both the pre-training and the post-training, all of this stuff.
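To put 1.8 GW delivered to chips in perspective, a rough division; the per-accelerator all-in wattages here are assumptions in line with the figures discussed elsewhere in the conversation:

```python
# How many accelerators can ~1.8 GW of chip-side power feed? Rough division,
# ignoring cooling and power-delivery losses (PUE).

def gpus_supported(power_gw: float, watts_per_gpu_all_in: float) -> int:
    return int(power_gw * 1e9 // watts_per_gpu_all_in)

# H100-class at ~1,400 W all-in (GPU plus its share of CPU/network/memory)
print(gpus_supported(1.8, 1400))
# Blackwell-class, assuming ~2,300 W all-in (1,200 W chip plus overheads)
print(gpus_supported(1.8, 2300))
```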
What is a nuclear power plant, again? Everyone is doing this. Meta in Louisiana, right? They're building two massive natural gas plants, and then they're building this massive data center. Amazon has plans for this scale. Google has plans for this scale. xAI has plans for this scale. All of the guys that are racing...
The companies that are racing are racing hard, and they're doing multi-gigawatt data centers to build this out. Because they think that, obviously, pre-training scaling is going to continue to some extent, but then also there's all this post-training stuff, where you have an RL sandbox for computer use or whatever, right?
This is where they're going: all these verifiable domains, where the model just keeps learning and learning through self-play or whatever it is, make the AI so much more capable, because the line does go up, right? As you throw more compute, you get more performance. The shirt is about scaling laws. To some extent, it is diminishing returns, right?
You 10x the compute, you don't get a 10x better model; you get diminishing returns. But you also get efficiency improvements, so you bend the curve. And data centers of this scale are wreaking a lot of havoc on the grid. Nathan was mentioning Amazon trying to buy that nuclear power plant from Talen, and if you look at Talen's stock, it's just skyrocketing.
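The diminishing-returns point maps onto the usual power-law picture of scaling laws. A toy illustration, where the exponent is made up for demonstration and not fitted to any real model family:

```python
# Toy power-law scaling curve: loss falls as a small power of compute, so 10x
# compute buys a fixed *ratio* of improvement, not a 10x better model.
ALPHA = 0.05  # illustrative exponent, not fitted to anything real

def loss(compute: float) -> float:
    return compute ** -ALPHA

c = 1e21
improvement = loss(c) / loss(10 * c)  # loss ratio for a 10x compute jump
print(f"10x compute shrinks loss by only {improvement:.3f}x")
```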
And they're building a massive multi-gigawatt data center there. And you can just go down the list; there are so many ramifications. One interesting thing is that in certain regions of the U.S., transmitting power costs more than actually generating it.
The grid is slow to build, but demand for power keeps growing, and building generation, ramping up a natural gas plant or even a coal plant, is easy enough to do. Transmitting the power is what's really hard. So in some parts of the U.S., like Virginia, it costs more to transmit power than to generate it. There are all sorts of insane second-order effects here.
There was a Biden executive order before the end of the year, and then Trump had some more executive orders, which hopefully reduce the regulations so that things can actually be built. But yeah, this is the big, big challenge: building enough power fast enough.
So the fun thing here is that this is too slow. Building a power plant, or reconfiguring an existing power plant, is too slow. And so therefore you must use natural gas. And data center power consumption isn't flat, right? I mean, it's spiky, right?
Because data center demand is like this, right? You're telling me I'm going to buy tens of billions of dollars of GPUs and idle them because the power isn't being generated? Power is cheap, right? If you look at the cost of a cluster, less than 20% of it is power. Most of it is the capital cost and depreciation of the GPUs. And so it's like, well, screw it.
I'll just build natural gas plants. This is what Meta is doing in Louisiana, and what OpenAI is doing in Texas and all these different places. They may not be doing it directly, but they're partnered with someone. So there are a couple of hopes. One: what Elon is doing in Memphis is the extreme. They're not just using combined-cycle gas turbines, which are super efficient.
He's also using single-cycle turbines and mobile generators and stuff, which is less efficient. But there's also the flip side: solar power generation follows its own daily curve, and wind follows a different, weakly correlated one.
So if you stack both of those, plus a big chunk of batteries, plus a little bit of gas, it is possible to run it greener. It's just that the timescales for that are slow. So people are trying. But Meta basically said: whatever, don't care about my sustainability pledge.
Or they'll buy a PPA, a power purchase agreement, where there'll be a massive wind farm or solar farm somewhere, and then they'll just pretend those electrons are being consumed by the data center. But in reality, they're paying for the power over there and selling it to the grid, while buying power over here. Yeah.
And then Microsoft quit on some of its sustainability pledges, right? What Elon did in Memphis is objectively somewhat dirty, but he's also doing it in an area where there's a bigger natural gas plant right next door, plus a wastewater treatment plant and a garbage dump nearby, right?
And he's obviously made the world a lot cleaner than that one data center will make it dirty, right? So I think it's fine to some extent, and maybe AGI solves global warming and stuff, whatever it is. This is sort of the attitude people at the labs have, which is: yeah, it's great, we'll just use gas, because the race is that important.
And if we lose, you know, that's way worse, right?
I don't know how they're doing the networking, but they're using NVIDIA Spectrum-X Ethernet. I think the unsung heroes are the cooling and electrical systems, which just get glossed over. But one story that maybe exemplifies how insane this stuff is: when you're training, right?
You're always running through the model a bunch, in the most simplistic terms, and then you're going to exchange everything and synchronize the weights. So you do a step. This is a step in model training, and every step your loss goes down, hopefully; it doesn't always.
But in the simplest terms, you compute a lot and then you exchange. The interesting thing is that GPU power is most of it; networking power is some, but a lot less. So while you're computing, the power draw for your GPUs is way up here.
But when you're exchanging weights, if you're not able to overlap communication and compute perfectly, there may be a period where your GPUs are just idle while the model's updating. You exchange the gradients, you do the model update, and then you start training again. So the power drops and comes back, and it's super spiky.
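A toy simulation of that compute-then-exchange cycle shows why the load gets spiky at cluster scale; the wattages are illustrative stand-ins, not measurements:

```python
# One training step: a compute phase near peak draw, then a gradient-exchange
# phase where GPUs sit closer to idle. With 100k GPUs stepping in lockstep,
# the cluster-wide swing is enormous.

COMPUTE_W, IDLE_W = 1400, 300  # illustrative per-GPU wattages

def step_power_profile(compute_ticks: int, exchange_ticks: int) -> list:
    """Per-tick per-GPU power draw for one step: compute, then all-reduce."""
    return [COMPUTE_W] * compute_ticks + [IDLE_W] * exchange_ticks

profile = step_power_profile(compute_ticks=8, exchange_ticks=2)
swing_mw = (max(profile) - min(profile)) * 100_000 / 1e6
print(f"cluster-wide swing with 100k synchronized GPUs: {swing_mw:.0f} MW")
```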
And so, funnily enough, when you talk about the scale of data center power, you can blow stuff up so easily. Meta actually accidentally upstreamed something to PyTorch: they added an operator, and I kid you not, whoever made this, I want to hug the guy, because it's basically PyTorch, powerplant-no-blow-up, equals zero or equals one.
And what it does is amazing. When you're exchanging the weights, the GPU will just compute fake numbers so the power doesn't spike too much, and then the power plants don't blow up, because the transient spikes screw stuff up.
And Elon's solution was to throw in a bunch of Tesla Megapacks and a few other things. Everyone has a different solution, but Meta's, at least, was publicly and openly known: set this operator, and it has the GPUs do throwaway computation during the exchange so the power doesn't swing.
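A hedged sketch of the idea behind that operator; the toggle name and mechanics below are illustrative, not the real PyTorch API:

```python
# During the exchange phase, either let GPUs idle (spiky draw) or burn dummy
# FLOPs ("compute fake numbers") so the cluster's draw stays flat.

COMPUTE_W, IDLE_W = 1400, 300  # illustrative per-GPU wattages

def step_profile(no_blowup: bool) -> list:
    compute = [COMPUTE_W] * 8
    # with the mitigation on, dummy matmuls keep GPUs near peak draw
    # during the all-reduce instead of letting them fall to idle
    exchange = [COMPUTE_W if no_blowup else IDLE_W] * 2
    return compute + exchange

smooth, spiky = step_profile(True), step_profile(False)
print(max(smooth) - min(smooth), "vs", max(spiky) - min(spiky), "watts of swing")
```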
Yeah, yeah. So air cooling has been the de facto standard: throw a bunch of metal, heat pipes, fans, et cetera at it. That's been enough to cool it. People have been dabbling in water cooling. Google's TPUs are water-cooled, so they've been doing that for a few years.
But with GPUs, no one has ever done water cooling at the scale Elon just did. Now, on NVIDIA's next generation, the highest-end GPU makes water cooling mandatory; you have to water-cool it. But Elon did it on the current NVIDIA generation, and that required a lot of stuff.
If you look at satellite photos of the Memphis facility, there are all these external water chillers sitting outside; each basically looks like a shipping container on a semi-truck. But really, those are water chillers, and he has about 90 of them, 90 different containers, just sitting outside.
They chill the water, bring it back to the data center, distribute it to all the chips, pull all the heat out, and send it back. And this is both a way to cool the chips and an efficiency thing. Going back to that three-vector thing, there's memory bandwidth, FLOPS, and interconnect.
The closer the chips are together, the easier it is to do high-speed interconnect. So this is another reason to go with water cooling: you can put the chips right next to each other and therefore get higher-speed connectivity.
There's another word there, but I won't say it, you know?
Today, the largest individual cluster is Elon's, right?
Elon's cluster in Memphis: 200,000 GPUs. Meta has about 128,000, and OpenAI has 100,000. Now, to be clear, other companies have more GPUs than Elon; they just don't have them in one place. And for training, you want them tightly connected. There are techniques people are researching and working on that let you train across multiple regions.
But for the most part, you want them all in one area, so you can connect them with high-speed networking. So Elon today has 200,000 GPUs: 100,000 H100s and 100,000 H200s. Meta, OpenAI, and Amazon all have on the scale of 100,000, a little bit less.
But this year, people are building much more. Anthropic and Amazon are building a cluster of 400,000 Trainium 2, Amazon's own chip, an attempt to get away from NVIDIA. Meta and OpenAI have plans for hundreds of thousands. By next year, you'll have 500,000-to-700,000-GPU clusters.
And note that those GPUs draw much more power than existing ones: Hopper is 700 watts, and Blackwell goes to 1,200 watts. So the power per chip is growing, and the number of chips is growing.
I mean, I don't doubt Elon. The filings he has for the power plant and the Tesla battery packs, permits and such being public record, make it clear he has some crazy plans for Memphis. But it's not quite clear what the timescales are. I just never doubt Elon; he's going to surprise us.
So these mega-clusters make no sense for inference, right? You could route inference there and just not train. But most of the inference capacity goes elsewhere: hey, I've got a 30-megawatt data center here, 50 megawatts there, 100 over here, whatever; I'll just throw inference into all of those. But the mega-clusters, right? The multi-gigawatt data centers?
I want to train there, because that's where all of my GPUs are co-located and can be connected together at super-high networking speed; that's what you need for training. Now, pre-training is the old scale, right? You'd increase parameters, you'd increase data, and the model gets better.
That doesn't apply anymore, because there's not much more data on the pre-training side. Yes, there's video and audio and image data that has not been fully taken advantage of, so there's a lot more scaling there. But a lot of people have already taken transcripts of YouTube videos, and that gets you a lot of the data; it doesn't get you all the learning value out of the video and image data.
There's still scaling to be done on pre-training, but this post-training world is where all the FLOPS are going to be spent. The model is going to play with itself. It's going to self-play, do verifiable tasks, do computer use in sandboxes; it might even do simulated robotics things, right?
All of these are going to be environments where compute is spent in quote-unquote post-training. But I think at some point we're going to drop the "post" from post-training; it's going to be pre-training, and it's going to be training. Because for the bulk of the last few years, pre-training has dwarfed post-training.
But with these verifiable methods, especially ones that potentially scale infinitely, like computer use and robotics, not just math and coding, where you can verify what's happening, it seems you can spend as much compute as you want on them. Especially as context lengths increase.
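The verifiable-task loop being described can be sketched in a few lines: sample many candidate solutions, keep only the ones a programmatic checker accepts. The random sampler below is a stand-in for a language model generating reasoning traces:

```python
import random

def verifier(candidate: int, target: int = 42) -> bool:
    """A 'verifiable task': an exact check, like a unit test or a math answer."""
    return candidate == target

def sample_and_filter(n: int, seed: int = 0) -> list:
    """Generate n candidate 'traces'; most are wrong, a few pass the check."""
    rng = random.Random(seed)
    candidates = [rng.randint(0, 99) for _ in range(n)]
    return [c for c in candidates if verifier(c)]

winners = sample_and_filter(1000)
# Only the passing traces become training signal; a reward model could then
# rank the survivors to pick the best one to prefer.
print(f"{len(winners)} of 1000 candidates passed the check")
```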
I was like, huh?
The TPU is awesome, right? It's great. Google is just a bit more tepid on building data centers, for some reason. They're building big data centers, don't get me wrong. I was talking about NVIDIA clusters before; Google actually has the biggest cluster, period. But the way they do it is very interesting. They have these data center...
super regions, in that all of the TPUs, not GPUs, TPUs, aren't physically on one site, but the sites are about 30 miles from each other. In Iowa and Nebraska, they have four data centers that are just right next to each other. Why doesn't Google flex its cluster size and go to multi-data-center training? There are good images in the SemiAnalysis multi-datacenter piece, so I'll show you what I mean.
So this is an image of what a standard Google data center looks like. By the way, their data centers look very different from anyone else's.
So if you look at this image: in the center there are these big rectangular boxes, and those are where the actual chips are kept. Then if you scroll down a little further, you can see the water pipes, the chiller cooling towers at the top, and a bunch of diesel generators. The diesel generators are backup power.
The data center itself looks physically smaller than the water chillers, right? The chips are actually easier to keep together, but cooling all the water for the water cooling is very difficult. So Google has very advanced infrastructure for the TPU that no one else has.
And what they've done is stamp out a bunch of these data centers in a few regions. Then, if you go a little further down, this one is Microsoft's, in Arizona; this is where GPT-5, quote-unquote, will be trained.
Yeah, if it doesn't exist already. But each of these data centers, and I've shown a couple of images of them, is really closely co-located in the same region: Nebraska, Iowa. And then they also have a similar complex in Ohio. These data centers are really close to each other, and they've connected them with super-high-bandwidth fiber.
So these are just a bunch of data centers, and the point here is that Google has a very advanced infrastructure, very tightly connected, in a small region. So Elon will always have the biggest fully connected cluster, because it's all in one building, and he's completely right about that.
Google has the biggest cluster, and by a significant margin, but you have to spread it over multiple sites.
I think there are a couple of problems with it. One: the TPU has been a means of making search really freaking cheap and building models for that. So a big chunk of Google's TPU purchases and usage, nearly all of it, is for internal workloads, whether that's search, now Gemini,
YouTube, ads, all these different applications they have. That's where all their TPUs are being spent, and that's what they're hyper-focused on. So there are certain aspects of the architecture that are optimized for their use case and not optimized elsewhere. One simple example: they open-sourced the Gemma model and called it Gemma 7B.
But it's actually more like eight billion parameters, because the vocabulary is so large. And the reason they made the vocabulary so large is that the TPU's matrix multiply unit is massive, because that's what they've optimized for.
And so they decided: well, I'll just make the vocabulary large too, even though it makes no sense to do so on such a small model, because it fits their hardware. So Gemma doesn't run as efficiently on a GPU as Llama does, but vice versa, Llama doesn't run as efficiently on a TPU as Gemma does. There are certain aspects of hardware-software co-design.
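The vocabulary point is easy to see in rough parameter accounting; the sizes below are illustrative, ballpark figures for a Gemma-style config, not exact model specs:

```python
# A large vocabulary inflates the embedding matrix (vocab_size x hidden_dim),
# which on a small model is a big fraction of total parameters.

def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    return vocab_size * hidden_dim

body = 7_000_000_000                   # the nominal "7B" of non-embedding weights
emb = embedding_params(256_000, 3072)  # ~256k vocab x 3072 hidden, ~0.79B params

print(f"total is roughly {(body + emb) / 1e9:.2f}B parameters")
```

So a huge vocabulary alone can push a nominally "7B" model toward 8B, which is the mismatch being described.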
So all their search models, their ranking and recommendation models, all these different models that are AI but not gen AI, have been hyper-optimized with TPUs forever. The software stack is super optimized, but this software stack has not been released publicly at all, right? Only very small portions of it, JAX and XLA, have been.
But the experience when you're inside of Google and you're training on TPUs as a researcher, you don't need to know anything about the hardware in many cases, right? It's pretty beautiful. But as soon as you step outside... A lot of them go back.
They leave Google and then they go back? Yeah, yeah. They leave and they start a company, because they have all these amazing research ideas, and they're like, wait, infrastructure is hard, software is hard. And this is on GPUs, or if they try to use TPUs, same thing, because they don't have access to all this code. And so it's like, how do you convince a company whose golden goose is search, where they're making hundreds of billions of dollars, to start selling TPUs, which they used to only buy a couple billion of? You know, I think in 2023 they bought like
a couple billion dollars' worth, and now they're buying like $10 billion to $15 billion worth. But how do you convince them that they should just buy twice as many and figure out how to sell them and make $30 billion? Like, who cares about making $30 billion? Won't that $30 billion eventually exceed the search profit, actually?
I mean, you're always going to make more money on services than... Always. I mean, yeah. To be clear, today people are spending a lot more on hardware than they are on the services, right? Because the hardware front-runs the service spend.
If there's no revenue for AI stuff or not enough revenue, then obviously, like, it's going to blow up, right? You know, people won't continue to spend on GPUs forever. And then NVIDIA is trying to move up the stack with, like, software that they're trying to sell and license and stuff, right? But... Google has never had that DNA of like, this is a product we should sell, right?
Google Cloud, which is a separate organization from the TPU team, which is a separate organization from the DeepMind team, which is a separate organization from the search team, right? There's a lot of bureaucracy here.
Technically TPU sits under infrastructure, which sits under Google Cloud, but Google Cloud, which is about renting stuff, and TPU architecture have very different goals, right? In hardware and software, all of this, right? The JAX and XLA teams do not serve Google's customers externally, whereas NVIDIA's various CUDA teams, for things like NCCL, serve external customers, right?
The internal teams, like JAX and XLA, more so serve DeepMind and Search, right? And so their customer is different. They're not building a product for outside users.
Google Cloud is third. Microsoft is the second biggest, but Amazon is the biggest, right? Yeah. And Microsoft deceptively sort of includes like Microsoft Office 365 and things like that, like some of these enterprise-wide licenses. So in reality, the gulf is even larger. Microsoft is still second though, right? Amazon is way bigger. Why? Because using AWS is better and easier.
And in many cases, it's cheaper. And it's first.
AWS generates over 80% of Amazon's profit. I think over 90%. That's insane. The distribution centers are just like, one day we'll decide to make money from this. But they haven't yet, right? They make tiny little profit from it.
I think actually Google's interface is sometimes nice, but it's also like they don't care about anyone besides their top customers.
And their customer service sucks, and they have a lot less...
Amazon has always optimized for the small customer too, though, right? Obviously, they optimize a lot for the big customers, but when they started, they just would go to random Bay Area things and give out credits, right? Or just put in your credit card and use us, right? Back in the early days. So they've always... The business has grown with them, right?
So why does Amazon... Why is Snowflake all over Amazon? Because Snowflake, in the beginning, when Amazon didn't care about them, was still using Amazon, right? And then, of course, one day, Snowflake and Amazon have a super huge partnership. But this is the case: Amazon's user experience and quality is better.
Also, a lot of the silicon they've engineered gives them a lower cost structure in traditional cloud, storage, CPU, networking, that kind of stuff, and in databases, right? I think four of Amazon's top five gross-profit products are all database-related products, like Redshift and all these things, right?
So Amazon has a very good silicon-to-user-experience pipeline with AWS. I think Google, for their silicon teams, yeah, they have awesome silicon internally: the TPU, the YouTube chip, some of these other chips that they've made. The problem is they're not serving external customers, they're serving internal customers, right?
I really don't think so. We went through a very long process of working with AMD on training on their GPUs, inference and stuff. And they're decent. Their hardware is better in many ways than NVIDIA's. The problem is their software is really bad. And I think they're getting better, right? They're getting better faster, but the gulf is so large.
And they don't spend enough resources on it, or haven't historically, right? Maybe they're changing their tune now, but, you know, for multiple months, we were submitting the most bugs, right? Us, SemiAnalysis, right? Like, what the fuck? Why are we submitting the most bugs, right?
Because they only cared about their, like, biggest customers, and so they'd ship them a private image, blah, blah, blah, and it's like, okay, but, like... I am just using PyTorch and I want to use the publicly available libraries and you don't care about that. Right. So they're getting better. But like, I think AMD is not possible.
Intel is obviously in dire straits right now and needs to be saved somehow. Very important for national security, for American technology.
Going back to earlier, only three companies can do leading-edge R&D, right? TSMC in Hsinchu, Taiwan; Samsung in Pyeongtaek; and then Intel in Hillsboro. Samsung's doing horribly. Intel's doing horribly. We could be in a world where there's only one company that can do R&D. And that one company already manufactures most of the chips. They've been gaining market share anyway, but that's a critical thing, right?
So whatever happens to Taiwan... The rest of the world's semiconductor industry, and therefore tech, relies on Taiwan, right? And that's obviously precarious. As far as Intel, they've been slowly, steadily declining. They were on top of servers and PCs, but now Apple's done the M1 and NVIDIA is releasing a PC chip. And
Qualcomm's releasing a PC chip. And in servers, hyperscalers are all making their own ARM-based server chips. And Intel has no AI silicon wins, right? They have very small wins. And they never got into mobile because they said no to the iPhone. And all these things have compounded, and they've lost their process technology leadership, right? They were ahead for 20 years and now they're behind by at least a couple years, right? And they're trying to catch back up, and we'll see if their 18A, 14A strategy works out, where they try and leapfrog TSMC. But like
And Intel is just losing tons of money anyway, right? And they just fired their CEO, even though the CEO was the only person who understood the company well, right? We'll see. He was not the best, but he was pretty good, relatively; a technical guy.
PCs and data center CPUs, yeah. But data center CPUs are all going cloud. And Amazon, Microsoft, Google are making ARM-based CPUs. And then PC side, AMD's gained market share. NVIDIA's launching a chip. That's not going to be a success, right? MediaTek, Qualcomm have relaunched chips. Apple's doing well, right? Like they could get squeezed a little bit in PC.
Although PCs generally, I imagine, will mostly just stick with Intel on the Windows side.
So accounting profit-wise, Microsoft is making money, but they're spending a lot on CapEx, right? And that gets depreciated over years. Meta is making tons of money, but with recommendation systems, which is AI, but not with Llama, right? Llama is losing money for sure, right? Yeah. I think Anthropic and OpenAI are obviously not making money, because otherwise they wouldn't be raising money, right?
They have to raise money to build more, right? Although theoretically they are making money, right? Like, you know, you spent a few hundred million dollars on GPT-4 and it's doing billions in revenue. So like, obviously it's like making money. Although they had to continue to research to get the compute efficiency wins, right?
And move down the curve to get that 1200x that has been achieved for GPT-3. Maybe we're only at a couple hundred x now with GPT-4 Turbo and 4o, and there'll probably be another one, cheaper than GPT-4o even, that comes out at some point.
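The compounding behind those numbers is simple to sketch. Assuming, purely for illustration, a flat 10x-per-year cost decline for a fixed capability level (both the rate and the $60 starting price are placeholder assumptions, not figures from the conversation):

```python
# Stylized cost curve: price per million tokens for a fixed capability
# level, falling 10x per year. All numbers are illustrative placeholders.
base_cost = 60.0      # hypothetical $/1M tokens at launch
annual_decline = 10   # assumed compounding rate

for years in range(4):
    cost = base_cost / annual_decline**years
    print(f"year {years}: ${cost:8.3f} per 1M tokens "
          f"({annual_decline**years:,}x cheaper than launch)")
```

A bit over three years at this stylized rate is enough to reach the ~1200x figure quoted for GPT-3-level capability.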
Yeah, to do things like reasoning, right? Now that that exists, they're going to scale it, they're going to do a lot of research still. I think the... People focus on the payback question, but it's really easy to just be like, well, GDP is humans and industrial capital, right? And if you can make intelligence cheap, then you can grow a lot, right? That's the sort of dumb way to explain it.
But that's basically what the investment thesis is. I think only NVIDIA and the other hardware vendors are actually making tons of money. The hyperscalers are all making money on paper, but in reality they're spending a lot more on purchasing the GPUs, and you don't know if they're still going to make this much money on each GPU in two years, right?
You don't know if, all of a sudden, OpenAI goes kapoof, and now Microsoft has hundreds of thousands of GPUs they were renting to OpenAI, that they paid for themselves with their investment in them, that no longer have a customer. This is always a possibility. I don't believe that. I think OpenAI will keep raising money.
I think others will keep raising money because the returns from it are going to be eventually huge once we have AGI.
Rapidly increasing set of features.
Let's do a TAM analysis, right? 8 billion humans, and let's get 8 billion robots, right? And let's pay them the average salary, and yeah, there we go, $10 trillion. More than $10 trillion. Yeah.
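The TAM joke, spelled out as arithmetic. The average-salary figure below is a placeholder assumption for illustration, not a number from the conversation:

```python
# Back-of-envelope robotics TAM: one robot per human, each "paid"
# something like a human wage. The wage is a hypothetical placeholder.
robots = 8_000_000_000        # roughly one per human
avg_salary_usd = 12_000       # assumed global average annual wage

tam = robots * avg_salary_usd
print(f"TAM ~ ${tam / 1e12:.0f} trillion per year")
```

Even with a wage an order of magnitude lower, the total still clears the "$10 trillion, more than $10 trillion" ballpark, which is the point of the bit.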
Yeah, of course, of course. I'm going to have like one robot. You're going to have like 20.
The chat application clearly does not have tons of room to continue, right? Like the standard chat, where you're just using it for random questions and stuff, right? The cost continues to collapse; V3 is the latest one, and it'll keep going down. And it's going to get supported by ads, right? Like, you know, Meta already serves 405B, and probably loses money on it.
But at some point, you know, they're going to get, the models are going to get so cheap that they can just serve them for free with ads supported, right? And that's what Google is going to be able to do. And that's obviously they've got a bigger reach, right? So chat is not going to be the only use case.
It's these reasoning, code, agents, computer use, all this stuff, where OpenAI has to actually go to make money in the future. Otherwise, they're kaput.
Unless they're so good at models, which they are.
It depends on where you think AI capabilities are going.
The whole idea is human data is kind of tapped out. We don't care. We all care about self-play, verifiable data.
You have to believe that, and you know, there's a lot of discussions that tokens and tokenomics and LLM APIs are the next compute layer, or the next paradigm for the economy, kind of like energy and oil was. But there's also like, you have to sort of believe that APIs and chat are not where AI is stuck, right? It is actually just tasks and agents and robotics and computer use.
And those are the areas where all the value will be delivered, not API, not chat application.
If model progress is not rapid, yeah, it's becoming a commodity, right? DeepSeek V3 shows this, and the GPT-3 cost chart from earlier showed this too, right? Llama 3B is 1200x cheaper than GPT-3. Anyone whose business model was GPT-3 level capabilities is dead. Anyone whose business model is GPT-4 level capabilities is dead.
I don't think they care about that at all.
Like Perplexity, Google, Meta care about this. I think OpenAI and Anthropic are purely laser focused on agents and AGI. And if I build AGI, I can make tons of money, right? Or I can pay for everything, right? And this is just predicated back on the export control thing, right? If you think AGI is five, 10 years away or less, right? These labs think it's two, three years away.
Obviously, your actions are – if you assume they're rational actors, which they are mostly, what you do in a two-year AGI versus five-year versus 10-year is very, very, very different, right?
I think OpenAI's statement, I don't know if you've seen the five levels, right? Where it's chat is level one, reasoning is level two, and then agents is level three. And I think there's a couple more levels, but it's important to note, right? We were in chat for a couple of years, right? We just theoretically got to reasoning. We'll be here for a year or two, right? And then agents.
But at the same time, people can try and approximate capabilities of the next level. But the agents are doing things autonomously, doing things for minutes at a time, hours at a time, et cetera, right? Reasoning is doing things for... tens of seconds at a time, right? And then coming back with an output that I still need to verify and use and try to check out, right?
And the biggest problem is, of course, like it's the same thing with manufacturing, right? Like there's the whole Six Sigma thing, right? Like, you know, how many nines do you get? And then you compound the nines onto each other. And it's like, if you multiply, you know, by the number of steps that are Six Sigma, you get to, you know, a yield or something, right?
So in semiconductor manufacturing, with tens of thousands of steps, even 99.995% per step is not enough, right? Right? Because you multiply it that many times, and you actually end up with like 60% yield, right? Yeah, or zero. Really low yield, yeah, or zero. And this is the same thing with agents, right? Chaining tasks together each time.
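The nines-compounding argument can be made concrete. Assuming 10,000 steps as the order of magnitude for a process (an illustrative figure), per-step reliability compounds like this:

```python
# End-to-end yield when a per-step success rate compounds across many
# steps, as in semiconductor manufacturing or chained agent tasks.
def end_to_end_yield(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step ** steps

steps = 10_000  # illustrative order of magnitude for a modern process
for per_step in (0.999, 0.99995, 0.999999):
    print(f"per-step {per_step}: end-to-end {end_to_end_yield(per_step, steps):.4f}")
```

Three nines per step gives essentially zero end-to-end; 99.995% gives roughly the 60% yield mentioned; you need about six nines before the chain stops being the bottleneck. The same curve is why agents chaining many subtasks need so many nines per step.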
LLMs, even the best LLMs, on pretty good benchmarks don't get 100%, right? They get a little bit below that, because there's a lot of noise. And so how do you get to enough nines, right? This is the same thing with self-driving. We can't have self-driving without it being super geofenced, like Google's, right?
And even then they have a bunch of teleoperators to make sure it doesn't get stuck, right? But you can't do that because it doesn't have enough nines.
There is a company, I don't remember what it is, but that's literally their pitch is, yeah, we're just going to be the human operator when agents fail. And you just call us and we fix it.
Or they just say, here's an API, and it's only exposed to AI agents, and if anyone queries it, the price is 10% higher for any flight, but we'll let you see any of our flights and you can book any of them.
And then it's like, oh, I paid a 10% higher price. Awesome. Yeah. And am I willing to pay that for, like, hey, book me a flight to see Lex, right? And it's like, yeah, whatever. Yeah. Okay. I think computers and the real world and the open world are really, really messy.
But if you start defining the problem in narrow regions, people are going to be able to create very, very productive things. And ratchet down cost massively, right? Like now crazy things like, you know, robotics in the home, you know, those are going to be a lot harder to do just like self-driving, right? Because there's just a billion different failure modes, right?
But agents that can navigate a certain set of websites and do certain sets of tasks, or, like, take a photo of your groceries or your fridge, or upload your recipes, and then it figures out what to order from, you know, Amazon slash Whole Foods food delivery.
That's going to be pretty quick and easy to do, I think. So it's going to be a whole range of business outcomes, and there's going to be tons and tons of optimism around people just figuring out ways to make money.
There's a lot of fear and angst too from current CS students, but there's also, that's where, that is the area where probably the most AI revenue and productivity gains have come, right?
Whether it be Copilot or Cursor or what have you, right? Or just standard ChatGPT, right? I know very few programmers who don't have ChatGPT, and actually many of them have the $200 tier, because it's so good for this, right? I think in that world, we already see it with SWE-bench.
And if you've looked at the benchmark, made by some Stanford students, I wouldn't say it's really hard, but I wouldn't say it's easy either. I think it takes someone who's been through at least a few years of CS, or a couple of years of programming, to do SWE-bench well. And the models went from 4% to 60% in like a year, right?
And where are they going to go next year? You know, it's going to be higher. It probably won't be 100%, because, again, getting the nines is really hard to do. But we're going to get to some point where that saturates, and then we're going to need harder software engineering benchmarks, and so on and so forth. But the way people think of it now is, it can do code completion. Easy.
It can do some function generation, and I have to review it. Great. But really, the software engineering agents, I think, can be done faster and sooner than any other agent, because it is a verifiable domain. You can always unit test or compile. And there's many different regions of... It can inspect the whole code base at once, which no engineer really can.
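The "verifiable domain" loop being described, generate many candidates and keep only the ones that pass the unit test, can be sketched in a few lines. The candidate lambdas below are stand-ins for sampled model outputs; no real LLM API is involved:

```python
# Sketch of verification-filtered generation for code: sample candidate
# solutions, run the unit test, keep only those that pass. Candidates
# here are hand-written stand-ins for sampled LLM outputs.
def unit_test(fn) -> bool:
    """Spec: fn should add two numbers."""
    try:
        return fn(2, 3) == 5 and fn(-1, 1) == 0
    except Exception:
        return False

def generate_candidates():
    # Most sampled candidates are wrong; a few are right.
    return [
        lambda a, b: a - b,   # wrong
        lambda a, b: a * b,   # wrong
        lambda a, b: a + b,   # correct
        lambda a, b: 0,       # wrong
    ]

verified = [fn for fn in generate_candidates() if unit_test(fn)]
print(f"{len(verified)} of 4 candidates pass the unit test")
```

The test, or compiler, is the cheap automatic verifier that makes this domain different: in real systems the surviving traces can then feed a reward signal, as with the verifiable-reward training discussed earlier in the episode.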
Only the architects can really think about this stuff, the really senior guys, and they can define stuff. And then the agent can execute on it. So I think software engineering costs are going to plummet like crazy. And one interesting aspect of that is when software engineering costs are really low, you get very different markets, right?
So in the US, you have all these platform SaaS companies, right? Salesforce and so on and so forth, right? In China, no one uses platform SaaS; everyone just builds their own stack, because software engineering is much cheaper in China, partially because of the number of STEM graduates, et cetera. It's generally just cheaper to do.
And so at the same time, code LLMs have been adopted much less in China because the cost of an engineer there is much lower. But what happens when every company can just invent their own business logic really cheaply and quickly? You stop using platform SaaS, you start building custom tailored solutions, you change them really quickly.
Now, all of a sudden, your business is a little bit more efficient too, potentially, because you're not dealing with the hell that is some random platform SaaS company stuff not working perfectly and having to adjust workflows or random business automation cases that aren't necessarily AI required. It's just logic that needs to be built that no one has built, right?
All of these things can happen faster. And then the other domain is industrial: chemical and mechanical engineers suck at coding, right? Just generally. And their tools, like semiconductor engineers' tools, are 20 years old. A lot of the tools run on XP; even ASML lithography tools run on Windows XP, right?
A lot of the analysis happens in Excel, right? It's just like, guys, you can move 20 years forward with all the data you have gathered and do a lot better. You just need the software engineering skills to be delivered to the actual domain-expert engineer. So I think that's the area where I'm super duper bullish on AI generally creating value.
Thinking larger than the context length.
I think it's that and then becoming a domain expert in something. Sure, yeah. Seriously, if you go look at aerospace or semiconductors or chemical engineering, everyone is using really crappy platforms, really old software. The job of a data scientist is a joke in many cases. In many cases, it's very real, but it's like,
Bring what the forefront of human capability is to your domain, and even if that forefront comes from the AI, in your domain you're at the forefront, right? So you have to be at the forefront of something, and then leverage the rising tide that is AI for everything else. Oh yeah, there's so much low-hanging fruit.
It's like an amalgamation of multiple benchmarks or what do you mean?
Oh, so you beat them even ignoring safety.
Isn't Meta's license pretty much permissive, except for five companies?
Wait, so you're saying I can't make a cheap copy of Lama and pretend it's mine, but I can do this with the Chinese model?
Yeah, so I think Stargate is an opaque thing. It definitely doesn't have $500 billion. It doesn't even have $100 billion, right? So what they announced is this $500 billion number, Larry Ellison, Sam Altman, and Trump said it. They thanked Trump, and Trump did do some executive actions that do significantly improve the ability for this to be built faster. Yeah.
You know, one of the executive actions he did is that on federal land, you can just basically build data centers and power plants, you know, pretty much just like that. And then the permitting process is basically gone, or you file after the fact. So again, I had a schizo take earlier; another schizo take: if you've ever been to the Presidio in San Francisco, beautiful area.
You could build a power plant and a data center there if you wanted to. Because it is federal land. It used to be a military base. It is. Obviously, this would piss people off. It's a good bit. Anyway, Trump has made it much easier to do this, right, generally. Texas has the only unregulated grid in the nation as well.
And so, therefore, ERCOT enables people to build faster as well. In addition, the federal regulations are coming down. And that's what Stargate is predicated on; this is why that whole show happened. Now, how they came up with the $500 billion number is beyond me. How they came up with the $100 billion number makes sense to some extent, right?
And there's actually a good table in that Stargate piece that I'd like to show. It's the most recent one, yeah. So anyway, Stargate, it's basically, there's a table about cost. There, you passed it already. It's that one. So this table kind of explains what happens. So Stargate is in Abilene, Texas, the first $100 billion of it.
That site is 2.2 gigawatts of power in, about 1.8 gigawatts of power consumed. Per GPU, they have roughly... Oracle is already building the first part of this before Stargate came about. To be clear, they've been building it for a year. They tried to rent it to Elon, in fact. But Elon was like, it's too slow. I need it faster. So then he went and did his Memphis thing.
And so OpenAI was able to get it with this weird joint venture called Stargate. They initially signed a deal with just Oracle for the first section of this cluster. This first section of this cluster is roughly $5 billion to $6 billion of server spend. And then there's another billion or so of data center spend.
And then likewise, if you fill out that entire 1.8 gigawatts with the next two generations of NVIDIA's chips, GB200, GB300, VR200, and you fill it out completely, that ends up being roughly $50 billion server cost. Plus there's data center costs, plus maintenance costs, plus operation costs, plus all these things. And that's where OpenAI gets to their $100 billion announcement that they had.
Because they talked about $100 billion is phase one. That's this Abilene, Texas data center. $100 billion of total cost of ownership, quote unquote. So it's not CapEx. It's not investment. It's $100 billion of total cost of ownership. And then, and then there will be future phases. They're looking at other sites that are even bigger than this 2.2 gigawatts, by the way, uh, in Texas and elsewhere.
And so they're not completely ignoring that, but there is the number of $100 billion that they say is for phase one, which I do think will happen. They don't even have the money for that. Furthermore, it's not $100 billion. It's $50 billion of spend, right, and then like $50 billion of operational cost, power, et cetera,
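The arithmetic above can be sketched out directly; these are the approximate figures quoted in this conversation, not official disclosures, so treat every number as illustrative.

```python
# Back-of-the-envelope Stargate phase-one arithmetic using the rough
# numbers quoted in the conversation (all in billions of dollars).
# These are the speaker's estimates, not official figures.

first_section_servers_bn = 6      # ~$5-6B of server spend for the first section
first_section_datacenter_bn = 1   # ~$1B of data center spend on top

full_site_servers_bn = 50         # filling out 1.8 GW with GB200/GB300/VR200
full_site_operations_bn = 50      # data center, maintenance, power, operations

first_section_bn = first_section_servers_bn + first_section_datacenter_bn
phase_one_tco_bn = full_site_servers_bn + full_site_operations_bn

print(f"First section: ~${first_section_bn}B")   # ~$7B
print(f"Phase one TCO: ~${phase_one_tco_bn}B")   # the quoted $100B 'total cost of ownership'
```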
rental pricing, et cetera, because they're renting it, OpenAI is renting the GPUs from the Stargate joint venture, right? What money do they actually have, right? SoftBank, SoftBank is going to invest, Oracle is going to invest, OpenAI is going to invest. OpenAI is on the line for $19 billion. Everyone knows that they've only got $6 billion in their last round and $4 billion in debt. Mm-hmm.
But there is news of SoftBank maybe investing $25 billion into OpenAI. So that's part of it. So $19 billion can come from there. So OpenAI does not have the money at all, to be clear. Ink is not dried on anything.
OpenAI has $0 for this $50 billion, in which they're legally obligated to put $19 billion of CapEx into the joint venture, and then the rest they're going to pay via renting the GPUs from the joint venture. And then there's Oracle. Oracle has a lot of money. They're building the first section completely. They were spending for it themselves, right? This $6 billion of CapEx, $10 billion of TCO.
And they were going to do that first section. They're paying for that, right? As far as the rest of the section, I don't know how much Larry wants to spend, right? At any point, he could pull out, right? This is, again, completely voluntary. So at any point, there's no signed ink on this, right? But he potentially could contribute tens of billions of dollars, right? To be clear, he's got the money.
Oracle's got the money. And then there's MGX, which is the UAE fund, which technically has $1.5 trillion for investing in AI. But again, I don't know how real that money is. And whereas there is no ink signed for this, SoftBank does not have $25 billion of cash. They have to sell down their stake in ARM, which is the leader in CPUs, and they IPO'd it.
This is obviously what they've always wanted to do. They just didn't know where they'd redeploy the capital. Selling down the stake in ARM makes a ton of sense. So they can sell that down and invest in this if they want to and invest in OpenAI if they want to. As far as money secured, the first 100,000 GB200 cluster can be funded. Everything else after that is up in the air. Money's coming.
I believe the money will come. I personally do. Yeah.
It's a belief that they are going to release better models and be able to raise more money. But the actual reality is that Elon's right. The money does not exist.
Trump is reducing the regulation so they can build it faster, right? And he's allowing them to do it, right? Because any investment of this size is going to involve antitrust stuff, right? So obviously he's going to allow them to do it. He's going to enable the regulations to actually allow it to be built. I don't believe there's any U.S.
government dollars being spent on this, though.
And so we've had this 1.8 gigawatt data center in our data for over a year now. And we've been sending it to all of our clients, including many of these companies that are building the multi-gigawatt clusters. But that is at a level that maybe executives aren't quite seeing, versus the $500 billion, $100 billion numbers. And then everyone's asking them like,
So it could spur an even faster arms race, right? Because there's already an arms race. But this $100 billion, $500 billion number, Trump talking about it on TV, it could spur the arms race to be even faster, and more investors to flood in, et cetera, et cetera. So I think you're right.
In that sense, OpenAI, or sort of Trump, is sort of championing it: people are going to build more, and his actions are going to let people build more.
I really enjoy tracking supply chain and like who's involved in what. I really do. It's really fun to see like the numbers, the cost, who's building what capacity, helping them figure out how much capacity they should build, winning deals, strategic stuff. That's really cool. I think technologically, there's a lot around the networking side that really excites me with optics and electronics, right?
Kind of getting closer and closer, whether it be co-packaged optics or some sort of new forms of switching, right?
Yeah. Also multi-data-center training, right? Like people are putting so much fiber between these data centers and lighting it up with so much bandwidth that there's a lot of interesting stuff happening on that end, right? Telecom has been really boring since 5G. And now it's really exciting again.
No, I don't think that's possible. It's only going to get harder to program, not easier. It's only going to get more difficult and complicated and more layers, right? The general image that people like to have is like this hierarchy of memory. So on chip is really close, localized within the chip, right? You have registers, right? And those are shared between some compute elements.
And then you'll have
caches, which are shared between more compute elements. Then you have memory, right, like HBM or DRAM, like DDR memory or whatever it is, and that's shared between the whole chip. And then you can have, you know, pools of memory that are shared between many chips, right, and then storage, and you keep zoning out, right? The access latency across data centers, within the data center, within a chip, is different. So you're obviously always going to have different programming paradigms for this. It's not going to be easy. Programming this stuff is going to be hard. Maybe AI can help, right,
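The hierarchy just described can be summarized as a table of rough access latencies. The numbers below are order-of-magnitude illustrative assumptions, not measurements of any particular chip or network.

```python
# Illustrative memory/communication hierarchy, from registers out to
# cross-data-center links. Latencies are rough orders of magnitude only.

hierarchy = [
    ("registers (per compute element)",  1e-9),   # ~1 ns
    ("caches (shared by more elements)", 1e-8),
    ("HBM/DRAM (whole chip)",            1e-7),
    ("pooled memory across chips",       1e-6),
    ("within-data-center network",       1e-5),
    ("across data centers (fiber)",      1e-3),   # ~1 ms
]

for level, latency_s in hierarchy:
    print(f"{level:38s} ~{latency_s * 1e9:>12,.0f} ns")
```

Each tier is roughly an order of magnitude or more slower than the one inside it, which is why each tier tends to get its own programming paradigm.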
with programming this. But the way to think about it is that the more elements you add to a task, you don't get strong scaling, right? If I double the number of chips, I don't get 2x the performance, right? This is just a reality of computing, because there are inefficiencies.
And there's a lot of interesting work being done to make it more linear, whether it's networking the chips together more tightly, or cool programming models, or cool algorithmic things that you can do on the model side, right?
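The "no strong scaling" point can be made concrete with Amdahl's law. The 5% non-parallelizable fraction below is a hypothetical number standing in for communication and synchronization overhead.

```python
# Amdahl's law: if a fixed fraction of the work can't be parallelized,
# doubling the chips never doubles the performance. The 5% serial
# fraction is an illustrative assumption.

def speedup(n_chips, serial_fraction=0.05):
    """Speedup over one chip when serial_fraction of the work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_chips)

for n in (1, 2, 4, 8, 1024):
    efficiency = speedup(n) / n   # achieved speedup vs. ideal linear scaling
    print(f"{n:5d} chips: {speedup(n):6.2f}x speedup, {efficiency:.1%} efficiency")
```

With these assumptions, 2 chips give about 1.9x, and 1024 chips top out below 20x; that gap is the inefficiency the tighter networking and algorithmic work tries to claw back.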
DeepSeek did some of these really cool innovations because they were limited on interconnect, but they still needed to parallelize, right? All sorts of, you know, everyone's always doing stuff. Google's got a bunch of work, and everyone's got a bunch of work about this. That stuff is super exciting on the model and workload and innovation side, right?
Hardware, solid state transformers are interesting for the power side. There's all sorts of stuff on batteries and stuff.
There's all sorts of stuff, you know, I think when you look at every layer of the compute stack, right, whether it goes from lithography and etch all the way to fabrication, to optics, to networking, to power, to transformers, to cooling, and you just go on up and up and up the stack, you know, even air conditioners for data centers are innovating.
Right? Even copper cables are innovating, right? You wouldn't think it, but there are some innovations happening there with the density of how you can pack them. And it's all of these layers of the stack, all the way up to the models. Human progress is at a pace that's never been seen before.
There's a big team.
Yeah, thank you.
Generally, humanity is going to suffer a lot less, right? I'm very optimistic about that. I do worry about techno-fascism type stuff arising as AI becomes more and more prevalent and powerful, and those who control it can do more and more. Maybe it doesn't kill us all, but at some point, every very powerful human is going to want a brain-computer interface so that they can interact with the AGI system
and all of its advantages in many more ways, and merge their mind with it, so that that person's capabilities can leverage those much better than anyone else. It won't be one person to rule them all, but the thing I worry about is it'll be a few people, you know, hundreds, thousands, tens of thousands, maybe millions of people, ruling whoever's left.
Right. And the economy around it, right? And I think that's the thing that's probably more worrisome: human-machine amalgamations. This enables an individual human to have more impact on the world, and that impact can be both positive and negative, right?
Generally, humans have positive impacts on the world, at least societally, but it's possible for individual humans to have such negative impacts. And AGI, at least as I think the labs define it, which is not a runaway sentient thing, but rather just something that can do a lot of tasks really efficiently, amplifies the capabilities of someone causing extreme damage.
But for the most part, I think it'll be used for profit-seeking motives, which will increase the abundance and supply of things and therefore reduce suffering, right?
That is a positive outcome, right? It's like, if I have food tubes and laptops scrolling and I'm happy, that's a positive outcome.
Yeah. So there's two main techniques that they implemented that are probably the majority of their efficiency. And then there's a lot of implementation details that maybe we'll gloss over or get into later that sort of contribute to it. But those two main things are, one is they went to a mixture of experts model, which we'll define in a second.
And then the other thing is that they invented this new technique called MLA, multi-head latent attention. Both of these are big deals. Mixture of experts is something that's been in the literature for a handful of years. And OpenAI with GPT-4 was the first one to productize a mixture of experts model.
And what this means is, when you look at the common models around that most people have been able to interact with that are open, think Llama. Llama is a dense model, i.e. every single parameter or neuron is activated as you're going through the model for every single token you generate, right? Now, with a mixture of experts model, you don't do that, right?
How does the human actually work, right? It's like, oh, well, my visual cortex is active when I'm thinking about vision tasks, and my amygdala is active when I'm scared, right? These different aspects of your brain are focused on different things. A mixture of experts model attempts to approximate this to some extent.
It's nowhere close to what a brain architecture is, but different portions of the model activate, right? You'll have a set number of experts in the model and a set number that are activated each time. And this dramatically reduces both your training and inference costs.
Because now, if you think about the parameter count as the sort of total embedding space for all of this knowledge that you're compressing down during training, when you're embedding this data in, instead of having to activate every single parameter every single time you're training or running inference, now you can just activate a subset.
And the model will learn which expert to route to for different tasks. And so this is a humongous innovation in terms of, hey, I can continue to grow the total embedding space of parameters. And so DeepSeek's model is 600-something billion parameters, right? Relative to Llama 405B, it's 405 billion parameters; relative to Llama 70B, it's 70 billion parameters, right?
So this model technically has more embedding space for information, right? To compress all of the world's knowledge that's on the internet down. But at the same time, it is only activating around 37 billion of the parameters. So only 37 billion of these parameters actually need to be computed every single time you're training data or inferencing data out of it.
And so versus, again, the Llama models, where 70 billion parameters must be activated, or 405 billion parameters must be activated, you've dramatically reduced your compute cost when you're doing training and inference with this mixture of experts architecture.
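As a toy illustration of the routing idea, here is a minimal mixture-of-experts forward pass. The sizes and the linear gate are made-up illustrative choices, not DeepSeek's (or anyone's) actual architecture.

```python
import numpy as np

# Toy MoE layer: a gate scores all experts per token, only the top-k
# experts run, and their outputs are mixed with softmax weights.
# All sizes here are invented for illustration.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

gate = rng.normal(size=(d_model, n_experts))              # router weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # one weight matrix per expert

def moe_forward(x):
    """x: (d_model,) token activation -> (d_model,) output via top-k experts."""
    scores = x @ gate                              # one score per expert
    chosen = np.argsort(scores)[-top_k:]           # indices of the top-k experts
    weights = np.exp(scores[chosen] - scores[chosen].max())
    weights /= weights.sum()                       # softmax over the chosen experts
    # Only top_k expert matrices are touched, so compute scales with
    # top_k rather than n_experts -- that is the whole trick.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

y = moe_forward(rng.normal(size=d_model))
print(y.shape)  # (16,)
```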
Effectively, NVIDIA builds this library called NCCL, pronounced "nickel," right? In which, you know, when you're training a model, you have all these communications between every single layer of the model, and you may have over 100 layers. What does NCCL stand for? NVIDIA Collective Communications Library. Nice. And so...
When you're training a model, you're going to have all these all-reduces and all-gathers. Between each layer, between the multi-layer perceptron or feed-forward network and the attention mechanism, you'll have basically the model synchronized. Or you'll have an all-reduce or an all-gather. And this is a communication between all the GPUs in the network, whether it's in training or inference.
So NVIDIA has a standard library. This is one of the reasons why it's really difficult to use anyone else's hardware. for training is because no one's really built a standard communications library. And NVIDIA has done this at a sort of a higher level, right?
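To see what these collectives actually compute, here is a plain-Python simulation over in-process "ranks" standing in for GPUs; real training would call the NCCL implementations of the same operations.

```python
# Semantics of the two collectives mentioned above, simulated without GPUs.

def all_reduce(per_rank_values):
    """After an all-reduce, every rank holds the elementwise sum of all ranks."""
    totals = [sum(col) for col in zip(*per_rank_values)]
    return [list(totals) for _ in per_rank_values]

def all_gather(per_rank_values):
    """After an all-gather, every rank holds the concatenation of all ranks."""
    gathered = [v for rank in per_rank_values for v in rank]
    return [list(gathered) for _ in per_rank_values]

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 ranks, 2 gradient values each
print(all_reduce(grads))  # every rank ends with [9.0, 12.0]
print(all_gather(grads))  # every rank ends with [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

In training, the all-reduce is what keeps gradients synchronized across every GPU between layers.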
DeepSeek, because they have certain limitations around the GPUs that they have access to, the interconnects are limited to some extent by the restrictions of the GPUs that were shipped into China legally, not the ones that are smuggled, but legally shipped in, that they use to train this model. They had to figure out how to get efficiencies.
And one of those things is that instead of just calling the NVIDIA library NCCL, they instead scheduled their own communications, which some of the labs do. Meta talked about, in Llama 3, how they made their own custom version of NCCL. They didn't talk about the implementation details. This is some of what they did.
Probably not as well, maybe not as well as DeepSeek, because for DeepSeek, necessity is the mother of innovation, and they had to do this. Whereas, you know, OpenAI has people that do this sort of stuff, Anthropic, et cetera.
But, you know, DeepSeek certainly did it publicly, and they may have done it even better, because they were gimped on a certain aspect of the chips that they have access to. And so they scheduled communications by scheduling specific SMs. SMs you could think of as like the cores on a GPU, right? There's a bit over 100 SMs on a GPU.
And they were specifically scheduling, hey, which ones are running the model, which ones are doing all-reduce, which ones are doing all-gather, right? And they would flip back and forth between them. And this requires extremely low-level programming. This is what NCCL does automatically, or other NVIDIA libraries handle this automatically, usually. Yeah, exactly.
And so technically they're using PTX, which you could think of as sort of an assembly-type language, like coding directly to an instruction set, right? It's not exactly that, but it's still technically part of CUDA. But it's like, do I want to write in Python, the PyTorch equivalent, and call NVIDIA libraries?
Do I want to go down to the C level, right? Or code at an even lower level? Or do I want to go all the way down to the assembly or ISA level? And there are cases where you go all the way down there at the very big labs, but most companies just do not do that, right? Because it's a waste of time, and the efficiency gains you get are not worth it.
But DeepSeek's implementation is so complex, right? Especially with their mixture of experts, right? People have done mixture of experts, but they're generally eight or 16 experts, right? And they activate two. So one of the words that we like to use is sparsity factor, right? Or usage, right? So you might have one fourth of your model activate, right?
And that's Mistral's Mixtral model, right? The model that really catapulted them to, like, oh my God, they're really, really good. OpenAI has also had models that are MoE, and so have all the other major closed labs. But what DeepSeek did, that maybe only the leading labs have only just started recently doing, is have such a high sparsity factor, right?
It's not one fourth of the model, right? Two out of eight experts activating every time you go through the model, it's eight out of 256.
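The sparsity-factor arithmetic is simple enough to write down directly; the 2-of-8 and 8-of-256 configurations are the ones quoted in this discussion.

```python
# Sparsity factor: how many times larger the expert pool is than the
# set of experts activated per token.

def sparsity_factor(total_experts, active_experts):
    return total_experts / active_experts

classic = sparsity_factor(8, 2)           # classic MoE: 2 of 8  -> 4.0
deepseek_style = sparsity_factor(256, 8)  # 8 of 256            -> 32.0
print(classic, deepseek_style)  # 4.0 32.0
```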
Going back to sort of the efficiency and complexity point, right? It's 32 versus four, right? For Mixtral and other MoE models that have been publicly released. So this ratio is extremely high. And sort of what Nathan was getting at there was, when you have such a different level of sparsity, you can't just have every GPU have the entire model, right? The model's too big.
There's too much complexity there, so you have to split up the model with different types of parallelism, right? And so you might have different experts on different GPU nodes. But now what happens when this set of data that you get, hey, all of it looks like this one way, and all of it should route to one part of the model, right?
When all of it routes to one part of the model, then you can have this overloading of a certain set of the GPU resources or a certain set of the GPUs, and then the rest of the training network sits idle because all of the tokens are just routing to that. This is one of the biggest complexities with running a very sparse mixture of experts model.
With this 32 ratio versus this four ratio, you end up with so many of the experts just sitting there idle. So how do I load balance between them? How do I schedule the communications between them? This is a lot of the extremely low-level, detailed work that they figured out in the public first, and potentially second or third in the world, and maybe even first in some cases.
I think there is one aspect to note, though, right? There's the question of the general ability for that to transfer across different types of runs, right? You may make really, really high-quality code for one specific model architecture at one size, and then that is not transferable to, hey, when I make this architecture tweak, everything's broken again, right?
Their specific low-level coding of scheduling SMs is specific to this model architecture and size, right? Whereas NVIDIA's collectives library is more like, hey, it'll work for anything, right? You want to do an all-reduce? Great, I don't care what your model architecture is. It'll work.
And you're giving up a lot of performance when you do that in many cases. But it's worthwhile for them to do the specific optimization for the specific run, given the constraints that they have regarding compute.
When people are training, they have all these various dashboards, but the most simple one is your loss, right? And it continues to go down. But in reality, especially with more complicated stuff like MoE, or FP8 training, which is another innovation, you know, going to a lower-precision number format, i.e. less accurate, the biggest problem is that you end up with loss spikes.
And no one knows why the loss spike happened.
Yeah. These people, you know, you'll go out to dinner with a friend that works at one of these labs, and they'll just be looking at their phone every ten minutes. And it's one thing if they're texting, but they're just like, how is the loss doing? Yeah.
And some level of spikes is normal, right? It'll recover and be back. Sometimes a lot of the old strategy was like, you just stop the run, restart from the old version, and then like change the data mix. And then it keeps going.
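That stop-and-restart strategy can be sketched as a simple monitor: flag a spike when the newest loss jumps well above a recent moving average. The window and threshold below are illustrative assumptions, not anyone's actual recipe.

```python
# Naive loss-spike detector for the rollback strategy described above.
# window and factor are arbitrary illustrative choices.

def detect_spike(losses, window=5, factor=1.5):
    """True if the newest loss exceeds factor x the average of the last `window`."""
    if len(losses) <= window:
        return False
    recent = losses[-window - 1:-1]
    return losses[-1] > factor * (sum(recent) / len(recent))

history = [4.0, 3.5, 3.2, 3.0, 2.9, 2.8, 6.0]  # sudden spike at the end
print(detect_spike(history))  # True
```

On a flagged spike, the old-school move described above is to restore the last checkpoint, change the data mix, and resume.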
So it's like there's a distribution. The whole idea of grokking also comes in, right? It's like, just because it slowed down from improving in loss doesn't mean it's not learning, because all of a sudden it could just spike down in loss because it learned, truly learned something, right? And it took some time for it to learn that.
It's not like a gradual process, right? And that's what humans are like. That's what models are like. So it's really a stressful task, as you mentioned.
There's the concept of a YOLO run. So YOLO, you only live once. And what it is, is like, you know, there's all this experimentation you do at the small scale, right? Research ablations, right? Like you have your Jupyter notebook where you're experimenting with MLA on like three GPUs or whatever. And you're doing all these different
things like, hey, do I do four active experts, 128 experts? Do I arrange the experts this way? You know, all these different model architecture things you're testing at a very small scale, right? A couple researchers, a few GPUs, tens of GPUs, hundreds of GPUs, whatever it is. And then all of a sudden you're like, okay guys, no more fucking around, right? No more screwing around. Everyone take all the resources we have, let's pick what we think will work, and just go for it, right? YOLO.
And this is where that sort of stress comes in, as like, well, I know it works at the small scale, but some things that work at small scale don't work at large scale, and some things that work at large scale don't work at small scale, right? So it's really truly a YOLO run.
And there is this discussion that certain researchers just have this methodical nature: they can map the whole search space and figure out all the ablations of different research and really see what is best. And there are certain researchers who just kind of...
The search space is near infinite, right? And yet the amount of compute and time you have is very low. And you have to hit release schedules. You have to not get blown past by everyone. Otherwise, you know, what happened with DeepSeek, you know, crushing Meta and Mistral and Cohere and all these guys, they moved too slow, right? They maybe were too methodical. I don't know.
They didn't hit the YOLO run, whatever the reason was. Maybe they weren't as skilled. You can call it luck if you want, but at the end of the day, it's skill.
I think it's even more impressive what OpenAI did in 2022. At the time, no one believed in mixture-of-experts models outside of Google, who had all the researchers. OpenAI had so little compute. And they devoted all of their compute for many months, right?
All of it, 100%, for many months, to GPT-4, with a brand new architecture and no guarantee, right? Hey, let me spend a couple hundred million dollars, which is all of the money I have, on this model. That is truly YOLO, right? Now, people point to all these training run failures that are in the media, right? It's like, okay, great.
But actually a huge chunk of my GPUs are doing inference. I still have a bunch doing research constantly. And yes, my biggest cluster is training on this YOLO run, but that YOLO run is much less risky than what OpenAI did in 2022, or maybe what DeepSeek did now, the sort of, hey, we're just going to throw everything at it.
DeepSeek is very interesting. This is where it helps to zoom out and look at who they are, first of all, right? High-Flyer is a hedge fund that has historically done quantitative trading in China as well as elsewhere. And they have always had a significant number of GPUs, right?
In the past, a lot of these high-frequency trading, algorithmic quant traders used FPGAs, but it shifted to GPUs, definitely. There's both, right? But GPUs especially. And High-Flyer is the hedge fund that owns DeepSeek, and everyone who works for DeepSeek is part of High-Flyer to some extent, right? Same parent company, same owner, same CEO.
They had all these resources and infrastructure for trading, and then they devoted a humongous portion of them to training models, both language models and otherwise, right? Because these techniques were heavily AI-influenced.
More recently, people have realized, hey, trading with... Even when you go back to Renaissance and all these quantitative firms, natural language processing is the key to trading really fast, understanding a press release and making the right trade. And so DeepSeek has always been really good at this.
And even as far back as 2021, they have press releases and papers saying, hey, we're the first company in China with an A100 cluster this large. It was 10,000 A100 GPUs, right? This is in 2021. Now, this wasn't all for training, you know, large language models.
This was mostly for training models for their quantitative aspects, their quantitative trading, as well as, you know, a lot of that was natural language processing, to be clear, right? And so this is the sort of history, right? So verifiable fact is that in 2021, they built the largest Chinese cluster. At least they claim it was the largest cluster in China, 10,000 GPUs.
Yeah. It's like they've had a huge cluster before any conversation of export controls. So then you step it forward to like, what have they done over the last four years since then, right? Obviously, they've continued to operate the hedge fund, probably make tons of money. And the other thing is that they've leaned more and more and more into AI.
The CEO, Liang Wenfeng... you're putting me on the spot on this. We discussed this before. Yeah. Liang Wenfeng, the CEO, owns maybe a little bit more than half the company, allegedly, and is an extremely Elon-or-Jensen kind of figure where he's just involved in everything. And so over that time period, he's gotten really in-depth into AI. He actually has a bit of a...
If you see some of the statements, a bit of an e/acc vibe almost, right?
And so this is the quote-unquote visionary behind the company, right? The hedge fund still exists, right? This quantitative firm. And so slowly he turned to this full view of AI, everything about it, right? At some point it maneuvered over and he made DeepSeek. And DeepSeek has done multiple models since then.
They've acquired more and more GPUs. They share infrastructure with the fund, right? And so there is no exact public number for the GPU resources they have beyond the 10,000 GPUs they bought in 2021. And they were fantastically profitable, right?
And then this paper claims they used only 2,000 H800 GPUs, which is a restricted GPU that was previously allowed in China but is no longer allowed, and there's a new version. It's basically NVIDIA's H100 for China.
Right. And there are some restrictions on it, specifically around the communication speed, the interconnect speed, right? Which is why they had to do this crazy SM scheduling stuff, right? So going back to that: this is obviously not true in terms of their total GPU count, their available GPUs, but for this training run, do you think 2,000 is the correct number, or no? So this is where it takes a significant amount of zoning in, right?
What do you call your training run, right? Do you count all of the research and ablations that you ran, right? Picking all this stuff, because yes, you can do a YOLO run, but at some level you have to do the tests at the small scale, and then you have to do some tests at medium scale, before you go to a large scale.
Yeah, and research begets the new ideas that let you get huge efficiency.
So the numbers that DeepSeek specifically said publicly, right, are just the 10,000 GPUs in 2021 and then 2,000 GPUs for only the pre-training for V3. They did not discuss cost on R1. They did not discuss cost on all the other RL, right, for the instruct model that they made, right?
They only discussed the pre-training for the base model, and they did not discuss anything on research and ablations. And they do not talk about any of the resources that are shared, in terms of, hey, the fund is using all these GPUs, right? And we know that they're very profitable and that they had 10,000 GPUs in 2021.
So some of the research that we've found is that we actually believe they have closer to 50,000 GPUs.
Yeah, sorry. We believe they actually have something closer to 50,000 GPUs, right? Now, this is split across many tasks, right? Again, the fund.
Right. So like Llama 3, they trained on 16,000 H100s, right? But Meta as a company last year publicly disclosed they bought something like 400-plus thousand GPUs. Yeah. Right. So of course it's a tiny percentage on the training. Again, most of it is doing things like serving me the best Instagram Reels, right?
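The back-of-the-envelope ratio implied by those two figures (a sketch using the rough numbers quoted above, not Meta's exact fleet accounting):

```python
# Rough share of Meta's GPU fleet that went into training Llama 3,
# using the approximate figures from the conversation.
training_gpus = 16_000    # H100s reported for Llama 3 training
total_gpus = 400_000      # approximate fleet Meta disclosed buying
print(f"{training_gpus / total_gpus:.0%}")  # 4%
```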
Yeah, so Ampere was the A100, and then Hopper is the H100, right? People use them synonymously in the US because really there's just the H100, and now there's the H200, right? But same thing, mostly. In China, there have been different salvos of export restrictions. So initially the US government limited on a two-factor scale, right? Chip interconnect versus flops, right?
So any chip that had interconnect above a certain level and floating point operations above a certain level was restricted. Later, the government realized that this was a flaw in the restriction, and they cut it down to just floating point operations. And so the H800 had high flops, low communication? Exactly. The H800 was the same performance as the H100 on flops.
But it just had the interconnect bandwidth cut. DeepSeek knew how to utilize this: hey, even though we're cut back on the interconnect, we can do all this fancy stuff to figure out how to use the GPU fully anyway. And so that was back in October 2022. But later, at the end of 2023, implemented in 2024, the US government banned the H800, right? Yeah.
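The two-factor-versus-one-factor rule change being described can be sketched as a simple predicate. The thresholds below are illustrative placeholders, not the actual regulatory numbers:

```python
# Sketch of the export-rule change: restriction first required exceeding BOTH
# a flops threshold and an interconnect threshold, then only the flops one.
FLOPS_LIMIT = 1000        # hypothetical performance threshold
INTERCONNECT_LIMIT = 400  # hypothetical interconnect-bandwidth threshold

def restricted_2022(flops, interconnect):
    # Original two-factor rule: must exceed both thresholds to be restricted.
    return flops > FLOPS_LIMIT and interconnect > INTERCONNECT_LIMIT

def restricted_2023(flops, interconnect):
    # Revised rule: flops alone decide, closing the H800-style loophole.
    return flops > FLOPS_LIMIT

# H800-style chip: H100-level flops, cut-down interconnect.
h800 = dict(flops=2000, interconnect=300)
print(restricted_2022(**h800))  # False - slipped through the two-factor rule
print(restricted_2023(**h800))  # True  - caught once flops alone decide
```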
And so, by the way, this H800 cluster, these 2,000 GPUs, was not even purchased in 2024, right? It was purchased in late 2023. And they're just getting the model out now, right, because it takes a lot of research, et cetera. H800 was banned, and now there's a new chip called the H20. The H20 is cut back on only flops, but the interconnect bandwidth is the same.
And in fact, in some ways, it's better than the H100 because it has better memory bandwidth and memory capacity. So there are, you know, NVIDIA is working within the constraints of what the government says and then builds the best possible GPU for China.
To some extent, training a model does effectively nothing, right? Yeah. The thing that Dario is speaking to is the deployment of that model once trained, to then create huge economic growth, huge increases in military capabilities, huge increases in productivity of people, betterment of lives, whatever you want to direct super-powerful AI towards, you can.
But that requires significant amounts of compute, right? And so the US government has effectively said... And forever, right? Training will always be a portion of the total compute. We mentioned Meta's 400,000 GPUs; only 16,000 made Llama, right?
So look at the percentage that Meta is dedicating to inference. Now, whether this is for recommendation systems that are trying to hack our minds into spending more time and watching more ads, or for a super-powerful AI that's doing productive things, the exact use our economic system decides on doesn't matter; the point is that compute can be delivered in whatever way we want. Whereas with China, right...
You know, export restrictions. Great. You're never going to be able to cut everything off, right? And I think that's quite well understood by the US government: you can't cut everything off. They'll make their own chips, and they're trying to make their own chips. They'll be worse than ours.
But this is the whole point: to just keep a gap, right? And, you know, in a world of two, three percent economic growth, this is really dumb, by the way, right? To cut off high tech and not make money off of it.
But in a world where super powerful AI comes about and then starts creating significant changes in society, which is what all the AI leaders and big tech companies believe, I think super powerful AI is going to change society massively. And therefore, this compounding effect of the difference in compute is really important. There's some sci-fi out there where like
AI is measured in how much power is delivered to compute, right? That's sort of a way of thinking about what the economic output is: just how much power you're directing towards that AI. Should we talk about reasoning models, with this as a way that this might be actionable, as something that people can actually see? So the reasoning models that are coming out with R1 and o1, they're designed to use more compute. There's a lot of...
So I'm not as... That's what I would say. So define that, right? Because to me, it kind of almost has already happened, right? You look at elections in India and Pakistan, people get AI voice calls and think they're talking to the politician, right?
The AI diffusion rules, which were enacted in the last couple of weeks of the Biden admin, and which it looks like the Trump admin will keep and potentially even strengthen, limit cloud computing and GPU sales to countries that are not even related to China. Portugal and all these normal countries are on the "you need approval from the US" list.
Yeah, Portugal, and all these countries that are allies, right? Singapore, right? They freaking have F-35s and we don't let them buy GPUs. This, to me, is already at that scale, you know.
I mean, there's tons of talk, from the 2016 elections, about Cambridge Analytica and all this stuff, Russian influence. Every country in the world is pushing stuff onto the internet and has narratives they want, right? Every technically competent country, whether it's Russia, China, the US, Israel, et cetera, right?
You know, people are pushing viewpoints onto the internet en masse and language models crash the cost of like very intelligent sounding language.
I think to some extent we have capabilities that hit a certain point where any one person could say, oh, okay, if I can leverage those capabilities for X amount of time, this is AGI, right? Call it '27, '28.
But then the cost of actually operating that capability is so extreme that no one can actually deploy it at scale and en masse to completely revolutionize the economy at the snap of a finger. So I don't think it will be a snap-of-the-fingers moment.
Rather, it'll be a physical constraint: oh, the capabilities are here, but I can't deploy them everywhere, right? And so one simple example, going back to 2023, was when Bing with GPT-4 came out and everyone was freaking out about search, right? Perplexity came out.
If you did the cost of, hey, implementing GPT-3 into every Google search, it was like, oh, okay, this is just physically impossible to implement, right? And as we step forward to the test-time compute thing, right? You ask ChatGPT a question, it costs cents, right? For their most capable chat model, right, to get a query back.
To solve an ARC-AGI problem, though, it costs five to 20 bucks, right? And it's only going up from there. This is a 1,000x to 10,000x factor difference in cost to respond to a query versus do a task. And the ARC-AGI task is simple to some extent, but it's also like: what are the tasks that we want? Okay, the quote-unquote AGI we have today can do ARC-AGI.
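The multiple being described can be checked with rough numbers. The per-query and per-task costs below are assumptions in the ranges quoted above ("cents" per chat query, $5 to $20 per ARC-AGI task):

```python
# Back-of-the-envelope ratio between a chat query and an ARC-AGI-style task.
# Assumed: a query costs $0.002-$0.01 ("cents"); a task costs $5-$20.
query_low, query_high = 0.002, 0.01
task_low, task_high = 5.0, 20.0
print(f"{task_low / query_high:,.0f}x")   # 500x at the cheap end
print(f"{task_high / query_low:,.0f}x")   # 10,000x at the expensive end
```

So the quoted 1,000x-to-10,000x factor holds at the upper end of these assumed ranges.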
Three years from now, it can do much more complicated problems, but the cost is going to be measured in thousands and even hundreds of thousands of dollars of GPU time, and there just won't be enough power, GPUs, and infrastructure to operate this and therefore shift everything in the world at the snap of a finger. But at that moment, who gets to control and point the AGI at a task?
And so this was in Dario's post that he's like, hey, China can effectively and more quickly than us point their AGI at military tasks, right? And they have been in many ways faster at adopting certain new technologies into their military, right? Especially with regards to drones, right? The U.S. maybe has a longstanding, you know, large air sort of, you know, fighter jet type of thing, bombers.
But when it comes to asymmetric arms such as drones, they've completely leapfrogged the US and the West. And the fear that Dario is pointing out there, I think, is: yeah, great, we'll have AGI in the commercial sector, but the US military won't be able to implement it super fast.
The Chinese military could, and they could direct all their resources to implementing it in the military, and therefore solving military logistics, or solving some other aspect like disinformation targeted at a certain set of people so they can flip a country's politics or something like that. That is actually catastrophic.
versus, you know, the US, where it'll be more capitalistically allocated toward whatever has the highest return on investment, which might be building factories better or whatever.
And going back to my viewpoint: if you believe we stay in the sort of stage of economic growth and change that we've been in for the last 20 years, i.e. if you do not believe AI is going to make significant changes to society in the next five or ten years, then the export controls are absolutely guaranteeing that China will win long term, right?
Five-year timelines are sort of what the executives of AI companies and even big tech companies believe, but even ten-year timelines are reasonable. Once you believe these timelines are below that period, though, the only way to create a sizable advantage or disadvantage for America versus China is if you constrain compute. Because...
Talent is not really something that's constraining, right? China arguably has more talent, right? More STEM graduates, more programmers. The US can draw upon the world's people, which it does. There's tons of foreigners in the AI industry.
Yeah. I mean, many of them are Chinese people who are moving to America, right? And that's great. So talent is one aspect, but I don't think that's one that is a measurable advantage for the US or not.
It truly is just compute. And even on the compute side, when we look at chips versus data centers, right, China has an unprecedented ability to build ridiculous amounts of power.
Clockwork, right? They're always building more and more power. They've got steel mills that individually are the size of the entire US industry, right? And they've got aluminum mills that consume gigawatts and gigawatts of power, right? And when we talk about what's the biggest data center, right: OpenAI made this huge thing about Stargate with their announcement, but once it's fully built out in a few years, it'll be two gigawatts of power, right?
And this is still smaller than the largest industrial facilities in China. China, if they wanted to build the largest data center in the world, if they had access to the chips, could. So it's just a question of when, not if.
Chips are a little bit more specialized. I'm specifically referring to the data centers. Chips, fabs take huge amounts of power. Don't get me wrong. That's not necessarily the gating factor there. The gating factor on how fast people can build the largest clusters today in the US is power.
Now, it could be power generation, power transmission, substations, all these sorts of transformers, and building the data center itself. These are all constraints on the US industry's ability to build larger and larger training systems, as well as to deploy more and more inference compute.
If it's five to ten years or above, right, China will win because of these restrictions long term, unless AI does something in the short term, which I believe AI will, you know, make massive changes to society in the medium-short term, right? And so that's the big unlock there. And even today, right, if Xi Jinping decided to get, quote-unquote, scale-pilled, right, i.e.
decide that scaling laws are what matters, right, just like the U.S. executives like Satya Nadella and Mark Zuckerberg and Sundar and all these U.S.
executives of the biggest, most powerful tech companies have decided they're scale-pilled, and they're building multi-gigawatt data centers, right, whether it's in Texas or Louisiana or Wisconsin, wherever it is. They're building these massive things that cost as much as their entire budget for spending on data centers globally, in one spot, right?
This is what they've committed to for next year, year after, et cetera. And so they're so convinced that this is the way, that this is what they're doing. But if China decided to, they could do it faster than us. But this is where the restrictions come in. It is not clear that China as a whole has decided, you know, from the highest levels that this is a priority. The U.S. sort of has, right?
You know, you see Trump talking about DeepSeek and Stargate within the same week, right? And the Biden admin as well had a lot of discussions about AI and such. It's clear that they think about it. Only just last week did DeepSeek meet the second-in-command of China, right? They have not even met the top, right? They haven't met Xi. Xi hasn't sat down with them.
And they only just released a subsidy of a trillion RMB, you know, roughly $160 billion, which is closer to the spending of like Microsoft and Meta and Google combined, right, for this year. So it's like they're realizing it just now, but... But that's where these export restrictions come in and say, hey, you can't ship the most powerful US chips to China. You can ship a cut down version.
You can't ship the most powerful chips to all these countries who we know are just going to rent it to China. You have to limit the numbers, right?
And same with manufacturing equipment, tools, all these different aspects. But it all stems from AI and then what downstream can slow them down in AI. And so the entire semiconductor restrictions, you read them, they are very clear. It's about AI and military civil fusion of technology. Right. It's very clear.
And then from there it goes, oh, well, we're banning them from buying like lithography tools and etch tools and deposition tools. And, oh, this random like, you know, subsystem from a random company that's like tiny. Right. Like, why are we banning this? Because all of it, the U.S. government has decided is critical to AI systems.
Extreme ultraviolet lithography. To set context on the chips, right, what Nathan's referring to is that in 2020, Huawei released their Ascend 910 chip, which was an AI chip, the first one on 7 nanometer, before Google did, before NVIDIA did. And they submitted it to the MLPerf benchmark, which is sort of an industry-standard machine learning performance benchmark. And it did quite well.
And it was the best chip at the submission, right? This was a huge deal. The Trump admin, of course, banned Huawei from getting 7-nanometer chips from TSMC. And so then they had to switch to using internal, domestically produced chips, which was a multi-year setback.
I mean, he may have woken up last week, right? Liang Wenfeng met the second-in-command guy and they had a meeting, and then the next day they announced the AI subsidies, which are a trillion RMB.
And the US government realized this on October 7th, 2022, before ChatGPT released; that restriction on October 7th dropped and shocked everyone. And it was very clearly aimed at AI. Everyone was like, what the heck are you doing? Stable Diffusion was out then, but not ChatGPT. Yeah, but not ChatGPT. So there were starting to be rumblings of what gen AI could do to society.
But it was very clear, I think, to at least like National Security Council and those sort of folks that this was where the world is headed, this Cold War that's happening. Yeah.
This is the big risk, right? The further you push China away from having access to cutting-edge American and global technologies, the more likely they are to say, well, because I can't access it, no one should access it, right? And there's a few interesting aspects of that, right? China has an urban-rural divide like no other.
They have a male-female birth ratio like no other, to the point where, if you look at most of China, the ratio is not that bad, but when you look at single dudes in rural China, it's like a 30-to-1 ratio. And those are disenfranchised dudes, right? Quote-unquote, the US has an incel problem; China does too. It's just they're placated in some way, or crushed down. What
do you do with these people? And at the same time, you're not allowed to access the most important technology. At least the US thinks so. China is maybe starting to think this is the most important technology, by starting to dump subsidies into it, right? They thought EVs and renewables were the most important technology; they dominate that now, right?
Now, they started thinking about semiconductors in the late 2010s and early 2020s, and they've been dumping money in and catching up rapidly. And they're going to do the same with AI, right? Because they're very talented, right? So the question is, when does this hit a breaking point, right?
And if China sees it that way: if not having access, and if starting a true hot war, taking over Taiwan or trying to subvert its democracy in some way or blockading it, hurts the rest of the world far more than it hurts them, this is something they could potentially do, right? And so is this pushing them towards that? Potentially, right?
I'm not quite a geopolitical person, but it's obvious that the world regime of peace and trade is super awesome for economics. But at some point it could break, right?
raw materials from all over the world. The US would just shut down the Strait of Malacca. At the same time, you could argue almost all the GDP growth in America since the 70s has been either population growth or tech, right? Because, you know, your life today is not that much better than someone from the 80s outside of tech, right?
You still, you know, cars, they all have semiconductors in them, everywhere. Fridges, semiconductors everywhere. There's these funny stories about how Russians were taking apart laundry machines because they had certain Texas Instruments chips that they could then repurpose and put into their anti-missile systems, right? Their S-400 or whatever.
You would know more about this, but there's all sorts of like everything about semiconductors is so integral to every part of our lives.
I don't think it's necessarily breaking the reliance. I think it's getting TSMC to build in the US. So taking a step back, TSMC produces most of the world's chips, especially on the foundry side. There's a lot of companies that build their own chips. Samsung, Intel, STMicro, Texas Instruments, Analog Devices, NXP, all these kinds of companies build their own chips.
But more and more of these companies are outsourcing to TSMC and have been for multiple decades.
Sure. So historically, the supply chain was companies would build their own chips. A company would be started, they'd design the chip, build the chip, and sell it. Over time, this became really difficult because the cost of building a fab continues to compound every single generation.
Of course, figuring out the technology for it is incredibly difficult regardless, but just look at the dollars and cents that are required, even setting aside, hey, yes, I have all the technical capability, which is really hard to get, by the way, right? Intel's failing, Samsung's failing, et cetera.
But if you look at just the dollars to spend to build that next generation fab, it keeps growing, right? Moore's law is halving the cost of chips every two years, and there's a separate law, sometimes called Moore's second law or Rock's law, that's doubling the cost of fabs every handful of years.
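The two cost curves just described can be put into numbers. A minimal sketch, assuming an illustrative $1 billion fab in the year 2000 and a four-year doubling period; both figures are assumptions for illustration, not quoted from the conversation:

```python
# Sketch of the "fab cost doubles every handful of years" observation.
# The $1B starting cost and 4-year doubling period are illustrative
# assumptions, not figures from the conversation.

def fab_cost(start_cost_usd: float, start_year: int, target_year: int,
             doubling_years: float = 4.0) -> float:
    """Cost grows 2x every `doubling_years` years."""
    periods = (target_year - start_year) / doubling_years
    return start_cost_usd * (2.0 ** periods)

for year in (2000, 2008, 2016, 2024):
    print(year, round(fab_cost(1e9, 2000, year) / 1e9, 1), "billion USD")
# 2024 comes out at 64 billion, the same order of magnitude as the
# "north of 30, 40 billion dollars" figure quoted below for a leading edge fab.
```

Under these assumptions, the compounding alone explains why only a firm that aggregates everyone's demand can keep funding the next fab.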
And so you look at a leading edge fab that is going to be profitable today, building, you know, three nanometer chips, or two nanometer chips in the future. That's going to cost north of 30, 40 billion dollars, right? And that's just for a token amount; that's the base building block. You probably need to build multiple, right?
And so when you look at the industry, if I go back 20, 30 years ago, there were 20, 30 companies that could build the most advanced chips, and then they would design them themselves and sell them, right? So companies like AMD would build their own chips. Intel, of course, still builds their own chips; they're very famous for it. IBM would build their own chips.
And you could keep going down the list. All these companies built their own chips. Slowly, they kept falling like flies. And that's because of what TSMC did, right? They created the foundry business model, which is: I'm not going to design any chips, I'm just going to contract manufacture chips for other people. And one of their early customers was NVIDIA, right?
NVIDIA is the only semiconductor company doing more than a billion dollars of revenue that was started in the era of the foundry model, right? Every other company started before then and at some point had fabs, which is actually incredible, right? You know, like AMD and Intel and Broadcom. Such a great fact.
It's like everyone had fabs at some point. Or, you know, some companies like Broadcom, it was a merger, an amalgamation of various companies that rolled up. But even today, Broadcom has fabs, right? They build iPhone RF radio chips in Colorado for Apple, right?
All these companies had fabs, and for most of the fabs, they threw them away or sold them off or they got rolled into something else. And now everyone relies on TSMC, right? Including Intel: their latest PC chip uses TSMC chips, right? It also uses some Intel chips, but it uses TSMC's process.
Yeah. So, I mean, like I mentioned, the cost of building a fab is so high, the R&D is so difficult. And when you look at these companies that had their own vertical stack, there was an antiquated process of, okay, I'm so hyper-customized to each specific chip.
But as we've gone through the last 50 years of electronics and semiconductors, A, you need more and more specialization, right? Because Moore's law has died, Dennard scaling has died, i.e., chips are not getting better just for free from manufacturing. You have to make real architectural innovations, right?
Google is not just running on Intel CPUs for web serving. They have a YouTube chip, they have TPUs, they have Pixel chips, they have a wide diversity of chips that generate all the economic value of Google, right? Running all the services and stuff. And this is just Google; you could go across any company in the industry and it's like this, right?
Cars contain 5,000 chips, you know, 200 different varieties of them, right? All these random things. A Tesla door handle has two chips, right? It's ridiculous. And it's a cool door handle, right? You don't think about it, but it has two really cheap penny chips in there, right?
Anyway, so as you have more diversity of chips, as you have more specialization required, and the cost of fabs continues to grow, you need someone who is laser focused on building the best process technology and making it as flexible as possible.
You know, there's more failure points, right? You know, you could have one little process related to like some sort of like chemical etch or some sort of like plasma etch or, you know, some little process that screws up. You didn't engineer it right. And now the whole company falls apart. You can't make chips. Right.
And so even super, super powerful companies like Intel, they had the scale to weather the storm; hey, they still exist today, even though they really screwed up their manufacturing process, right? ...and focusing on specific workloads rather than all of these different things. And so you get more diversity of chips.
You have more companies than ever designing chips, but you have fewer companies than ever manufacturing them, right? And this is where TSMC comes in: they've just been the best, right? They are so good at it, right? They're customer focused. They make it easy for you to fabricate your chips. They take all of that complexity and kind of try to abstract a lot of it away from you.
They make good money. They don't make insane money, but they make good money. And they're able to aggregate all this demand and continue to build the next fab, the next fab, the next fab.
Yeah, so there's aspects of it that I would say yes and aspects that I'd say no, right? TSMC is way ahead because former executive Morris Chang of Texas Instruments wasn't promoted to CEO. And he's like, screw this, I'm going to go make my own chip company, right? And he went to Taiwan and made TSMC, right? And there's a whole lot more story there.
So it could have been Texas Instruments doing the semiconductor manufacturing instead of Taiwan Semiconductor Manufacturing, right? But, you know, there is that whole story there.
Just the brilliance of Morris Chang, you know, which I wouldn't underplay, but there's also a different level of how this works, right? So in Taiwan, the top percent of students, the ones that go to the best school, which is NTU, the top percent of those all go work at TSMC, right? And guess what their pay is?
Their starting pay is like $70,000, $80,000, right? That's like starting pay for a good graduate in the US, right? Not the top; the top graduates are making hundreds of thousands of dollars at the Googles and the Amazons, and now, I guess, the OpenAIs of the world, right? So there is a large dichotomy of: what is the top 1% of the society doing?
And where are they headed because of economic reasons, right? Intel never paid that crazy well, right? And it didn't make sense to them, right? That's one aspect: where's the best talent going? Second is the work ethic, right? We like to work. You work a lot, we work a lot, but at the end of the day, what is the time and amount of work that you're doing, and what does a fab require?
Fabs are not work-from-home jobs. You go into the fab, and it's grueling work. If there is any amount of vibration, say an earthquake happens and vibrates the machines, they're either broken, or you've scrapped some of your production, and in many cases they're not calibrated properly anymore. So at TSMC, when there's an earthquake, right?
Recently, there's been an earthquake. TSMC doesn't call their employees. They just go to the fab and like, they just show up, the parking lot gets slammed and people just go into the fab and fix it, right? Like it's like ants, right? Like it's like, you know, a hive of ants doesn't get told by the queen what to do,
Which is some special chemistry plus nanomanufacturing on one line of tools that continues to get iterated. And yeah, it's like a specific plasma etch for removing silicon dioxide, right? That's all you focus on your whole career, and it's such a specialized thing. So it's not like the tasks are transferable. AI today is awesome because people can pick it up like
that. Semiconductor manufacturing is very antiquated and difficult. None of the materials are online for people to read easily and learn. The papers are very dense and it takes a lot of experience to learn. And so it makes the barrier to entry much higher too. So when you talk about, hey, you have all these people that are super specialized, they will work 80 hours a week in a factory, in a fab,
And if anything goes wrong, they'll show up in the middle of the night because of some earthquake. Their wife is like, there was an earthquake. He's like, great, I'm gonna go to the fab. Would you, as an American, do that? These sorts of things exemplify why TSMC is so amazing. Now, can you replicate it in the U.S.?
Let's not ignore that Intel was the leader in manufacturing for over 20 years. They brought every technology to market first, besides EUV: strained silicon, high-k metal gates, FinFET, and the list goes on and on of technologies that Intel brought to market first, made the most money from, and manufactured at scale first. Best technology, highest profit margins, right?
So it's not that Intel can't do this, right? It's that the culture has broken, right? They've invested in the wrong things. They said no to the iPhone. They had all these different problems regarding mismanagement of the fabs, mismanagement of designs, this lockup, right?
At the same time, all these brilliant people, these 50,000 PhDs or masters that have been working on specific chemical or physical processes or nanomanufacturing processes for decades in Oregon, they're still there. They're still producing amazing work.
It's just getting it to the last mile of production at high yield, where you can manufacture dozens and hundreds of different kinds of chips, and, you know, the customer experience has broken, right? It's that customer experience. Part of it is, people will say Intel was too pompous in the 2000s, 2010s, right?
They just thought they were better than everyone. The tool guys would be like, oh, I don't think this is mature enough, and they'd be like, ah, you just don't know, we know, right? This sort of stuff would happen. And so can the US bring leading edge semiconductor manufacturing to the US? Emphatically, yes. And we are; it's happening.
Like Arizona is getting better and better as time goes on. TSMC has built roughly 20% of their capacity for 5 nanometer in the US, right? Now, this is nowhere near enough, right? 20% of capacity in the US is like nothing, right? And furthermore, this is still dependent on Taiwan existing, right? There's an important way to separate it out: there's R&D and there's high volume manufacturing.
Effectively, there are three places in the world doing leading edge R&D: there's Hsinchu, Taiwan; there's Hillsboro, Oregon; and there's Pyeongtaek, South Korea. These three places are doing the leading edge R&D for the rest of the world's leading edge semiconductors. Now, manufacturing can be distributed more globally, right?
And this is where this dichotomy exists of who's actually modifying the process, who's actually developing the next generation one, who's improving them: it's Hsinchu, it's Hillsboro, it's Pyeongtaek, right? It is not the rest of these fabs like Arizona, right? Arizona is a paperweight. If Hsinchu disappeared off the face of the planet, within a year, a couple years,
Arizona would stop producing too, right? It's actually like pretty critical. One of the things I like to say is if I had like a few missiles, I know exactly where I could cause the most economic damage, right? It's not targeting the White House, right?
It's the R&D centers for TSMC, Intel, Samsung, and then some of the memory guys, Micron and Hynix.
And so with TSMC, you cannot purchase a vehicle without TSMC chips, right? You cannot purchase a fridge without TSMC chips. One of the few things you can purchase, ironically, is a Texas Instruments graphing calculator, right? Because they actually manufacture in Texas. But outside of that, a laptop, a phone, it's depressing. Servers, GPUs, none of this stuff can exist.
And this is all without TSMC. And in many cases, it's not even the leading edge, you know, sexy five nanometer chip, three nanometer chip, two nanometer chip. Oftentimes it's just some stupid power IC that's converting from one voltage to another, right? And it's made at TSMC, right?
So they do R&D on their own; they're just way behind, right? I would say in 2015, China had a five-year plan where they defined certain goals for 2020 and 2025, including 80% domestic production of semiconductors. They're not going to hit that, to be clear. But they are in certain areas really, really close, right?
Like BYD is probably going to be the first company in the world to not have to use TSMC, because they have their own fabs, right, for making chips. Now, they still have to buy some chips from foreign suppliers, for example around self-driving ADAS capabilities, because those are really high end. But, you know, an internal combustion engine has 40 chips
just for controlling flow rates and all these things. And EVs are even more complicated. So all these different power ICs and battery management controllers and all these things, they're insourcing, right? And this is something that China has been doing since 2015. Now, as far as the trailing edge, they're getting so much capacity there. As far as the leading edge, right?
I.e., this five nanometer and so on and so forth, right, where GPUs are, they are still behind. And the US restrictions are trying to stop them in the latter.
But, you know, all that's happened is, yes, they've slowed down their five nanometer, three nanometer, et cetera, but they've accelerated their, hey, 45 nanometer, 90 nanometer power IC or analog IC or, you know, random chip in my keyboard, right? That kind of stuff.
So there is an angle where the US's actions, from the angle of the export controls, have been so inflammatory at slowing down China's progress on the leading edge that they've turned around and accelerated their progress elsewhere, because they know that this is so important, right?
If the US is going to lock them out here, or if they lock us out as well in the trailing edge. And so going back, can the US build it here? Yes, but it's going to take a ton of money. I truly think to revolutionize and completely in-source semiconductors would take a decade and a trillion dollars.
TSMC has some like 90,000 employees, right? It's not actually that insane an amount. The Arizona fab has 3,000 from Taiwan. And these people, their wives were like, yeah, we're not going to have kids unless you sign up for the Arizona fab. We go to Arizona and we have our kids there. There's also a Japan fab where the same thing happened, right?
And so these wives drove these dudes to go to Japan or America to have the kids there. It's an element of culture. Yeah, sure. Taiwan works that hard, but also the US has done it in the past. They could do it now, right? We can just import, I say import, the best people in the world if we want to.
And even if you can't import those people, I still think you could do a lot to manufacture most of them in the US if the money's there, right? It's just way more expensive. It's not profitable for a long time.
And that's the context of the CHIPS Act: it's only like $50 billion, relative to some of the renewable initiatives that were passed in the Inflation Reduction Act and the Infrastructure Act, which total in the hundreds of billions of dollars, right? And so the amount of money that the US is spending on the semiconductor industry is nothing, right?
Whereas all these other countries have structural advantages in terms of work ethic and amount of work and things like that, but also the number of STEM graduates and the percentile of their best going into that industry, right? But they also have differences in terms of, hey, there's just tax benefits in the law, and they've been in the law for 20 years, right?
And then some countries have massive subsidies, right? China has something like $200 billion of semiconductor subsidies a year; we're talking about $50 billion in the CHIPS Act. So the gulf in the subsidy amounts is also huge, right? And so I think Trump has been talking about tariffing Taiwan recently.
That's sort of one of these things that's like, oh, okay, well, maybe he doesn't want to subsidize the semiconductor industry. Obviously, tariffing Taiwan is going to cause a lot of things to get much more expensive, but does it change the equation for TSMC building more fabs in the US? That's what he's sort of positing, right?
To the same extent, they've also limited US companies from entering China. It's been a long time coming. At some point, there was a convergence. But over at least the last decade, it's been branching further and further out. US companies can't enter China. Chinese companies can't enter the US. The US is saying, hey, China, you can't get access to our technologies in certain areas.
And China's retaliating with the same kind of thing: they've done export controls on some specific materials, you know, gallium and things like that, that they've tried to limit the U.S. on. There's a U.S. drone company that's not allowed to buy batteries, and they have military customers.
And this drone company just tells the military customers, hey, just get it from Amazon, because I can't actually physically get them, right? There's all these things that are happening that point to further and further divergence. I have zero idea. I would love it if we could all hold hands and sing Kumbaya, but I have zero idea how that could possibly happen.
It's an objective fact that the world has been the most peaceful it has ever been when there are global hegemons, right? Or regional hegemons, right? In historical context, right? The Mediterranean was the most peaceful ever when the Romans were there, right?
China had very peaceful and warring times, and the peaceful times were when dynasties had a lock over not just themselves, but all the tributaries around them, right? And likewise, the most peaceful time in human history has been when the US was the global hegemon, right? The last, you know, decades. Now, we've sort of seen things start to slide, right?
With Russia-Ukraine, with what's going on in the Middle East and, you know, Taiwan risk, all these different things are starting to bubble up. It's still objectively extremely peaceful. Now, what happens when it's not one global hegemon but two? Obviously, you know, China will be competitive, or could even overtake the US; it's possible, right? And so this change in global hegemony,
I don't think it ever happens super peacefully, right? When empires fall, right, which is a possible trajectory for America, they don't fall gracefully, right? They don't just slide into irrelevance. Usually there's a lot of shaking. And so, you know, what the US is trying to do is maintain its top position, and what China is trying to do is become the top position, right?
And obviously there's butting of heads here in the most simple terms.
And the U.S.'s current task is, hey, if we control AI, if we're the leader in AI, and AI significantly accelerates progress, then we can maintain the global hegemony position. I hope that works. And as an American, I'm like, okay, I guess that's going to lead to peace for us. Now, obviously, other people around the world get affected negatively.
You know, obviously, the Chinese people are not going to be in as advantageous of a position if that happens. But, you know, this is sort of the reality of like what's being done and the actions that are being carried out.
Yeah, so this goes, and I think we need to dive really deep into the reasoning aspect and what's going on there. But on the H20, you know, the US has gone through multiple iterations of the export controls, right? The H800 was at one point allowed, back in '23, but then it got canceled. And by then, you know, DeepSeek had already built their cluster of, they claim, 2,000 of them.
I think they actually have many more, something like 10,000 of those. And now this H20 is the legally allowed chip, right? NVIDIA shipped a million of these last year to China. For context, NVIDIA shipped like four or five million GPUs total. So the percentage of GPUs that were this China-specific H20 is quite high, roughly 20% to 25%.
And so this H20 has been neutered in one way, but it's actually upgraded in other ways. And you could think of chips along three axes for AI, ignoring software stack and exact architecture, just raw specifications. There's floating point operations, flops. There's memory bandwidth and memory capacity, i.e., memory IO. And then there's interconnect, chip-to-chip interconnect.
All three of these are incredibly important for making AI systems, right? Because AI systems involve a lot of compute, and they involve a lot of moving memory around, whether it be to memory or to other chips, right? And so of these three vectors, the US initially had two controlled and one not controlled: flops and interconnect bandwidth were initially controlled.
And then they said, no, no, no, we're going to remove the interconnect bandwidth restriction and just make it very simply about flops. But now NVIDIA can make a chip that, okay, is cut down on flops. It's like one third that of the H100 on spec sheet paper performance for flops; in the real world, it's closer to half, or maybe even 60% of it.
But then on the other two vectors, it's just as good for interconnect bandwidth. And for memory bandwidth and memory capacity, the H20 has more memory bandwidth and more memory capacity than the H100, right? Now, recently, you know, in our research, we cut NVIDIA's production estimate for the H20 for this year down drastically.
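The trade-off being described can be sketched as a toy scoring model. The normalized numbers below encode the conversation's rough ratios (H20 at about a third of the H100's paper flops, comparable interconnect, somewhat better memory bandwidth and capacity); they are illustrative, not spec-sheet figures, and the workload weights are assumptions:

```python
# Toy model of the three axes discussed: flops, memory, interconnect.
# Values are normalized to H100 = 1.0 using the rough ratios from the
# conversation; they are illustrative, not exact spec-sheet numbers.

H100 = {"flops": 1.00, "mem_bw": 1.00, "mem_capacity": 1.00, "interconnect": 1.00}
H20  = {"flops": 0.33, "mem_bw": 1.20, "mem_capacity": 1.20, "interconnect": 1.00}

def workload_score(chip: dict, weights: dict) -> float:
    """Weighted sum over only the axes a given workload stresses."""
    return sum(w * chip[axis] for axis, w in weights.items())

# Pre-training is flops-dominated; reasoning inference leans on memory.
# The weights here are assumptions, chosen to illustrate the regimes.
pretraining = {"flops": 1.0}
reasoning = {"mem_bw": 0.6, "mem_capacity": 0.4}

for name, chip in (("H100", H100), ("H20", H20)):
    print(name,
          round(workload_score(chip, pretraining), 2),
          round(workload_score(chip, reasoning), 2))
```

Under these assumed weights, the H100 dominates the flops-bound pre-training regime while the H20 comes out ahead on the memory-bound reasoning regime, which is the argument being made about why the chip was canceled ahead of expected restrictions.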
They were going to make another 2 million of those this year, but they just canceled all the orders a couple of weeks ago. In our view, that's because we think that they think they're going to get restricted. Because why would they cancel all these orders for H20? Because they shipped a million of them last year.
They had orders in for a couple million this year of the H20 and the B20, a successor to the H20, and now they're all gone. Now, why would they do this? I think it's very clear. The H20 is actually better for certain tasks, and that certain task is reasoning, right? Reasoning is incredibly different from pre-training when you look at the different regimes of models, right?
Pre-training is all about flops, right? It's all about flops. There are things you do, like mixture of experts that we talked about, to trade off other aspects, lower the flops, and rely more on interconnect and memory. But at the end of the day, flops are everything, right? We talk about models in terms of how many flops they are, right?
So, like, you know, we talk about, oh, GPT-4 is 2E25, right? 2 times 10 to the 25th, a 2 with 25 zeros after it, right? Flops, right? Floating point operations. For training. For training, right? And we're talking about the restrictions for the 2E24, right?
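As a sanity check on these orders of magnitude, there's a common rule of thumb (not stated in the conversation itself) that training takes roughly 6 flops per parameter per token. The parameter and token counts below are hypothetical round numbers, used only to show how a run lands near the 2E25 scale:

```python
# Back-of-the-envelope training compute using the common ~6 * N * D rule of
# thumb (6 flops per active parameter per token). The model shape below is
# a hypothetical illustration, not a confirmed figure for any real model.
def train_flops(active_params: float, tokens: float) -> float:
    return 6 * active_params * tokens

# e.g. ~280B active parameters over ~13T tokens lands near the 2E25 scale
flops = train_flops(2.8e11, 1.3e13)
print(f"{flops:.1e}")  # 2.2e+25
```

The same function makes the later thresholds concrete: multiplying either factor by ~5 pushes a run past the 1E26 reporting level discussed next.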
The US has an executive order that Trump recently rescinded, which was, hey, 1E26, once you hit that number of floating point operations, you must notify the government, and you must share your results with us, right? There's a level of model where the US government must be told, right? And that's 1E26.
And so as we move forward, this is incredibly important. Flops is the vector the government has cared about historically, but the other two vectors are arguably just as important, right? And especially when we come to this new paradigm, which the world is only just learning about over the last six months, right? Reasoning.
We're going to get into technical stuff real fast. There's two articles in this one that I could show, maybe graphics that might be interesting for you to pull up.
You want to explain the KV cache before we talk about this? I think it's better to... Okay, yeah.
Because it's incredibly important, because this changes how models work. But I think resetting, right? Why is memory so important? It's because so far we've talked about parameter counts, right? And with mixture of experts, you can change how many active parameters versus total parameters to embed more data but have fewer flops.
But more important, another aspect of this humongous revolution in the last handful of years is the transformer, right? And the attention mechanism. The attention mechanism is how the model understands the relationships between all the words in its context, right? And that is separate from the parameters themselves, right?
And that is something that you must calculate, right? How each token, each word in the context length, is related to each other, right? And I think, Nathan, you should explain the KV cache better.
I can explain that. So today, if you use a model, like you look at an API, OpenAI charges a certain price per million tokens, right? And that price for input and output tokens is different, right? And the reason is that when you're inputting a query into the model, right? Let's say you have a book, right? That book, you must now calculate the entire KV cache for it, right? This key value cache.
And so when you do that, that is a parallel operation. All of the tokens can be processed at one time. And therefore, you can dramatically reduce how much you're spending, right? The flop requirements for generating a token and an input token are identical, right? If I input one token or if I generate one token, it's completely identical. I have to go through the model.
But the difference is that I can do that input, i.e. the pre-fill, i.e. the prompt, simultaneously in a batched manner. And therefore, it is all flops.
Correct. But then output tokens, the reason why it's so expensive is because I can't do it in parallel, right? It's autoregressive. Every time I generate a token, I must not only read the whole entire model into memory and activate it, calculate it to generate the next token, I also have to read the entire KV cache.
and I generate a token, and I append that one token I generated and its K and V to the KV cache, and then I do it again, right? And so this is a non-parallel operation, versus the case of pre-fill or prompt, where you pull the whole model in and you calculate 20,000 tokens at once, right?
i.e. how many tokens are being generated slash prompt, right? So if I put in a book, that's a million tokens, right? But, you know, if I put in, you know, the sky is blue, then that's like six tokens or whatever.
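The prefill-versus-decode behavior described above can be sketched in a few lines. This is an illustrative toy, not a real serving engine: a made-up projection stands in for the learned K/V/Q projections, but the structure is the real one — the prompt is processed in one parallel pass, while decoding reuses a growing cache, adding one K/V row per generated token.

```python
import math
import random

# Toy single-head attention with a KV cache (illustrative sketch only).
d = 4
random.seed(0)

def rand_vec():
    return [random.gauss(0, 1) for _ in range(d)]

def project(x):                      # stand-in for the learned K/V/Q projections
    return [xi * 0.5 for xi in x]

def attend(q, K, V):
    # softmax(q . K / sqrt(d)) weighted sum over cached values
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi / z * v[j] for wi, v in zip(w, V)) for j in range(d)]

# Prefill: all 5 prompt tokens processed in one parallel pass (compute-bound).
prompt = [rand_vec() for _ in range(5)]
K_cache = [project(x) for x in prompt]
V_cache = [project(x) for x in prompt]

# Decode: autoregressive, one token at a time; the cache grows by one row
# per generated token and must stay resident in memory the whole time.
x = rand_vec()
for _ in range(3):
    out = attend(project(x), K_cache, V_cache)
    K_cache.append(project(x))
    V_cache.append(project(x))
    x = out                          # feed output back in (toy recurrence)

print(len(K_cache))  # 8 rows: 5 prompt tokens + 3 generated tokens
```

The asymmetry is visible in the structure: prefill touches the projections once for the whole prompt, while each decode step re-reads the entire cache, which is why output tokens are priced higher than input tokens.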
It's mostly output tokens. So before, you know, three months ago, when o1 launched, all of the use cases for long context length were like, let me put a ton of documents in and then get an answer out, right? And it's a single pre-fill, compute a lot in parallel, and then output a little bit. Now, with reasoning and agents, this is a very different idea, right?
Now, instead, I might only have like, hey, do this task, or I might have all these documents. But at the end of the day, the model is not just like producing a little bit, right? It's producing tons. Tons of information, this chain of thought just continues to go and go and go and go.
And so the sequence length is effectively that: if it's generated 10,000 tokens, it's a 10,000 sequence length, right? Plus whatever you inputted in the prompt. And so what this chart is showing, and it's a logarithmic chart, right, is that as you go from 1K to 4K or 4K to 16K, the memory requirements grow so fast...
64 different users at once, right? Yeah. And therefore your serving costs are lower, right? Because the server costs the same, right? This is eight H100s, roughly $2 an hour per GPU. That's $16 an hour, right? That is like somewhat of a fixed cost. You can do things to make it lower, of course, but like it's like $16 an hour. Now, how many users can you serve? How many tokens can you generate?
And then you divide the two and that's your cost, right? And so with reasoning models, this is where a lot of the complexity comes about and why memory is so important. Because if you have limited amounts of memory, then you can't serve so many users. If you have limited amounts of memory, your serving speeds get lower, right? And so your costs get a lot, lot worse, right?
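The divide-the-two arithmetic is simple enough to write down. The server cost matches the $16/hour figure from the discussion; the throughput numbers are purely illustrative assumptions:

```python
# Serving cost sketch: a fixed server cost divided by token throughput.
GPUS = 8
PRICE_PER_GPU_HOUR = 2.0                           # ~$2/hr per H100, as discussed
server_cost_per_hour = GPUS * PRICE_PER_GPU_HOUR   # the ~$16/hr fixed cost

def cost_per_million_tokens(concurrent_users, tokens_per_sec_per_user):
    tokens_per_hour = concurrent_users * tokens_per_sec_per_user * 3600
    return server_cost_per_hour / tokens_per_hour * 1_000_000

# Memory pressure from long reasoning KV caches cuts concurrent users,
# which directly raises the cost per token (throughput rates are made up).
print(round(cost_per_million_tokens(64, 30), 2))  # 2.31
print(round(cost_per_million_tokens(8, 30), 2))   # 18.52
```

Same hardware, same per-user speed: serving one-eighth as many users makes each token eight times more expensive, which is the mechanism behind the reasoning-model cost blowup.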
Because all of a sudden, if I was used to, hey, on the $16-an-hour server I'm serving Llama 405B, or I'm serving DeepSeek V3, and it's all chat-style applications, i.e. we're just chatting, the sequence lengths are a thousand, a few thousand, right? When you use a language model, it's a few thousand context length.
Most of the time, sometimes you're dropping a big document, but then you process it, you get your answer, you throw it away, right? You, you move on to the next thing, right? Whereas with reasoning, I'm now generating tens of thousands of tokens in sequence, right? And so this memory, this KV cache has to stay resident and you have to keep loading it. You have to keep it in memory constantly.
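To put rough numbers on how that resident KV cache grows with context length: its size is the product of the K and V tensors, layer count, KV-head count, head dimension, sequence length, and bytes per element. The model shape below is hypothetical, chosen only to show the scaling:

```python
# Per-user KV cache footprint: 2 tensors (K and V) * layers * KV heads *
# head dim * sequence length * bytes per element. The model shape here is
# a hypothetical illustration, not any particular model's real config.
def kv_cache_bytes(seq_len, layers=80, kv_heads=8, head_dim=128, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

for seq_len in (1_024, 4_096, 16_384, 65_536):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"{seq_len:>6} tokens -> {gib:5.2f} GiB per user")
```

At a few thousand tokens per chat user this is a rounding error next to the weights; at reasoning-length contexts it becomes tens of GiB per user, which is exactly the crowding-out effect being described.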
And now this crowds out other users, right? If there's now a reasoning task, and the model is capable of reasoning, then all of a sudden that memory pressure means that I can't serve as many users simultaneously.
To give context, right? One part of everyone freaking out was DeepSeek reaching these capabilities. The other aspect is they did it so cheap, right? And the so cheap, we kind of talked about on the training side, why it was so cheap.
So I think there's a couple factors here, right? One is that they do have model architecture innovations, right? This MLA, this new attention that they've done, is different from the original transformer attention in "Attention Is All You Need", right? Now, others have already innovated.
There's a lot of work like MQA, GQA, local-global, all these different innovations that try to bend the curve, right? It's still quadratic, but the constant is now smaller, right?
It's an 80% to 90% reduction versus the original, but even versus what people are actually doing, it's still an innovation.
Well, and not just that, right? Other people have implemented techniques like local-global, sliding window, and GQA/MQA. But anyways, DeepSeek has their attention mechanism as a true architectural innovation. They did tons of experimentation, and this dramatically reduces the memory pressure. It's still there, right? It's still attention. It's still quadratic.
It's just dramatically reduced it relative to prior forms.
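The reduction relative to prior attention variants can be made concrete with per-token cache sizes. The MHA and GQA head counts below are illustrative, while the MLA figure follows the DeepSeek-V2 paper's design of caching one compressed latent (dimension 512) plus a small decoupled RoPE key (dimension 64) instead of full keys and values:

```python
# Per-token, per-layer KV cache elements under different attention variants.
# MHA/GQA head counts are illustrative assumptions; the MLA figure follows
# the DeepSeek-V2 paper's compressed latent (512) + decoupled RoPE key (64).
def kv_elems_per_token(n_kv_heads, head_dim):
    return 2 * n_kv_heads * head_dim          # one K and one V per head

mha = kv_elems_per_token(128, 128)            # full multi-head attention
gqa = kv_elems_per_token(8, 128)              # grouped-query: 8 shared KV heads
mla = 512 + 64                                # MLA compressed latent + RoPE key

print(mha, gqa, mla)                          # 32768 2048 576
print(f"MLA cache is {mla / mha:.1%} of full attention's")  # 1.8%
```

That roughly-fiftyfold shrink versus full multi-head attention is what "dramatically reduces the memory pressure" cashes out to: far more concurrent users fit in the same HBM.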
So I think this is very important, right? There's that drastic gap between OpenAI's pricing and DeepSeek's. But DeepSeek is offering the same model, because they open-weighted it to everyone else, for a much lower price than what others are able to serve it for.
And so part of it is OpenAI has a fantastic margin, right? When they're doing inference, their gross margins are north of 75%, right? So that's a four to five X factor of the cost difference right there: OpenAI is just making crazy amounts of money because they're the only one with the capability.
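The four-to-five-X figure is just the margin arithmetic: at gross margin m, price = cost / (1 - m), so the price-to-serving-cost multiple is 1 / (1 - m).

```python
# The "four to five X" factor: at gross margin m, price = cost / (1 - m),
# so the multiple of price over serving cost is 1 / (1 - m).
def price_multiple(gross_margin):
    return 1 / (1 - gross_margin)

print(round(price_multiple(0.75), 2))  # 4.0 -> 75% margin means price is 4x cost
print(round(price_multiple(0.80), 2))  # 5.0 -> 80% margin means price is 5x cost
```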
They're losing money, obviously, as a company because they spend so much on training, right? So the inference itself is a very high margin, but it doesn't recoup the cost of everything else they're doing. So yes, they need that money because the revenue and margins pay for continuing to build the next thing, right? Alongside raising more money.
Well, so here's one thing, right? We'll get to this in a second, but like DeepSeek doesn't have any capacity to actually serve the model. They stopped signups. The ability to use it is like non-existent now, right? For most people, because so many people are trying to use it, they just don't have the GPUs to serve it.
OpenAI has hundreds of thousands of GPUs between them and Microsoft to serve their models. DeepSeek has a factor much lower. Even if you believe our research, which is 50,000 GPUs, and a portion of those are for research, a portion of those are for the hedge fund, they still have nowhere close to the GPU volumes and capacity to serve the model at scale. So it is cheaper.
A part of that is OpenAI making a ton of money. Is DeepSeek making money on their API? Unknown. I don't actually think so. And part of that is this chart, right? Look at all the other providers. Together AI and Fireworks AI are very high-end companies, right? Fireworks is ex-Meta, and Together AI has Tri Dao, the inventor of Flash Attention, right? Which is a huge efficiency technique, right?
They're very efficient, good companies. And I do know those companies make money, right? Not tons of money on inference, but they make money. And so they're serving at like a five to seven X difference in cost, right? And so now when you equate, okay, OpenAI is making tons of money, that's like a five X difference.
And versus the companies that are trying to make money serving this model, it's like a five X difference. There is still a gap, right? There's still a gap. And that is just DeepSeek being really freaking good, right? The model architecture, MLA, the way they did the MoE, all these things, there is a legitimate efficiency difference.
I actually don't think they are. I think when you look at the Chinese labs, Huawei has a lab, there's Moonshot AI, and there are a couple other labs out there that are really close with the government. And then there are labs like Alibaba and DeepSeek, which are not close with the government.
And we talked about the CEO, this revered figure who's quite different, who, based on the translated Chinese interviews, has very different viewpoints from what the CCP might necessarily want. Now, to be clear, does he have a loss leader because he can fund it through his hedge fund? Yeah, sure. So the hedge fund might be subsidizing it. Yes. I mean, they absolutely did, right?
Because DeepSeek has not raised much money. They're now trying to raise a round in China, but they have not raised money historically. It's all just been funded by the hedge fund. And he owns over half the company; like 50, 60% of the company is owned by him.
They were so far behind and they got so much talent because they just open sourced stuff.
They released V3 on December 26th. Who releases the day after Christmas? No one looks, right? They released the papers before this, right? The V3 paper and the R1 paper. So people had been looking at them and being like, wow. And then they just released the R1 model. I think they're just shipping as fast as they can, and who cares about Christmas?
Who cares about... Get it out before Chinese New Year, right? Obviously, which just happened. I don't think they actually were timing the market or trying to make the biggest splash possible. I think they're just shipping. I don't know.
Dario explicitly said Claude 3.5 Sonnet was trained nine to ten months ago. And I think it took them another handful of months to release it, right? So there is a significant gap here, right? And especially with reasoning models, the word on the San Francisco street is that Anthropic has a better model than o3, right?
And they won't release it. Why? Because chains of thought are scary. Right. And they are legitimately scary. Right. If you look at R1, it flips back and forth between Chinese and English. Sometimes it's gibberish. And then the right answer comes out. Right. And like for you and I, it's like great.
doing this, it's amazing. I mean, you talked about that chain of thought for that philosophical thing, which is not something they trained it to be philosophically good at; it's just an artifact of the chain of thought training it did. But that's super important in that, like, can I inspect your mind and what you're thinking right now? No.
And so I don't know if you're lying to my face. And chain of thought models are that way, right? Like this is a true quote unquote risk between, you know, a chat application where, hey, I asked the model to say, you know, bad words or whatever, or how to make anthrax. And it tells me that's unsafe. Sure. But that's something I can get out relatively easily.
What if I tell the AI to do a task and then it does the task all of a sudden, randomly, in a way that I don't want it to, right? And now the stakes are much higher; a task is very different from a response, right? So the bar for safety is much higher. At least this is Anthropic's case, right? For DeepSeek, they're like, ship, right? Yeah.
And they killed that dog, right? And all these things, right? So it's like.
And there's an interesting aspect of: just because it's open-weights or open-source doesn't mean it can't be subverted, right? There have been many open-source software bugs. For example, there was a Linux bug that was found after 10 years, which was clearly a backdoor, because somebody was like, why is this taking half a second to load? This is the recent one.
Why is this taking half a second to load? And it was like, oh crap, there's a backdoor here. That's why. And it's like, this is very much possible with AI models. Today, the alignment of these models is very clear. I'm not going to say bad words. I'm not going to teach you how to make Anthrax. I'm not going to talk about Tiananmen Square.
I'm going to say things like, Taiwan is just an eastern province, right? All these things depend on who you are and what you align the model to, right? And even xAI is aligned a certain way, right? They might be, it's not aligned in the, like, woke sense.
It's not aligned in a pro-China sense, but there are certain things that are imbued within the model. Now, when you release this publicly as an instruct model that's open-weights, this can then proliferate, right? But as these systems get more and more capable, what you can embed deep down in the model is not as clear, right?
And so that is one of the big fears: if an American model or a Chinese model is the top model, right, you're going to embed things that are unclear. And it can be unintentional too, right? Like, British English is dead because American LLMs won, right? The internet is American, and therefore color is spelled the way Americans spell it, right?
This is just the factual nature of the LLMs now.
It is. Take something silly, right? Something as silly as the spelling, which Brits and Americans will probably laugh about, right? I don't think we care that much; some people will. But this can boil down into very, very important topics, like, hey, subverting people, right?
You know, chatbots, right? Character AI has shown that they can talk to kids or adults, and it makes people feel a certain way, right? And that's unintentional alignment. But what happens when there's intentional alignment deep down in the open-source standard? It's a backdoor today for, like, Linux, right?
right, that we discover, or some encryption system, right? China uses different encryption than NIST defines, the US NIST, because there's clearly, at least they think there's backdoors in it, right? What happens when the models are backdoors not just to computer systems, but to our minds?
Because once it's open weights, it doesn't like phone home. It's more about, like, if it recognizes a certain system, it could... Now, it could be a backdoor in the sense of, like, hey, if you're building a software, you know, something in software, all of a sudden it's a software agent. Oh, program this backdoor that only we know about.
Or it could be, like, subvert the mind to think that, like, XYZ opinion is the correct one.
There's this very good quote from Sam Altman, who, you know, can be a hype beast sometimes, but one of the things he said, and I think I agree, is that superhuman persuasion will happen before superhuman intelligence. Yeah. And if that's the case, then before we get this AGI/ASI stuff, these things can embed superhuman persuasion towards our ideal, or whatever the ideal of the model maker is.
And again, today, I truly don't believe DeepSeek has done this. But it is a sign of what could happen.
Yeah, recommendation systems hack the dopamine-induced reward circuit, but the brain is a lot more complicated. And what other sorts of circuits, quote-unquote feedback loops, in your brain can you hack or subvert, beyond what recommendation systems are purely trying to do, you know, increase time spent, ads, et cetera?
But there's so many more goals that can be achieved through these complicated models.
I mean, is that not what character AI has done?
But like it can be like, yeah, this is a risk, right? Like.
One of the hilarious things about technology over its history is that the illicit adult entertainment industry has always adopted technologies first, right? Whether it was video streaming, all the way to where there are now independent adult content creators who have their subscription pages.
And there they actually heavily utilize generative AI; diffusion models and all that are already huge there. But now these subscription-based individual creators do use bots to approximate themselves and chat with their subscribers, and people pay a lot for it, right?
A lot of times it's them, but a lot of the time there are agencies that do this for these creators, and do it at a mass scale. So the largest creators are able to talk to hundreds or thousands of people at a time because of these bots. So it's already being used there. Obviously, video streaming and other technologies have gone there first.
It's going to come to the rest of society, too.
This is where the whole like hacking models comes from, right? Like GPT will not tell you how to make anthrax, but if you try really, really hard, you can eventually get it to tell you about anthrax because they didn't filter it from the pre-training data set, right?
I mean, people have been memeing on, like, games and other stuff, how to get it to say things, or not say Tiananmen Square. So there are always different ways to do it. And, hey, the internet as a whole does tend to have a slight left bias, right? Because it's always been richer, more affluent, right?
younger people on the internet relative to the rest of the population, so there is already inherently a slight left bias on the internet, right? And so how do you filter things that are this complicated? Some of these can be factual versus non-factual; Tiananmen Square is obviously an example of a factual one, but it gets a lot harder when you're talking about aligning to an ideal, right?
And so with Grok, for example, Elon's tried really hard to make the model not be super PC and woke, but the best way to do pre-training is to throw the whole freaking internet at it and then figure it out later. But at the end of the day, the model at its core still has some of these ideals.
You still ingested Reddit's r/politics, which is probably the largest political discussion board in the world that's freely available to scrape. And guess what? That's left-leaning, right? And so there are some aspects like that you just can't censor unless you try really, really, really, really hard.
And I mean, you also have the ingested data of, like, Twitter, or Reddit's r/The_Donald, which is super pro-Trump, right? And then you have fascist subreddits, or you have communist subreddits. So the model in pre-training ingests everything. It has no worldview.
Now, it does have some skew, because more of the text is skewed a certain way, which is generally slight-left but also somewhat intellectual; the general internet is just a certain way. And then, as Nathan's about to describe eloquently, you can elicit certain things out.