Steeve Morin

Appearances

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch

20VC: Why Google Will Win the AI Arms Race & OpenAI Will Not | NVIDIA vs AMD: Who Wins and Why | The Future of Inference vs Training | The Economics of Compute & Why To Win You Must Have Product, Data & Compute with Steeve Morin @ ZML

[00:00]

The thing with NVIDIA is that they spend a lot of energy making you care about stuff you shouldn't care about. And they were very successful. Like, who gives a shit about CUDA? OpenAI is amazing, but it's not their compute. Ultimately, if you don't own your compute, you're starting with something at your ankle. In five years, I would say 95% inference, 5% training.

[16:50]

You get ripped off. Here's the dirty secret: TSMC sells to NVIDIA at roughly a 60% margin, and NVIDIA sells to you at a 90% margin. And on top of that, there's Amazon, which takes, let's say, a 30% margin. So you are a very thin crust on a very big cake. It's a bit of a losing game if you go all in on one provider; you want optionality.
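The margin stacking he describes can be sanity-checked with quick arithmetic. A minimal sketch, under the simplifying assumption that each layer's only cost is the price charged by the layer below it:

```python
# Simplified stacked-margin model (assumption: each layer's only cost
# is the price charged by the layer below it).
def cost_share(margins):
    """Fraction of the end customer's dollar remaining below each layer."""
    share = 1.0
    out = []
    for name, margin in margins:
        share *= 1.0 - margin      # margin m  =>  cost = (1 - m) * price
        out.append((name, share))
    return out

layers = [("cloud, 30% margin", 0.30),
          ("NVIDIA, 90% margin", 0.90),
          ("TSMC, 60% margin", 0.60)]

for name, share in cost_share(layers):
    print(f"below {name}: {share:.3f} of each customer dollar")
```

Under those assumptions, under 3 cents of each customer dollar reaches the silicon itself: the thin crust on a very big cake.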

[17:21]

Absolutely, yes. Here's the problem, though. Let's say you are on Google Cloud and you're on TPUs. Suddenly, you just removed that 90% chunk from the spend. The problem, for multiple software reasons, which we are solving at ZML, is that they're not really, I would say, a commercial success. They are very much successful inside of Google, but not so much outside of Google.

[17:45]

Amazon, same, is pushing very, very hard for their Trainium chips. So the future I see is that you use whatever your provider has, because you don't want to pay an outrageous 90% margin and then try to make a profit on top of that. Okay.

[18:16]

So these two obey fundamentally different, I would say, tectonic forces. So in training, more is better. You want more of everything, essentially. And the recipe for success is the speed of iteration. You change stuff, you see how it works, and you do it again. Hopefully it converges. And it's like changing the wheel of a moving car, so to speak. So that is training.

[18:40]

On inference, it's the complete reverse. Less is better. You want fewer headaches. You don't want to be waking up at night, because inference is production. You could say that training is research and inference is production. And it's fundamentally different. In terms of infra, probably the number one difference between the two is the need for interconnect.

[19:01]

So if you do production, if you can avoid having interconnect between, let's say, a cluster of GPUs, of course you will avoid it if you can. And this is why models have the sizes they have: so that people can run them without needing to connect multiple machines together. It's very constraining in terms of the environment.
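The sizing constraint he alludes to can be made concrete with back-of-envelope numbers. A sketch under assumed figures (fp16 weights at 2 bytes per parameter, 8 GPUs per machine, 80 GB of HBM each):

```python
# Back-of-envelope check (assumed figures): can a model's weights be
# served on one machine, i.e. without cross-machine interconnect?
GPUS_PER_NODE = 8
HBM_PER_GPU = 80e9   # bytes, e.g. an 80 GB part

def fits_on_one_node(params_billion, bytes_per_param=2):
    weights = params_billion * 1e9 * bytes_per_param
    return weights <= GPUS_PER_NODE * HBM_PER_GPU

for size in (8, 70, 405):
    print(f"{size}B fits on one node: {fits_on_one_node(size)}")
```

On those assumptions a 70B model fits comfortably in one box while a 405B model does not, which is one way to read "models have the sizes they have".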

[19:26]

So that is probably the fundamental difference, the need for interconnect. And number two is, ultimately, do you really care about what your model is running on as long as it's outputting whatever you want it to output?

[19:49]

Think of it like doing one painting versus doing a million paintings. The tools you use and the process you follow are different. If you do one painting, what you favor is the speed at which you can do a stroke and iterate. If you do a million, what you want is a reliable process that can deliver a million paintings efficiently. So it's the same for training versus inference.

[20:14]

If you run millions of instances of a model, you cannot hack your way to that. By the way, people do hack their way today, but this is probably the fundamental difference.

[20:37]

There's a lot of duct tape. Here's also probably one of the problems: training, from first principles, is actually two passes, forward and backward, right? It's called the forward pass and the backward pass. Inference runs only the forward pass. So that's how things are today. There are people trying to specialize a bit, because at some point duct tape doesn't really work out.
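The forward/backward distinction can be sketched in a few lines. This is a deliberately toy one-weight model, not any real framework:

```python
# Toy illustration of the two-pass vs one-pass distinction, using a
# one-weight linear "model" y = w * x (no framework, gradient by hand).
def forward(w, x):
    return w * x

def training_step(w, x, target, lr=0.1):
    y = forward(w, x)              # forward pass
    loss = (y - target) ** 2
    grad = 2 * (y - target) * x    # backward pass: d(loss)/dw
    return w - lr * grad, loss     # gradient update

def inference(w, x):
    return forward(w, x)           # forward pass only, weights frozen

w = 0.0
for _ in range(50):
    w, loss = training_step(w, x=1.0, target=3.0)
print(round(inference(w, 1.0), 3))   # converges toward the target 3.0
```

Training touches every weight twice per step (read for forward, update after backward); inference only streams the weights forward, which is part of why the two workloads want different hardware.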

[21:02]

And at big scale, that creates a problem. And it's a problem that's growing, because a lot of people are coming onto the market with inference needs. That wasn't the case, you know, a year or a year and a half ago. OpenAI had this problem, right? Maybe Anthropic had this problem. But it wasn't a universal problem yet. And now it's becoming a universal problem.

[21:26]

So, for instance, depending on how you deploy, but if you deploy inference, probably the number one thing that will get you is what's called autoscaling. As your systems get more and more loaded, you want to provision capacity as you scale, because these things are tremendously expensive. So you don't want to say: I have 1,000 GPUs, 24 hours a day.

[21:52]

Even if there's nobody in production, I will pay for them, which is, mind you, what people are doing today. This is crazy. So what you want to do is provision compute as your needs grow, and you want to scale it up, and you want to scale it down. It's probably the number one thing that gives you a lot of efficiency in terms of spend.
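A hypothetical sketch of the reactive autoscaling he describes; the capacity figure and function names are illustrative, not any real provider's API:

```python
import math

# Hypothetical reactive autoscaler: size the GPU fleet to the current
# request load instead of paying for a fixed fleet 24 hours a day.
def desired_replicas(load_rps, capacity_per_gpu_rps,
                     min_replicas=1, max_replicas=1000):
    need = math.ceil(load_rps / capacity_per_gpu_rps)
    return max(min_replicas, min(max_replicas, need))

# Load over a day (requests/sec): quiet night, busy peak, back down.
for load in (5, 80, 400, 1200, 300, 20):
    print(load, "rps ->", desired_replicas(load, capacity_per_gpu_rps=10), "GPUs")
```

The spend saving comes entirely from the gap between peak and trough: a fixed fleet must be sized for the 1,200 rps peak even while serving 5 rps at night.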

[22:14]

We're talking multiples, like 5x, sometimes 10x improvement. The thing is, in regular back-end engineering, I would say, this is a problem everybody knows. Everybody is doing it because the savings are so huge. But in AI, nobody really had the problem. So now they're coming up against it. So this is one example.

[22:47]

That's one example, yeah. Another one is choosing the right compute. It's kind of a vicious circle, I would say, because provisioning compute is very hard, so losing compute is very bad.

[23:00]

You are essentially incentivized to overbuy; in the case of Amazon or Google, that would be buying reserved compute, which you're not going to use, because if you buy it on demand, you will get tremendously ripped off. So that creates this fake scarcity of compute that people buy preemptively, because they raised a shit-ton of money, and then they're not using it. So this is a major problem too.

[23:30]

It might well be, yes. We are being spared a bit because Blackwell is late and other orders are getting canceled. And so the H series, I would say, is still active. But yes, absolutely. But you know, what choice do you have? This is the thing.

[24:05]

I might tell you that I think it has already started. I'm getting cold emails offering discounts from services I'd never heard of. And I started getting these emails probably around October, November. Some people are left with a lot of capex that they don't know what to do with.

[24:23]

It's a different thing to build a cluster and do a training run than it is to build literally a cloud provider, or hyperscaler, or whatever you want to call it. There are a lot of people who do their training runs on those providers but then move to regular hyperscalers when they go to production. So I'm very much worried there will be an oversupply of these chips.

[24:48]

The problem is that, remember, the chips are the collateral. So somewhere in the US or wherever, there's going to be a data center with, like, a thousand GPUs that people may buy at 30 cents on the dollar. This is what might happen.

[25:32]

Technically speaking, he is right. But realistically speaking, I'm not sure I agree. The thing is, these chips are on the market. They're here; I can open a tab in Chrome and get one. That is something that I don't take lightly. Availability, that is, right?

[25:49]

I think NVIDIA is here to stay, if only through the H100 bubble bust, because these chips are going to be on the market and people will buy them and do inference with them. It remains to be seen what the OPEX and the electricity look like, but... The thing is, the only chips that are really frontier in that sense are probably TPUs and then the upcoming chips.

[26:15]

But the thing is, they're great chips, but they're not on the market, or only at outrageous prices, like millions of dollars to run a model. So which chips are great, and why aren't they on the market? Take, for instance, Cerebras: incredible technology, incredibly expensive. So how will the market value the premium of having single-stream, very high tokens per second?

[26:37]

There is value in that, right? As we saw with Mistral and Perplexity. But I think it was done at a loss; I don't have the details, but I think what Cerebras put out was done at a loss. So today there are three actors on the market that can deliver this. And I think this will be, I would say, the pushing force for change in the inference landscape: agents and reasoning.

[27:14]

So there's this trick. Because here's the thing, there's no magic. This little trick is called SRAM. SRAM is memory directly on the chip, so it is very, very fast memory. But here's the problem with SRAM: it consumes surface area on the chip, which makes for a bigger chip, which is very hard in terms of yield, right? Because the chances of defects are higher, and so on.

[27:40]

So SRAM is, I would say, very, very, very fast memory, which gives you a big advantage when you do very high-throughput inference, but it's terribly expensive. And if you look at, for instance, Groq, on this generation they have 230 megabytes of SRAM per chip. A 70B model is 140 gigabytes. So you do the math, right?
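Doing the math he invites, with the figures quoted here and the Cerebras figure he gives next (70B parameters at 2 bytes each in fp16):

```python
import math

# "You do the math": fitting 70B fp16 weights entirely in SRAM,
# using the per-chip figures quoted in the conversation.
model_bytes = 70e9 * 2        # 140 GB of weights
groq_sram = 230e6             # ~230 MB SRAM per Groq chip
cerebras_sram = 44e9          # ~44 GB SRAM per Cerebras wafer-scale engine

chips_on_groq = math.ceil(model_bytes / groq_sram)
wafers_on_cerebras = math.ceil(model_bytes / cerebras_sram)
print(chips_on_groq, wafers_on_cerebras)   # -> 609 4
```

Hundreds of chips, or several wafer-scale engines, just to hold one model's weights: that is the cost structure behind "terribly expensive".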

[28:05]

Cerebras has 44 gigabytes of SRAM in what they call their wafer-scale engine, which is a chip the size of a wafer. I mean, most likely it's interconnected internally, but it's huge, right? And it has to be water-cooled. They have copper needles, I would say, that touch the chip. It's crazy stuff. Very, very impressive technology, mind you, but very, very expensive. So my bet is...

[28:30]

I think there will be chips on the market that do that at a much lower price. And there are two companies I see going in that direction: one is called Etched, and the other one is called VSORA. Those are the two I see. Because if you can deliver this at, I would say, a price comparable to GPUs, you've won.

[28:55]

It's hard to say. I mean, you need some SRAM, but if you can use a smaller process node and hook yourself up to external memory, then yes, you can do a lot better. But the thing is, if you go full-blown SRAM, then there's no magic. You will have to pay the price.

[29:25]

pushed by reasoning. So reasoning, not in the sense that you see in DeepSeek and the like, but what's called latent-space reasoning. Latent-space reasoning and agents will push the market towards different types of compute.

[29:43]

So the way models reason today is that they reason in tokens. It's as if, to think to yourself, you had to say out loud what you're thinking. So yes, it works, but it's a bit inefficient, right? And you lose information doing it. Latent-space reasoning is the same thing without going out, I would say, to English or whatever the language is, right?
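A toy, purely illustrative contrast between the two reasoning styles he describes; the "model" here is a stand-in arithmetic function, not a real transformer, and the decode step is a crude quantizer:

```python
# Toy contrast (illustrative only): "token-space" reasoning collapses the
# hidden state to a discrete token every step and re-embeds it (lossy),
# while "latent-space" reasoning keeps iterating on the continuous state.
def step(h):
    return [x * 0.9 + 0.1 for x in h]   # stand-in for one model pass

def decode(h):
    return round(sum(h) / len(h), 1)    # quantize = information loss

def embed(tok, dim=4):
    return [tok] * dim

def token_space_reasoning(h, steps):
    for _ in range(steps):
        h = embed(decode(step(h)))      # say each thought "out loud"
    return h

def latent_space_reasoning(h, steps):
    for _ in range(steps):
        h = step(h)                     # stay in the continuous state
    return h
```

The latent loop preserves the full state between steps; the token loop throws away precision (and the per-dimension structure) at every step, which is the inefficiency he is pointing at.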

[30:06]

So staying in what's called the latent space, which is where all the information of an LLM lives, right? This is very much how we work as humans. And we move toward what Yann LeCun calls an energy-based model, in which we have different, longer or shorter, I would say, thinking times, if you will, right?

[30:29]

So that, fundamentally, GPUs cannot deliver, plain and simple, at scale. Why can't GPUs deliver it? Because the access to external memory prevents it. So HBM is all the rage, right? But HBM compared to SRAM is absolutely dead slow. So this is the problem you get. HBM is the best we can do, but it's still slow versus SRAM.
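A rough roofline-style estimate of why memory speed caps single-stream generation; the bandwidth figures are assumed ballpark numbers, not vendor specs:

```python
# Roofline-style ceiling (assumed, ballpark figures): generating one token
# requires streaming roughly all the weights past the compute units, so
# tokens/sec is bounded by memory bandwidth / model size.
WEIGHT_BYTES = 70e9 * 2                  # 70B params, fp16

def max_tokens_per_sec(bandwidth_bytes_per_sec):
    return bandwidth_bytes_per_sec / WEIGHT_BYTES

print(round(max_tokens_per_sec(3.35e12), 1))  # ~3.35 TB/s HBM-class part
print(round(max_tokens_per_sec(80e12), 1))    # tens of TB/s SRAM-class aggregate
```

On those assumed numbers a single HBM-class chip tops out around a couple dozen tokens per second per stream, while SRAM-class bandwidth is an order of magnitude or two higher: the gap the whole SRAM discussion turns on.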

[31:11]

No, you want HBM, to be clear. SRAM alone will not deliver; it's a dead end in terms of scaling. Scaling SRAM means scaling the die surface, which means compounding yield problems. It explodes everywhere, right? So you need some SRAM, and we'll have bigger amounts of SRAM in chips, and of course bigger amounts of what's called external memory on chips. The issue with HBM is that it's still slow.

[00:19]

You have the product, the data, and the compute. Who has all three? Google. They have Android, Google Docs, they have everything. They can sprinkle it everywhere. This is the sleeping giant in my mind.

[31:40]

And yes, maybe NVIDIA has a stranglehold and they can prevent you from getting some. So that would be, I call it, the Nutella situation: Nutella, they own, say, 80% of the hazelnut market, right? So yes, you can build a competitor, but who will you buy the nuts from? So there will be a need for HBM. There will be a need for SRAM, right?

[32:00]

I would say better, more dedicated architectures will be able to deliver these things. And then there's the next frontier after that, which is called compute-in-memory. There are two companies in that market. One is called Rain (rain.ai); Sam Altman is one of the investors, no surprise there. The other one is called Fractile. So this is the next frontier.

[32:21]

And the idea is that instead of transferring the data between external memory and the processor and doing the compute there, you actually bring the compute to the memory and do everything in place. It's crazy stuff, but it's coming. Maybe not this year, but... How does that change the situation? It makes it much more efficient, but what does that actually mean in reality?

[32:44]

It means you get maybe not SRAM-level performance, but a lot faster performance in terms of compute. And if you translate that to LLMs, let's say, you get much, much higher tokens per second on a single stream, which is exactly what you want when you go into reasoning. You want your model to think for, let's say, half a second, and then boom.

[33:07]

You don't want to wait 50 seconds and context-switch to something else, which is the problem everybody has today, mind you. So yeah, I think the inference compute landscape will be pushed to change because of these two constraints.

[33:42]

Depends on the supply. I think there's a shot that they don't. Because here's the thing: let's imagine we have a new chip from Amazon with, you know, the same amount of compute. Oh, wait, we do. It's called Trainium. Why would I pay NVIDIA's 90% margin if I can freely change to Trainium? My whole production runs on AWS anyway, right?

[34:07]

If you run in the cloud and you're running on NVIDIA, you're getting squeezed out of your money, right? If you're in production on dedicated chips, of course. So maybe through commoditization: hey, I'm on AWS, I can just click and boom, it runs on AWS's chips. Who cares, right? I just run my model like I did two minutes ago.

[34:39]

They are. They have a product, NIM, that sort of does that. The thing with NVIDIA is that they spend a lot of energy making you care about stuff you shouldn't care about. And they were very successful. Like, who gives a shit about CUDA? I'm sorry, but I don't want to care about that, right? I want to do my stuff.

[34:56]

And NVIDIA got me into saying: hey, you should care about this, because there's nothing else on the market. Well, that's not true. But ultimately, this is the GPU I have in my machine, so off I go. If tomorrow that changes, why would I pay a 90% margin on my compute? That's insane. This is why I believe it ultimately goes through the software. That is the entry point to the ecosystem.

[35:21]

If the software abstracts away those idiosyncrasies, as it does on CPUs, then the providers will compete on specs and not on fake moats or circumstantial moats, right? So this is where I think the market is going. And of course, there's the availability problem. If you, you know, piss off Jensen, you might need to kiss the ring to get back in line, right?
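A hypothetical sketch of that software-abstraction argument; the interface and backend names here are invented for illustration, not a real API:

```python
from typing import Protocol

# Hypothetical sketch: if inference code targets a neutral interface,
# the accelerator becomes a deployment detail and vendors compete on
# specs, the way CPU vendors do. All names below are illustrative.
class Accelerator(Protocol):
    name: str
    def run(self, model: str, prompt: str) -> str: ...

class CudaBackend:
    name = "nvidia-cuda"
    def run(self, model, prompt):
        return f"[{self.name}] {model}: output for {prompt!r}"

class TrainiumBackend:
    name = "aws-trainium"
    def run(self, model, prompt):
        return f"[{self.name}] {model}: output for {prompt!r}"

def serve(backend: Accelerator, model: str, prompt: str) -> str:
    # Application code never mentions CUDA; swapping vendors is one line.
    return backend.run(model, prompt)

print(serve(CudaBackend(), "llama-70b", "hello"))
print(serve(TrainiumBackend(), "llama-70b", "hello"))
```

The point of the sketch is the call site: `serve` is identical for both backends, so the chip choice stops being a software lock-in and becomes a price/spec decision.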

[36:17]

So all chip makers, I would say, have a GTM problem. All of them, whether it's Google, whether it's AMD, whether it's Tenstorrent. There are, I would say, probably two fundamental problems. Number one is that maintaining multiple stacks today is very, very hard. So you don't. So let's say I buy AMD. I want to buy AMD, right? That means I'm going to abandon NVIDIA.

[36:44]

Oh, crap, I have a six-year amortization plan on that. Oh, man, what do I do? So do I need to support both stacks? Unclear. Maybe until AMD tells me: hey, you have, let's say, 1,000 NVIDIA GPUs, and you're about to buy 100,000 from AMD. I mean, come on, right? And I'm like, okay, that makes it worth my while, right?

[37:07]

But that is ultimately the fundamental problem: the steps are very high, right? I need a lot of incentive to buy into that ecosystem, so I need to buy a lot of them. So if you're AMD, that is already a problem. But then Microsoft comes along and buys it all, which, by the way, puts OpenAI, at least on the inference side, in the green because of the efficiency gains.


2263.527

It's actually both. Yeah. The buy-in is very high. So to make it worth it, you have to buy a lot. And if you buy a lot... you know what, we talk to all of them. They always have the same questions, and it's completely understandable. They say, this is great, but who's the customer? Because on the other side, let's take Amazon, for instance, with Trainium.


2285.063

Apple just came and said, hey, we're going to buy 100,000 of them. So you want to buy 10,000, you feel like the big shot, right? Yeah, but go back to the queue because there's Apple before you, right? So they have to have very high commitments. You cannot be incrementally better. It's very hard, right? And also very hard, I can give you one metric if you want.


2306.168

I know for a fact that being seven times better, and take whatever metric you want, whether it's spend, whether it's whatever, is not enough to get people to switch. People will choose nothing over something. So this is a very hard market to enter, because you also cannot compete on incremental gains. It's very hard, right? So you have to convince a lot of people.


2329.761

Maybe you can go the Middle East route, in which they sprinkle everything and they evaluate everything. That's not, I would say, a very sustainable strategy in the long term.


2351.212

Absolutely. The right approach to me is making the buy-in zero. If the buy-in is zero, you don't worry about this. You just buy whatever is best today. How do you do that? By renting? Oh, because this is what we do. This is our promise. Our thesis is that if the buy-in is zero, you know, you completely unlock that value.


2374.63

It means that you can freely switch, you know, compute to compute, like freely, right? You just say, hey, now it's AMD, boom, it runs. You just say, oh, it's Tenstorrent, and boom, it runs, right? How do you do that then?


2390.138

Oh, yeah, yeah, yeah. Not agreements, but we work with them to support their chips. But the thing is, at least as a user myself of our tech, is that if it's free for me to switch or to choose whichever provider I want in terms of compute, AMD, Nvidia, whatever, then I can take whatever is best today, and I can take whatever is best tomorrow, and I can run both.


2414.708

I can run three different platforms at the same time. I don't care. I only run what is good at the moment. And that unlocks, to me, a very cool thing, which is incremental improvement. If you are 30% better, I'll switch to you.


2438.879

This is actually a great question. I think that if you are doing it bottom-up, infra to applications, you will lose because nobody will care, as they don't today, right? If you look at TPUs, they're available, they're great. Nobody cares. Why does nobody care about TPUs, sorry? Because the cost of buying, it's always the same, right? You have to spend six months of engineering to switch to TPUs.


2462.72

And mind you, TPUs do training. They are the only ones. We're training them now. But AMD can do training, but in terms of maturity, by far the most mature software and compute is TPUs, and then it's NVIDIA, right? So the buy-in is so high that people are like, no, fuck. We'll see, right? I'm not on Google Cloud. I have to sign up. Oh my God, right? So these are tremendous chips.


2488.356

These are tremendous assets. Now, in terms of the risk, I think if you want to do it, you have to do it top to bottom. You have to start with whatever it is you're going to build and then permeate downwards into the infrastructure. Take, for example, Microsoft with OpenAI. They just bought all of AMD's supply and they run ChatGPT on it. That's it. And that puts them in the green.


2511.871

That's actually what makes them profitable on inference. Or at least, let's say, not lose money.


2523.715

Because I can give you actual numbers. If you run eight H100, you can put two 70B models on them because of the RAM. That's number one. Number two is if you go from one GPU to two, you don't get twice the performance. Maybe you get 10% better performance. Yeah, that's the dirty secret nobody talks about. I'm talking inference, right?


2546.434

So you go from, let's say, 100 to 110 by doubling the amount of GPUs. That is insane. So you'd rather have two by one than one by two, right? So with one machine of H100s, you can run two 70B models if you do four GPUs and four GPUs, right? That's number one. If you run on AMD, well, there's enough memory inside the GPU to run one model per card. So you get eight GPUs, eight times the throughput.


2575.051

While on the other hand, you get eight GPUs, two, maybe two and a half times the throughput. So that is a 4x right there, just by virtue of this. So that is the compute part. But if you look at all of these things, there's a tremendous amount of... we talked to companies who have chips upcoming with almost 300 gigabytes of memory on them, right?


2598.706

So that is, you know, like one chip per model. This is the best thing you want if you're on 70Bs, right? Which is, I would say, not the state of the art, but the regular stuff people will use for serving.
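The arithmetic across these last few answers can be sketched roughly in code. This is a back-of-envelope model only, under assumptions not stated by the speaker: fp16 weights at 2 bytes per parameter (a 70B model is then roughly 140 GB), 80 GB per H100, 192 GB for an MI300X-class AMD card, with KV cache and activations ignored; the function names are ours.

```python
import math

def gpus_per_replica(model_gb: float, gpu_mem_gb: float) -> int:
    """Minimum number of cards a single model copy must be sharded across."""
    return math.ceil(model_gb / gpu_mem_gb)

def replicas_per_node(model_gb: float, gpu_mem_gb: float, n_gpus: int = 8) -> int:
    """How many independent model copies fit on one n-GPU machine."""
    return n_gpus // gpus_per_replica(model_gb, gpu_mem_gb)

MODEL_GB = 70 * 2  # 70B params at ~2 bytes each (fp16) is about 140 GB

# 80 GB cards: each copy needs at least 2 cards for weights alone
# (in practice 4 once KV cache is included, hence "two 70Bs" per machine).
print(replicas_per_node(MODEL_GB, 80))   # 4 on weights alone

# 192 GB cards: one copy fits per card, so 8 copies per machine.
print(replicas_per_node(MODEL_GB, 192))  # 8
```

With the sharded setup delivering only two to two-and-a-half times the throughput of one copy, versus eight independent copies on the bigger cards, you land near the 4x the speaker cites.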


2613.698

So if you look, you know, top to bottom and you know what you're going to build with them, then it's a lot better to do the efficiency gains because four times is a big deal, right? And mind you, these chips are 30% cheaper than Nvidia's. It's like a no brainer. But if you go bottom up and say, I'm going to rent them out, people will not rent them. Simple.


2633.946

So that's why, you know, I think it's a good way to attack it from the software because ultimately, do you really care about that your MacBook, let's say, is an M2 or an M3? It's like, oh, it's the better one. And that's it, right? And imagine if you had to care about these things. That would be insane.


2667.346

Stock? Yeah. I used to think the market was efficient. So probably I would still go with NVIDIA today, because of the supply. But, you know, if we play our cards right and we ship our stuff, hopefully I will come back and tell you to buy AMD as much as you can. Or Tenstorrent, you know, if they go public, or whoever else. These chips are amazing, by the way.


2698.339

Probably not a lot of people are accustomed to what it entails to run production. So that inference is production, and production is hard. Somebody has to wake up at night. And I used to be that guy, right? I don't want to do it again. So production is hard.


2715.746

Thankfully, we have a lot of software nowadays to do that a lot better, but there's not a lot of reuse, because the AI field, at least, is not really accustomed to that yet. It's changing; the discussions I had a year ago and the discussions I have today are not the same. They're going in the right direction, but they're not exactly there yet. So probably that would be the number one thing.


2761.545

I mean, they're still going after training. So there's still this frontier. It's probably also why NVIDIA is the better buy right now. Because on the NVIDIA side, if you do training, it's incremental: if you have bought 1,000 NVIDIA GPUs and you buy 1,000 new NVIDIA GPUs, that gives you 2,000 GPUs, right? But if you buy 1,000 NVIDIA and 1,000 AMD, that gives you two separate pools of 1,000, right? It's a bit different.


2786.24

So they're still going after training, definitely. And they're very pragmatic in doing so. But, I mean, they have the capex to spend. They're not making their money out of it, probably. The only ones, by the way, that own their compute are Google. There's this triangle of, I would say, winning. This is my mental model, mind you: you have the product, the data, and the compute.


2807.807

Who has all three? And everything flows from there. Product, data, compute. Who has all three? Google? Amazon? Amazon, they don't have products. They have Amazon, right? They have AWS, but they don't have actual products. Google has, you know, Android, Google Docs, whatever. They have everything. They can sprinkle it everywhere. This is the sleeping giant in my mind.


281.573

So at the very bottom of things, ZML is an ML framework that runs any model on any hardware. We sit ultimately at the infrastructure layer. We enable anybody to run their model better, faster, more reliably, but on any compute whatsoever. It doesn't really matter: it could be NVIDIA, it could be AMD, it could be TPU, and whatnot. And we do all that without compromise.


2843.545

I mean, OpenAI is amazing, but it's not their compute. It is Microsoft's compute. And if you own your compute, you own your margin, is essentially what you're saying. Yeah. Even Microsoft, when they were running NVIDIA, they bought NVIDIA at some outrageous margins. I talk to a lot of people that build data centers, and mind you, these people buy tens of thousands of GPUs.


2868.453

And I asked them, hey, do you get at least a discount or something? And they're like, no, the only thing we get is the supply. So, I mean, ultimately, if you don't own your compute, you're starting with, you know, something at your ankle. Definitely. And so this is why I like to think in this triangle, product, data, compute.


2912.296

There's like a brute-force approach to this. It is a very American approach: more and more and more. But the thing is, if you look at, for instance, the xAI cluster, it's not 100,000 GPUs. It is four times 25,000. You're starting to see something, because InfiniBand, or in their case RoCE, which is anyway the technology they use to bridge their GPUs together, has upper bounds, right?


2937.833

At some point you're fighting physics. So you can push, it's like, you know, trying to get to the speed of light. As you approach it, the amount of energy you need is a lot higher and a lot higher and it grows and grows.


2948.543

So there are two, I would say, counters to that. Number one is that we still scale, but there's a lot of waste and excess, you know, spending on the engineering side, which is the DeepSeek approach, right? Very successful at that, mind you. They said, yeah, if we do this and this differently, then we get, you know, multiples sometimes, right?


2968.777

So virtually you increase your compute capacity because you're more efficient. And the other approach is Yann LeCun's approach, which is: this is not scaling, and at some point we need to look the problem in the face and do something better, right? So of course we push and push and push because there's still capital, but I'm more on these two approaches. I think you can do more with less.


2998.706

I think until somebody does it. DeepSeek was a good wake-up call, right? Suddenly efficiency is in. That's number one. And number two is until there's a new architecture that comes out and changes the game. So in the case of LLMs, for instance, you have these so-called non-transformer models that fundamentally change the compute requirements.


3019.199

That might be a frontier that completely obsoletes transformers. Transformers are the building block by which current models work. The way they work is that for each token, or syllable, the model will look at everything behind it. You can see that as you add more text, you have more work to do.
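The growth he's describing can be sketched with a toy cost count: in a plain transformer decoder, token t looks back at all t earlier positions, so total attention work over a sequence is quadratic in its length. This is illustrative only; real attention cost also depends on heads, hidden dimensions, and KV caching.

```python
def attention_cost(seq_len: int, d_model: int = 1) -> int:
    """Toy count of attention 'work': token t attends to t positions."""
    # sum(1..n) = n(n+1)/2, so this grows roughly as seq_len^2 / 2
    return sum(t * d_model for t in range(1, seq_len + 1))

# Doubling the context roughly quadruples the attention work.
print(attention_cost(1000))   # 500500
print(attention_cost(2000))   # 2001000
```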


3040.8

So there are these new architectures that do not require this, that might change these things and probably shift the amount of compute needed to do training or to do inference. And then there's the new thing, which is Yann's thesis, which is the world model. As in, LLMs are at the end. What we need is something that understands the world fundamentally. It's called the JEPA thesis.


3065.48

I'm very bullish on this, but it's very frontier. Why are you bullish on it? And why is it so frontier? Because it's Yann LeCun. It's hard to. He's no bullshit, right? So he explained to me how it worked and I was blown away. But it makes a lot of sense. We are creeped out because the machine talks back to us. But it's not a new thing, right?


307.784

That's the key point, because if there's a compromise, then it's not really, you know, agnostic, right?


3087.33

You know, this was not new technology when it exploded. But suddenly it was talking back, and that freaked us out. And we got crazy on it, right? But language is one form of communication, and it is ultimately a very narrow window into, you know, the world. We use it to describe the world, arguably with some loss, right?


3110.289

And so the JEPA approach, long story short, is that you have essentially two things you want to do, and you try to minimize the energy to do them. And from this, understanding emerges, physics emerges, et cetera, because you're trying to minimize the amount of energy to go from one state to the other. And that actually makes sense. Like, if you try to pick up this AirPods case,


3133.385

I'm not going to go round trip around the block to get it, right? I just get it. And in my brain, it's wired to just do the thing. If I go and, you know, talk to myself out loud, put the hand down, move to the left and whatever, that feels very, you know, inefficient. So probably this will be something that changes. And in the case of LLMs, there's good work also.


3157.253

on what's called diffusion-based LLMs. Instead of thinking, you know, what's called auto-regressively, which means you get a new token, you re-inject it and you redo, et cetera, they think more like what we do, which is in patches, right? Imagine a paragraph of text where words appear until it's done.
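The autoregressive loop he contrasts against can be sketched as a toy decode function. The `model` callable here is a stand-in, not a real LLM; the point is only the shape of the loop.

```python
from typing import Callable, List

def autoregressive_decode(model: Callable[[List[str]], str],
                          prompt: List[str], n_tokens: int) -> List[str]:
    """One forward pass per new token; each pass sees the whole sequence so far."""
    tokens = list(prompt)
    for _ in range(n_tokens):
        # Strictly sequential: step t cannot start before step t-1 finishes.
        tokens.append(model(tokens))
    return tokens

# Toy 'model' that just echoes the last token with a tick mark appended.
toy = lambda seq: seq[-1] + "'"
print(autoregressive_decode(toy, ["a"], 3))  # ['a', "a'", "a''", "a'''"]
```

A diffusion-style LLM would instead refine all positions of a block in parallel over a few denoising steps, removing the strict token-by-token dependency: that is the "patches" intuition.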


3189.708

I think it's fair game, to be honest. I will not shed a tear. It's fair game. There were some people who tried to ask, I think it was, I don't remember if it was an OpenAI model, so a diffusion model, image, right? They asked it to generate an image from a Star Wars movie at whatever timestamp. And it came out with the Star Wars movie screenshot.


3210.485

Obviously it was trained with it. I think it's fair game because there's no free lunch, right? It was trained with data. You had a good ride. Somebody was sneaky and took it, but you took it from the beginning too. So let's just accept it's fair game.


3230.66

Absolutely, absolutely. I take my cup and enjoy it very much, that movie, every single day.


3255.327

I'm a bit split on this. There's a part of me that said that if you re-inject data into the system, the system deteriorates. That feels a bit, I would say, intuitive. But if you look at AlphaGo, for instance, the moment it's ramped up in its skills is when they started generating games. or synthetic games, right?


3274.116

So I'm a bit, you know, split, but there are some verticals that very much benefit from this. Code LLMs, for instance. We can run code, right? So this is the Poolside thesis. Just so I understand, why does it work for coding and not for other things? Because you don't use the AI model to generate output. You use the machine. You just run the code, right?


328.778

Yes, you actually can see it. It's been happening for a while. Models now are not the right abstractions, at least if you look at closed source models, they're not really models. They're more like backends. And there are a lot of tricks that you feel like you're talking to one model, but ultimately you're talking to a constellation, an assembly of backends that produces a response.


3296.669

And you see what it makes, and you run all this code and create data out of it. Whereas if you run an LLM and say to it, all right, generate me two trillion tokens of text, it will do it with its, you know... so you re-inject and stuff. So there's a lot of tricks, but ultimately my gut tells me that it feels wrong, right? Because you re-inject data that was there.


3320.805

And so it will deteriorate. There's loss. So yeah, I'm a bit bullish. I'm not sure exactly on what vertical. Code is one. We'll see. Distillation is, in some sense, a bit like that. You create synthetic data from a bigger model into a smaller one. Probably the most, I would say, mind-blowing thing about distillation is that sometimes the smaller models become better than the bigger model.


3352.721

One theory is that the smarter model is better at generating output that you would want it to generate, essentially. It's not better in the general sense. It's better at the task at which you were measuring it. This is what it learned to imitate.


3377.49

Sometimes it's wasteful to run big models. A lot of times it's actually wasteful to run big models. I think there's going to be a lot of smaller models for efficiency reasons, but...

3388.737

There's a but, which is you talk to people at DeepMind and they don't even fine-tune anymore, because they have such a big context window, which is, you know, the amount of data you can inject into the model at one time, that nowadays they just dump data into it and say, do whatever that data tells you to do, instead of fine-tuning as we used to do.

3412.228

So the efficiency gains, we're not there yet, right? But if the efficiency gains, I would say, pass that threshold, we'll just do it at runtime. We'll just have a great model that will just specialize at each request. But that's not for tomorrow, I think.

3432.225

It's a very, very clever trick. What you do is you represent knowledge in what's called a vector space or latent space, and you query it through what's called vector search. So imagine you have, let's say, a 3D space that represents all knowledge, all of everything. And let's say a cat sits here, a dog sits close because it's an animal, but it's far from some other property and so on.

3460.72

So what you do is you run the user's request through this same system. It's called an embedding. And that will give you a vector, and you will take whatever is closest to you, what's called semantically close. And then, it's actually very clever, you actually insert those pieces of text before the request.

3482.608

So it's as if you would say, knowing the following, and you give the data, let's say it's law or whatever, please answer my request. And that's it. So that's a bit of a clever trick. It's a bit dirty because, of course, you know, you are limited by the amount of data you can input, right? So there's this problem of how you chunk, you know, the data that you input.
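The retrieval trick described here can be sketched in a few lines. This is a toy, not a real system: the three-dimensional vectors are hand-made stand-ins for real embedding-model output, and the names `retrieve` and `build_prompt` are hypothetical.

```python
import math

# Toy in-memory "vector store": each chunk of text mapped to a tiny
# hand-made embedding. In a real system these come from an embedding model.
CHUNKS = {
    "A cat is a small domesticated animal.": [0.9, 0.8, 0.1],
    "A dog is a loyal domesticated animal.": [0.8, 0.9, 0.1],
    "A contract requires offer and acceptance.": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    # Cosine similarity: how "semantically close" two vectors are.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    # Take whatever is closest to the query vector in the latent space.
    ranked = sorted(CHUNKS, key=lambda c: cosine(CHUNKS[c], query_vec), reverse=True)
    return ranked[:k]

def build_prompt(question, query_vec):
    # Insert the retrieved pieces of text before the request, as described.
    context = "\n".join(retrieve(query_vec))
    return f"Knowing the following:\n{context}\n\nPlease answer: {question}"

# A query vector that sits near the "legal" region of the toy space.
prompt = build_prompt("What makes a contract valid?", [0.1, 0.2, 0.95])
print(prompt)
```

The point of the sketch is the shape of the pipeline: embed the request, rank chunks by similarity, and prepend the winners as a "knowing the following" preamble.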

349.64

Probably the number one, you know, I would say obvious thing would be that if you ask a model to generate an image, then it will, you know, switch to a diffusion model, right? Not an LLM. And there's many, many more tricks. The turbo models at OpenAI do that. There's a lot of tricks.

3517.237

It is, it is. Depends on how it works, but yes, sometimes it is. But think of it as in, it's like a preamble to your question. Knowing the following, and the following is a tiny window into the content. Please answer my question. And of course, as you talk more and more, it will forget because that window is fixed.

3544.165

What pushes smaller models is efficiency, roughly, speed. You know, less is better. So if we can do with less, then less it is. Simple as this, right? In terms of RAG, the key frontier is what we call attention-level search. But this is something we're working on. You have the exclusivity now, I'm putting it out there. It doesn't push, I would say, model sizes.

3566.182

What really drives model sizes is efficiency rather than specialization. Meaning that if you can get the same performance from a smaller model that is fine-tuned or used with RAG or whatever, then you'll do it with the smaller one because, again, less is better.

3596.297

Oh, I love it. Constraint is the mother of innovation. Yes, you know, we can, you know, troll a bit about, you know, the Singapore, you know, gray market and all of these things. But ultimately, like they had no choice. Here's the thing. If you can buy more, why would you give a damn, right? You can just buy more. So if you are pushed to efficiency, then you will deliver efficiency.

3618.994

These are very, very skilled people. This is the coolest thing to me about AI, honestly: geography doesn't matter anymore. You can just do things. You appear out of nowhere, boom, you're on the map. And so I'm very, very glad that they did. I found the reaction very entertaining, to be honest. So yeah, I mean, constraint is a very good driver of efficiency.

365.108

So definitely models, in the sense of getting, you know, weights and running them, is something that is ultimately going away in favor of, like, full-blown backends, right? You feel like you're talking to a model, but ultimately you're talking to an API. The thing is, that API will be running locally, in your own cloud instances and so on.

3652.102

I'm not sure who is a threat to OpenAI at the moment. Here's why. You look at the numbers. I mean, we live in a bubble. We, you know, we follow every new episode, the whatever new model, whatever, who said what and so on. But, you know, I go to my mother and I ask her, you know, do you know ChatGPT? And she says, yes.

3669.985

And do you know, I don't know, I don't want to dunk on anybody, but do you know some other model? And she says, what is it, right? Even Gemini, right? Google, right? So they have a strong brand, they have a strong product, but there's a balance between the product and the models, honestly.

3685.615

So this is Gary from Fluidstack, actually, who told me that his mental model is that model providers will be like car makers. There's no winner-take-all. Everybody will have their own, because ultimately human knowledge, everybody has everything. So we're converging. But I liked the analogy.

3703.789

Yes, DeepSeek made waves, very good waves, but they were, you know, waves that were amplified by the media and the narrative and the drama.

3719.036

Today, maybe. Tomorrow, I'm not sure. They're a bit late in terms of ASICs. They are like A100 level, but probably, I would say, one of their unfair advantages is that it's like when you do exercise in the water, right? It's like this. So this is their state. They are constrained, so they are bound to do better. They can just not buy their way into better compute.

3757.942

No, I don't care. This is something that makes me wonder sometimes. I understand the narrative and so on, but I am absolutely not fearful. Let's be successful first, and then we'll talk about the politics. But again, I'm not Mistral. I'm not building gigawatt data centers and so on. If you build gigawatt data centers, maybe you run into these problems.

3798.981

They are very competent. I think it's easy to spread FUD. There's a lot of FUD going around, especially about regulation and everything. But here's the thing: I look around me and I don't see, you know, what I read, right? So I'm hardly convinced. You know, everybody was saying that they were dead, and boom, they came out with their release and it was insane.

3819.509

So what I know is that I hope they don't have too much money. That's for sure. You want to be clever, right?

3836.181

My first impression was that I don't buy it. I would say, you know, American style, right? You start with the claim and we'll figure it out later. I don't buy it. And ultimately, I'm not sure I care that much about it. Let's imagine it's true, right? Congratulations. Amazing. But it is more of the same. It is a vertical scaling. And as you know, my days are spent on efficiency.

3859.756

So I look at these things as being like, all right, this is a bigger, you know, this is an American car of AI. It's big. It consumes a lot of gas. But ultimately, you know, it's not a good car, right? I think there has to be sufficient capital, but at some point, I'm not sure it is really a differentiator. That was prior to DeepSeek, then DeepSeek came.

3881.67

That was always my thesis, but you need money, you need infrastructure, you need... But ultimately, probably the two limiting factors today are talent and energy. That's it. The rest, yes, of course, you can buy 500 billion of GPUs. By the way, at 90% margin. So if we work on that margin, we can probably shrink that number. So I'm not easily entertained by these numbers.

3911.32

I've seen how the sausage is made way too many times. Dude, I want to do a quick fire with you.

3925.774

Oh, yeah. Latency, reasoning. Definitely. This year. What does that mean? It's the shift from throughput, how fast my answer streams, to how long it takes for my complete answer to appear. That is probably one of the fundamental shifts, like, this year, right? Longer term, I'm really rooting for non-transformer models that will also change the compute landscape.

3961.445

Probably the number one thing I would say is do not resell compute if you can. A lot of AI startups that are building on top of AI are trying to make a margin on top of a very big cake. And ultimately what they sell is compute. For $1 of spend, maybe 98% of it goes to somebody else's margin.
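The stacked-margin point can be sketched with the rough numbers quoted in the conversation (a cloud at about 30% gross margin, NVIDIA at about 90%, TSMC at about 60%). This is illustrative arithmetic only, not real financials: if each layer keeps its margin share of what it charges, very little of the end customer's dollar survives the stack.

```python
# Gross margins per layer, top of the stack first. Illustrative figures
# taken from the conversation, not actual company financials.
margins = {"cloud": 0.30, "nvidia": 0.90, "tsmc": 0.60}

dollar = 1.00
for layer, margin in margins.items():
    kept = dollar * margin   # margin captured at this layer
    dollar -= kept           # what flows down as this layer's input cost
    print(f"{layer}: keeps ${kept:.3f}, passes down ${dollar:.3f}")

# After three layers, roughly 3 cents of the original dollar remains,
# which is the "very thin crust on a very big cake" point.
```

Running the sketch, about 97-98 cents of each dollar is captured as someone else's margin before the spend reaches the bottom of the stack.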

3983.872

So if you do AI, as much as you can, try to verticalize on the product, but not on the compute. If your business model implies buying a lot of tokens, it's a very hard circle to square to put that into $20 a month. So I always say, please look at it from that angle. And if you can, try and avoid it. What's the biggest challenge that Jensen Huang faces today?

4011.914

The highs are very high, but they don't last forever. So probably it's how to navigate the downslope. Blackwell is probably something that keeps him awake at night.

4029.9

Because orders are getting canceled. Why are they getting canceled? They have a lot of problems with these chips. So a lot of people, you know, are canceling their orders. These chips are like on the frontier of scaling. And so, you know, they were supposed to come out last summer, but there was that heat dissipation and, you know, chip-bending problem.

405.334

No, absolutely. You can get probably an order of magnitude more efficiency depending on the hardware you run on. That is substantial. Not a lot of people have that problem at the moment. Things are getting built as we speak. But a simple example is if you switch from NVIDIA to AMD on a 70B model, you can get four times better efficiency in terms of spend. So that is substantial.

4052.154

The people who are very privy to silicon told me, this is what we call a pretty big fucking problem, right? End quote. So probably, how to navigate the downslope. Maybe you don't know this, but the supply of H100s was actually smoothed out over the year so that there wasn't a big spike in deliveries and then a quarter with less, right? Which pissed off a lot of people, mind you, who bought a lot of them.

4078.096

Some of them even haven't received their order from last year. And they already see, like, the new chip, the B200, and then the one after, you know, and they're super pissed. There will be a downslope at some point. The question is, you know, when and how, like, if the H100 bubble pops, of course it will impact Nvidia. But Blackwell, I'm probably going to get a lot of flak for this.

4099.914

But, you know, I've seen some very worrying numbers about it, and varying testimonies from people who operate these things, right? So that ride will stop, or at least, you know, slow down.

432.244

That is very much substantial. Now, the problem is getting some AMD GPUs, right? I'm really sorry.

442.628

So there's a few reasons. Probably the most important one is the PyTorch-CUDA, I would say, duo. And that's very, very hard to break. These two are very much intertwined. Can you just explain to us what PyTorch and CUDA are? Oh, yes, absolutely. Yeah, yeah. PyTorch is the ML framework that people use to build and actually train models, right?

464.378

You can do inference with it, but by far the most successful framework for training is PyTorch. And PyTorch was very much built on top of CUDA, which is NVIDIA software, right? Let's just say the strengths of PyTorch make it ultimately very, very bound to CUDA. So of course it runs on, you know, it runs on AMD, it runs on, you know, even Apple and so on.

489.353

But there were always, you know, the tens of little details that don't run exactly like, you know, you would expect, and there's work involved. But then there's also supply. So probably that's the number one thing. The second thing is there's a lot of GPUs on the market. Pretty much all of them are NVIDIA.

508.146

The reason being that if you think, you know, in layers and you say, all right, I'm going to buy, let's say, GPUs and I'm going to sell them to folks to maybe not even do training, right? Just do inference. Then most likely, if you look at it that way, you'll end up buying Nvidia, because everybody will want to run on Nvidia, because nobody really knows how to do anything else.

528.177

And they've trained on Nvidia, so they're like, I can just reuse my code and so on. So there's like this self-perpetuating circle of people just buy Nvidia because they want to resell and people just use Nvidia because it's there, right? But it's by far not the most efficient platform. And arguably, even in terms of software, it's not the best software platform.

572.279

Because the chips are there. There's a lot of things, but in my opinion, there's going to be a need for inference. Very hard to say whether it will be worth everybody's money to do it on H100. That is a bubble that I think will blow some time. I'm kind of afraid of that, to be honest.

593.252

Because it was built on the A100, I would say, financial model, which was: at generation zero, we do training, and when it's last generation, we do inference. And it worked beautifully, right? For the A100. Then the H100 comes along, and for inference, it's worth five times the price and maybe runs twice as fast in terms of performance. On inference, that is. On training, it's a lot better, but on inference,

619.016

It's like maybe twice as fast. Actually, when it came out, it ran at the same speed as the A100. So there's a money gap that's going to have to, you know, be bridged sometime, right? And the part that worries me is that I see, you know, amortization plans over, like, six, seven years, right? With the GPUs as the collateral.

638.729

And I'm like, well, I'm not sure how it's going to work because at least when they came out, they were worth five times the price and they're just two times, you know, faster. Something has got to give.

659.966

Not much, ultimately. The two things that could very much shake the industry, the chip industry, in my opinion, are agents and reasoning.

674.358

I think this is where NVIDIA can be attacked. I mean, why agents and why reasoning? The difference is for agents and reasoning, you need to wait until the end of the request to get whatever it is you came for. You don't really care about the speed at which the text outputs, which is what you want in a chat, right?

695.387

You only care about how much time it takes between the beginning of my request and the end. And so that fundamentally changes the incentives from throughput-bound to latency-bound. And so GPUs, let's say you're running a GPU at, let's say, 10,000 tokens per second. It would very much like to do it as, you know, 100 streams times 100 tokens, right?

718.772

And they can do that, but they cannot give you 10,000 tokens per second only on you, per stream, as we say. But in terms of agents or reasoning, that is exactly what you want, because you don't want to wait like 50 seconds for whatever thinking, right? And agents, it's the same. So these two, I think, are the shot that might make NVIDIA change its course with respect to chips.
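The throughput-versus-latency point can be sketched with toy arithmetic. The 10,000 tokens per second and 100-stream figures are illustrative numbers from the conversation, not benchmarks of any particular chip.

```python
# Toy throughput-vs-latency arithmetic: the aggregate number (tokens/s
# across all batched streams) and the per-stream number are very
# different things. Illustrative figures only.
aggregate_tps = 10_000      # total tokens/s the chip sustains, batched
concurrent_streams = 100    # requests being served at once

per_stream_tps = aggregate_tps / concurrent_streams
answer_tokens = 5_000       # a long reasoning trace or agent transcript

wait_seconds = answer_tokens / per_stream_tps
print(f"per-stream speed: {per_stream_tps:.0f} tok/s")
print(f"time until the full {answer_tokens}-token answer: {wait_seconds:.0f} s")
```

With these numbers a single stream only sees 100 tokens per second, so a long reasoning answer takes about 50 seconds end to end, which is exactly the wait a chat user tolerates but an agent or reasoning workload does not.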

743.621

I mean, they're not idiots, right? How should agents change NVIDIA's strategy? Hard to say, because NVIDIA has a very, very vertical approach. They do more of more, right? Like if you look at Blackwell, it's actually crazy what they did for Blackwell. They assembled two chips.

763.226

But the surface was so big that the chip started to bend a bit, which further perpetuated the problem, because it then didn't make contact with the heat sink and so on. And, you know, the power envelope, they pushed it to a thousand watts. It requires liquid cooling and so on. So they are very much in a vertical, foot-on-the-pedal mode of GPU scaling.

785.48

But the thing is, GPUs are a good trick for AI, but they're not built for AI. It's not a specialized chip. It is a specialization of a GPU, but it is not an AI chip.

806.495

So the way it worked is that you can think of a screen as a matrix. And if you have to render pixels on a screen, there's a lot of pixels and everything has to happen in parallel, right? So that you don't waste time. Turns out, you know, matrices are a very important thing in AI.
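
That parallel structure can be sketched in a few lines of plain Python (a minimal illustration, not GPU code): shading pixels and multiplying matrices are the same pattern, where every output element is an independent computation.

```python
# A screen is a matrix of pixels; each pixel is an independent function
# of its coordinates, so all of them could be computed in parallel.
height, width = 4, 8
screen = [[(x + y) % 2 for x in range(width)] for y in range(height)]

# Matrix multiplication has the same shape: C[i][j] is an independent
# dot product of row i of A and column j of B -- the same data-parallel
# workload a GPU was built to exploit for pixels.
A = [[0.0, 1.0, 2.0],
     [3.0, 4.0, 5.0]]
B = [[float(4 * i + j) for j in range(4)] for i in range(3)]

C = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(4)]
     for i in range(2)]
```

Nothing in either loop depends on any other output element, which is why the same silicon serves both jobs.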

825.808

So there was this cool trick, probably 20 years ago now, where we would trick the GPU into believing it was doing graphics rendering when actually we were making it do general parallel work, right? It was called GPGPU at the time. So it was always a cool trick, but it was never dedicated to this.

846.65

The pioneers probably were, of course, Google with the TPU, which is much more advanced at the architectural level. But essentially, the way they work kind of works for AI. But for LLMs, that starts to crack, because they're so big and there's a lot of memory transfers and so on.

866.724

Actually, that's why Groq (the chip company, not Grok), Cerebras, and all these folks achieve very high single-stream performance: the data is right there in the chip. They don't have to fetch it from memory, which is slow, which a GPU has to do. So there's a lot of these things that ultimately make it a good trick but not a dedicated solution per se.
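
Why keeping the data in the chip helps single-stream speed can be shown with a toy bandwidth model (all numbers below are illustrative assumptions, not vendor specs): generating one token streams the whole model's weights once, so tokens per second is roughly memory bandwidth divided by model size.

```python
# Toy model of memory-bound single-stream decoding.
# All numbers are illustrative assumptions, not vendor specifications.
model_bytes = 14e9        # e.g. a 7B-parameter model at 2 bytes per weight
hbm_bandwidth = 3e12      # ~3 TB/s, off-chip HBM-class memory
sram_bandwidth = 80e12    # ~80 TB/s, on-chip SRAM-class memory

def tokens_per_second(bandwidth_bytes_per_s, bytes_per_token):
    # Each generated token reads every weight once; compute is assumed free,
    # so memory bandwidth is the only bottleneck in this sketch.
    return bandwidth_bytes_per_s / bytes_per_token

hbm_rate = tokens_per_second(hbm_bandwidth, model_bytes)    # ~214 tok/s
sram_rate = tokens_per_second(sram_bandwidth, model_bytes)  # ~5714 tok/s
```

The order-of-magnitude gap between the two rates, not the exact figures, is the point.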

890.454

That said, though, the reason NVIDIA probably won, at least in the training space, is Mellanox, not the raw compute. Because you need to run lots of these GPUs in parallel, the interconnect between them is ultimately what matters. How fast can they exchange data? Because remember,

912.674

When you do a matrix multiplication, let's say, the matrix is read like hundreds of times during the multiplication. So there's a lot of transfers going on. And so far, Mellanox, with, you know, InfiniBand, had the best technology. And when you do training, by the way, the interconnect is the name of the game. When you do inference, not so much.
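
The repeated-reads point is simple arithmetic (a back-of-the-envelope sketch, not hardware measurements): in a naive n-by-n matrix multiplication, every element of A is read once per column of B, so without on-chip caching the same data crosses the memory or interconnect boundary n times.

```python
# Back-of-the-envelope for naive C = A @ B with square n x n matrices.
n = 1024

# Each element of A meets every one of B's n columns, so it is read
# n times unless it stays cached on-chip; B's elements behave the same.
reads_per_element = n
elements_in_A = n * n
scalar_reads_of_A = reads_per_element * elements_in_A   # n**3 reads

# Useful math also grows like n**3 (one multiply + one add per pair),
# so data movement scales with compute and becomes the bottleneck.
flops = 2 * n**3
bytes_if_uncached = scalar_reads_of_A * 4               # fp32, A alone
```

With n = 1024 that is over a billion scalar reads of A alone, which is why the speed of moving data between chips, not the arithmetic, sets the pace in training.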

959.628

So I would divide it in two categories. Well, three categories: the GPUs you can buy or rent, the TPUs you can rent, and the dedicated chips you can buy. This is how the market is structured today, right? Right now, if you want to go dedicated, at least in the cloud, there's two options, TPUs and Trainium. TPUs on Google, Trainium on Amazon.

985.021

So these are available chips, you can rent them today. If you want to buy or rent GPUs, well, they're GPUs, we know them well. And then there's this new wave of computing, which is dedicated chips you can actually buy: Tenstorrent, Etched, VSORA. So I think it will be a mix. For instance, let's say you are on Google Cloud, of course you don't want to do NVIDIA.