Dylan Patel

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yeah, so pre-training, I'm using some of the same words to really get the message across is you're doing what is called autoregressive prediction to predict the next token in a series of documents. This is done over standard practices, trillions of tokens. So this is a ton of data that is mostly scraped from the web.

1462.185 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

In some of DeepSeq's earlier papers, they talk about their training data being distilled for math. I shouldn't use this word yet, but taken from Common Crawl. And that's a public access that anyone listening to this could go download data from the Common Crawl website. This is a crawler that is maintained publicly.

1481.355 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

In some of DeepSeq's earlier papers, they talk about their training data being distilled for math. I shouldn't use this word yet, but taken from Common Crawl. And that's a public access that anyone listening to this could go download data from the Common Crawl website. This is a crawler that is maintained publicly.

1481.355 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

In some of DeepSeq's earlier papers, they talk about their training data being distilled for math. I shouldn't use this word yet, but taken from Common Crawl. And that's a public access that anyone listening to this could go download data from the Common Crawl website. This is a crawler that is maintained publicly.

1481.355 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yes, other tech companies eventually shift to their own crawler, and DeepSeq likely has done this as well as most frontier labs do. But this sort of data is something that people can get started with. And you're just predicting text in a series of documents. This can be scaled to be very efficient.

1498.85 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yes, other tech companies eventually shift to their own crawler, and DeepSeq likely has done this as well as most frontier labs do. But this sort of data is something that people can get started with. And you're just predicting text in a series of documents. This can be scaled to be very efficient.

1498.85 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

Yes, other tech companies eventually shift to their own crawler, and DeepSeq likely has done this as well as most frontier labs do. But this sort of data is something that people can get started with. And you're just predicting text in a series of documents. This can be scaled to be very efficient.

1498.85 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And there's a lot of numbers that are thrown around in AI training, like how many floating point operations or flops are used. And then you can also look at how many hours of these GPUs that are used. And it's largely one loss function taken to a very large amount of compute usage, you just you set up really efficient systems. And then at the end of that, you have the space model.

1517.763 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And there's a lot of numbers that are thrown around in AI training, like how many floating point operations or flops are used. And then you can also look at how many hours of these GPUs that are used. And it's largely one loss function taken to a very large amount of compute usage, you just you set up really efficient systems. And then at the end of that, you have the space model.

1517.763 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And there's a lot of numbers that are thrown around in AI training, like how many floating point operations or flops are used. And then you can also look at how many hours of these GPUs that are used. And it's largely one loss function taken to a very large amount of compute usage, you just you set up really efficient systems. And then at the end of that, you have the space model.

1517.763 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And pre training is where there is a lot more of complexity in terms of how the process is emerging or evolving and the different types of training losses that you will use. I think this is a lot of techniques grounded in the natural language processing literature. The oldest technique which is still used today is something called instruction tuning or also known as supervised fine tuning.

1542.345 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And pre training is where there is a lot more of complexity in terms of how the process is emerging or evolving and the different types of training losses that you will use. I think this is a lot of techniques grounded in the natural language processing literature. The oldest technique which is still used today is something called instruction tuning or also known as supervised fine tuning.

1542.345 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And pre training is where there is a lot more of complexity in terms of how the process is emerging or evolving and the different types of training losses that you will use. I think this is a lot of techniques grounded in the natural language processing literature. The oldest technique which is still used today is something called instruction tuning or also known as supervised fine tuning.

1542.345 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

These acronyms will be IFT or SFT. People really go back and forth throughout them and I will probably do the same which is where you add this

1566.923 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

These acronyms will be IFT or SFT. People really go back and forth throughout them and I will probably do the same which is where you add this

1566.923 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

These acronyms will be IFT or SFT. People really go back and forth throughout them and I will probably do the same which is where you add this

1566.923 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

formatting to the model where it knows to take a question that is like, explain the history of the Roman Empire to me, or a sort of question you'll see on Reddit or Stack Overflow, and then the model will respond in a information-dense but presentable manner. The core of that formatting is in this instruction tuning phase.

1576.309 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

formatting to the model where it knows to take a question that is like, explain the history of the Roman Empire to me, or a sort of question you'll see on Reddit or Stack Overflow, and then the model will respond in a information-dense but presentable manner. The core of that formatting is in this instruction tuning phase.

1576.309 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

formatting to the model where it knows to take a question that is like, explain the history of the Roman Empire to me, or a sort of question you'll see on Reddit or Stack Overflow, and then the model will respond in a information-dense but presentable manner. The core of that formatting is in this instruction tuning phase.

1576.309 View full episode →

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And then there's two other categories of loss functions that are being used today. One I will classify as preference fine tuning. Preference fine tuning is a generalized term for what came out of reinforcement learning from human feedback, which is RLHF. This reinforcement learning from human feedback is credited as the technique that helped

1596.508 View full episode →

Appearances Over Time

Podcast Appearances

Login Required