Dylan Patel

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And what to know about these new DeepSeek models is that they do this large-scale internet pre-training once to get what is called DeepSeek V3 Base. This is a base model. It's just going to finish your sentences for you. It's going to be harder to work with than ChatGPT.

1343.597

Lex Fridman Podcast

And then what DeepSeek did is they've done two different post-training regimes to make the models have specific desirable behaviors. So what is the more normal model in terms of the last few years of AI (an instruct model, a chat model, a quote-unquote aligned model, a helpful model; there are many ways to describe this) is more standard post-training.

1360.309

So this is things like instruction tuning and reinforcement learning from human feedback. We'll get into some of these words. And this is what they did to create the DeepSeek V3 model. This was the first model to be released, and it is very high-performance. It's competitive with GPT-4, Llama 405B, and so on.

1382.541

And then when this release was happening (we don't know their exact timeline) or soon after, they were finishing a different training process from the same next-token-prediction base model that I talked about, which is when this new reasoning training that people have heard about comes in, in order to create the model that is called DeepSeek R1.

1402.333

The R, throughout this conversation, is a good grounding for reasoning, and the name is also similar to OpenAI's o1, which is the other reasoning model that people have heard about. And we'll have to break down the training for R1 in more detail because, for one, we have a paper detailing it, but also it is a far newer set of techniques for the AI community.

1423.007

So it's a much more rapidly evolving area of research.

1442.482

Yeah, so pre-training (I'm using some of the same words to really get the message across) is you're doing what is called autoregressive prediction to predict the next token in a series of documents. This is done, by standard practice, over trillions of tokens. So this is a ton of data that is mostly scraped from the web.

1462.185
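
To make the idea of autoregressive next-token prediction a little more concrete, here is a minimal Python sketch. It is a toy under loud assumptions: it uses a whitespace "tokenizer" and simple bigram counts instead of a neural network, and the names `train_bigram`, `complete`, and the two-sentence corpus are hypothetical, made up for illustration. It is not how DeepSeek or any production model is trained; it only shows the shape of the loop, where the model sees documents, learns which token tends to come next, and then finishes a prompt one token at a time, the way a base model finishes your sentences.

```python
from collections import Counter, defaultdict


def train_bigram(documents):
    """Toy 'pre-training': count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for doc in documents:
        tokens = doc.split()  # whitespace tokenizer: an assumption for illustration
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def complete(counts, prompt, max_new_tokens=5):
    """Autoregressive generation: repeatedly append the most frequent next token.

    Assumes a non-empty prompt. Each new token is predicted from the text so far
    (here only the last token, since this is a bigram model).
    """
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # the last token never appeared with a successor in training
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


# Hypothetical two-document "corpus"; real pre-training uses trillions of tokens.
corpus = [
    "the model predicts the next token",
    "the next token is chosen from the vocabulary",
]
model = train_bigram(corpus)
print(complete(model, "the model", max_new_tokens=3))
```

Like a base model, this toy has no notion of answering a question or following an instruction. It only continues the text it is given, which is why the later post-training stages are needed to get chat-style behavior.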
