
Dylan Patel

👤 Person
1122 total appearances

Appearances Over Time

Podcast Appearances

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And then there's two other categories of loss functions that are being used today. One I will classify as preference fine tuning. Preference fine tuning is a generalized term for what came out of reinforcement learning from human feedback, which is RLHF. This reinforcement learning from human feedback is credited as the technique that helped

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

ChatGPT breakthrough is a technique to make the responses that are nicely formatted, like these Reddit answers, more in tune with what a human would like to read. This is done by collecting pairwise preferences from actual humans out in the world to start. And now AIs are also labeling this data and we'll get into those trade-offs.
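The pairwise preference collection described in this excerpt can be sketched roughly as follows. This is an illustrative data shape only, not any lab's actual schema; the class name `PreferencePair`, the field names, and the example texts are all invented for the sketch.

```python
# Sketch of pairwise preference data as collected for RLHF-style training.
# A human (or, increasingly, an AI labeler) sees two candidate responses to
# the same prompt and marks which one they prefer.
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # the response the labeler preferred
    rejected: str  # the response the labeler rejected


dataset = [
    PreferencePair(
        prompt="Explain photosynthesis in one sentence.",
        chosen="Plants convert sunlight, water, and CO2 into sugar and oxygen.",
        rejected="photosynthesis good for plant it make food",
    ),
]


def to_training_example(pair: PreferencePair) -> dict:
    # Downstream, a reward model or direct-alignment method consumes
    # (prompt, chosen, rejected) triples like this one.
    return {"prompt": pair.prompt, "chosen": pair.chosen, "rejected": pair.rejected}
```

The key point is that the label is relative (A preferred over B), not an absolute score, which is what makes the later contrastive training possible.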

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And you have this kind of contrastive loss function between a good answer and a bad answer. And the model learns to pick up these trends. There's different implementation ways. You have things called reward models. You could have direct alignment algorithms. There's a lot of really specific things you can do, but all of this is about fine tuning to human preferences.
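The contrastive loss between a good and a bad answer that this excerpt describes is commonly implemented as a Bradley-Terry style objective on reward-model scores. A minimal sketch, assuming scalar rewards have already been computed for the chosen and rejected answers:

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    # Bradley-Terry style preference loss: -log sigmoid(r_chosen - r_rejected).
    # The loss shrinks as the reward model scores the preferred answer above
    # the rejected one, which is the "contrastive" effect between a good
    # answer and a bad answer.
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the two rewards are equal the loss is log 2; as the margin in favor of the chosen answer grows, the loss falls toward zero. Direct alignment algorithms such as DPO use the same pairwise idea but define the margin from the policy's own log-probabilities instead of a separate reward model.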

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

And the final stage is much newer and links to what is done in R1. And "reasoning models" is, I think, OpenAI's name for this. They had this new API in the fall, which they called the Reinforcement Fine-Tuning API. This is the idea that you use the techniques of reinforcement learning, which is a whole framework of AI. There's a deep literature here.

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

To summarize, it's often known as trial-and-error learning, or the subfield of AI where you're trying to make sequential decisions in a certain, potentially noisy environment. There's a lot of ways we could go down that, but here it's fine-tuning language models where they can generate an answer and then you check to see if the answer matches the true solution.
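The generate-then-check loop described above can be sketched in a few lines. This is a conceptual sketch of the sampling-and-verification step only (the actual RL update that reinforces correct samples is omitted), and the function names are invented for illustration:

```python
def verify(answer: str, solution: str) -> float:
    # Binary reward: 1.0 if the generated answer matches the known solution.
    return 1.0 if answer.strip() == solution.strip() else 0.0


def rollout(generate, question: str, solution: str, attempts: int = 4):
    # Trial and error: sample several answers to the same question and score
    # each one. During training, the correct samples would be reinforced.
    samples = [generate(question) for _ in range(attempts)]
    return [(sample, verify(sample, solution)) for sample in samples]
```

Because the reward comes from checking against a known solution rather than from human judgment, this loop only works where such a check exists, which is the "verifiable domains" restriction discussed next.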

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

For math or code: you have an exactly correct answer for math, and you can have unit tests for code. And what we're doing is we are checking the language model's work, and we're giving it multiple opportunities on the same questions to see if it is right. And if you keep doing this, the models can learn to improve in verifiable domains to a great extent. It works really well.
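The two verifiers mentioned here, exact match for math and unit tests for code, can be sketched as reward functions. This is a simplified illustration: real systems normalize math answers more robustly and sandbox code execution, and the `solve` entry-point name is an assumption of the sketch, not a standard.

```python
def math_reward(answer: str, gold: str) -> float:
    # Math verifier: exact match against the known correct answer.
    return 1.0 if answer.strip() == gold.strip() else 0.0


def code_reward(source: str, tests: list) -> float:
    # Code verifier: execute the candidate program, then run unit tests
    # against its (hypothetical) `solve` entry point. Real systems would
    # sandbox this; plain exec() is only for illustration.
    namespace: dict = {}
    try:
        exec(source, namespace)
        fn = namespace["solve"]
        passed = sum(1 for args, expected in tests if fn(*args) == expected)
        return passed / len(tests)
    except Exception:
        return 0.0  # crashes, syntax errors, or a missing entry point score zero
```

Either function turns a model output into a scalar reward, which is exactly the signal the trial-and-error loop needs.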

Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

It's a newer technique in the academic literature. It's been used at frontier labs in the US that don't share every detail, for multiple years. So this is the idea of using reinforcement learning with language models, and it has been taking off, especially in this DeepSeek moment.
