
Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

11074.347 - 11089.186 Dylan Patel

That's what the whole field of RL is about: learning from sparse rewards. And the same thing has played out in math. With very weak models that only sometimes generate correct answers, you already see research showing that you can boost their math scores. You can do this sort of RL training
