Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Dylan Patel
That's what the whole field of RL is about: learning from sparse rewards. And the same thing has played out in math, where very weak models sometimes generate correct answers, and you already see research showing that you can boost their math scores. You can do this sort of RL training