Menu
Sign In Pricing Add Podcast

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

17394.615 - 17418.447 Dylan Patel

I know I hear from startups and businesses, they're like, okay, I can take this post-training and try to apply it to my domain. We talk about verifiers a lot. We use this idea, which is reinforcement learning with verifiable domain rewards, RLVR, kind of similar to RLHF. And we applied it to math and the model today, which is like we applied it to the Lama 405B base model from last year.

0
💬 0

Comments

There are no comments yet.

Log in to comment.