Lex Fridman Podcast
#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America
Arvid Lundmark
So if you're doing some math problem, let's look at that final thing you've done, everything, and let's assign a grade to it, how likely we think, like what's the reward for this outcome. Process reward models instead try to grade the chain of thought.
0
💬
0
Comments
Log in to comment.
There are no comments yet.