Lex Fridman Podcast
#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America
Arvid Lundmark
Yeah. So one thing to do would be, I think you probably need to train a process reward model, which is, so maybe we can get into reward models and outcome reward models versus process reward models. Outcome reward models are the kind of traditional reward models that people are trained for language modeling. And it's just looking at the final thing.
0
💬
0
Comments
Log in to comment.
There are no comments yet.