Menu
Sign In Add Podcast

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

7261.468 - 7281.73 Arvid Lundmark

Yeah. So one thing to do would be, I think you probably need to train a process reward model, which is, so maybe we can get into reward models and outcome reward models versus process reward models. Outcome reward models are the kind of traditional reward models that people are trained for language modeling. And it's just looking at the final thing.

0
💬 0

Comments

There are no comments yet.

Log in to comment.