Menu
Sign In Add Podcast

Lex Fridman Podcast

#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America

7295.697 - 7322.752 Arvid Lundmark

And so OpenAI had some preliminary paper on this, I think, last summer, where they use human labelers to get this pretty large, several hundred thousand data set of grading chains of thought. Ultimately, it feels like I haven't seen anything interesting in the ways that people use process reward models outside of just using it as a means of affecting how we choose between a bunch of samples.

0
💬 0

Comments

There are no comments yet.

Log in to comment.