Lex Fridman Podcast
#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America
Arvid Lundmark
Yeah, so RLHF is when the reward model you use is trained from some labels you've collected from humans giving feedback. I think this works if you have the ability to get a ton of human feedback for this kind of task that you care about.
0
💬
0
Comments
Log in to comment.
There are no comments yet.