
Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

7946.505 - 7966.463 Aman Sanger

Yeah, so RLHF is when the reward model you use is trained from some labels you've collected from humans giving feedback. I think this works if you have the ability to get a ton of human feedback for this kind of task that you care about.
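To make the idea concrete, here is a minimal sketch (not from the episode, and not Cursor's code) of the reward-model step being described: human labelers compare two responses to the same prompt, and the reward model is trained with a pairwise loss so the preferred response receives the higher score. All names and shapes below are illustrative assumptions.

```python
# Hypothetical sketch of reward-model training from human preference labels (RLHF step 1).
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: embeds a token sequence and maps it to a scalar score."""
    def __init__(self, vocab_size: int = 1000, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Mean-pool token embeddings, then project to a scalar reward.
        return self.score(self.embed(tokens).mean(dim=1)).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry-style objective: push the chosen response's reward above the rejected one's.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Fake batch of human-labeled comparisons: (chosen response tokens, rejected response tokens).
chosen = torch.randint(0, 1000, (8, 16))
rejected = torch.randint(0, 1000, (8, 16))

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

loss = preference_loss(model(chosen), model(rejected))
loss.backward()
opt.step()
print(f"pairwise preference loss: {loss.item():.4f}")
```

The trained reward model would then score candidate outputs during reinforcement learning; the point in the quote is that this only pays off when you can collect a large volume of such human comparisons for the task you care about.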
