Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI
Aman Sanger
Yeah, so RLHF is when the reward model you use is trained from some labels you've collected from humans giving feedback. I think this works if you have the ability to get a ton of human feedback for this kind of task that you care about.
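[Editor's note: the transcript doesn't include code, but here is a minimal sketch of the idea Aman describes, i.e. training a reward model from human preference labels. It is not the Cursor team's implementation; the embedding inputs, network sizes, and data are placeholders, and the pairwise (Bradley-Terry-style) loss is one common way such models are trained in practice.]

```python
import torch
import torch.nn as nn

# Hypothetical reward model: maps a fixed-size embedding of a
# (prompt, response) pair to a scalar reward. In a real RLHF setup
# this scoring head typically sits on top of a pretrained LLM.
class RewardModel(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        # x: (batch, embed_dim) -> one scalar reward per example
        return self.scorer(x).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # Pairwise logistic loss: push the reward of the human-preferred
    # response above the reward of the rejected one.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy training loop on random embeddings standing in for labeled pairs
# collected from human feedback.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    chosen = torch.randn(32, 128)    # embeddings of preferred responses
    rejected = torch.randn(32, 128)  # embeddings of rejected responses
    loss = preference_loss(model(chosen), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The trained reward model can then stand in for human judgment when fine-tuning the policy model, which is why, as Aman notes, the approach depends on collecting a large amount of human feedback for the task in question.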