Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI
Aman Sanger
Yeah, so RLHF is when the reward model you use is trained from some labels you've collected from humans giving feedback. I think this works if you have the ability to get a ton of human feedback for this kind of task that you care about.
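[Editor's note: the transcript doesn't include code, but here is a minimal sketch of the idea Aman describes, i.e. training a reward model from human preference labels. It is not the Cursor team's implementation; the embedding inputs, network sizes, and data are placeholders, and the pairwise (Bradley-Terry-style) loss is one common way such models are trained in practice.]

```python
import torch
import torch.nn as nn

# Hypothetical reward model: maps a fixed-size embedding of a
# (prompt, response) pair to a scalar reward. In a real RLHF setup
# this scoring head typically sits on top of a pretrained LLM.
class RewardModel(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        # x: (batch, embed_dim) -> one scalar reward per example
        return self.scorer(x).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # Pairwise logistic loss: push the reward of the human-preferred
    # response above the reward of the rejected one.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy training loop on random embeddings standing in for labeled pairs
# collected from human feedback.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    chosen = torch.randn(32, 128)    # embeddings of preferred responses
    rejected = torch.randn(32, 128)  # embeddings of rejected responses
    loss = preference_loss(model(chosen), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The trained reward model can then stand in for human judgment when fine-tuning the policy model, which is why, as Aman notes, the approach depends on collecting a large amount of human feedback for the task in question.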