Menu
Sign In Add Podcast

Lex Fridman Podcast

#447 – Cursor Team: Future of Programming with AI

8013.983 - 8040.502 Aman Sanger

And this is in the case of CursorTab, picking between two possible generations of what is the better one. And then it just needs a little bit of human nudging with only on the order of 50, 100 examples to kind of align that prior the model has with exactly what you want. It looks different than I think normal RLHF where you're usually training these reward models on tons of examples.

0
💬 0

Comments

There are no comments yet.

Log in to comment.