Lex Fridman Podcast
#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America
Arvid Lundmark
And in the case of Cursor Tab, this is picking which of two possible generations is the better one. It then needs just a little human nudging, on the order of 50 to 100 examples, to align the prior the model already has with exactly what you want. That looks different, I think, from normal RLHF, where you're usually training reward models on tons of examples.
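Editor's sketch: one way to picture "a prior nudged by a few dozen human preferences" is a pairwise (Bradley-Terry-style) model in which the model's own score gap between two generations is held fixed and only a small correction is fitted from about 60 labeled pairs. This is an illustration under stated assumptions, not the method the speaker describes: the feature gaps, the hidden preference weights, and the synthetic labels are all hypothetical stand-ins for real human choices.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def prob_first_wins(w, pair):
    # pair = (prior_gap, feat_gap): the model's prior score gap between
    # generation A and B, plus a gap in hand-picked features (hypothetical).
    prior_gap, feat_gap = pair
    return sigmoid(prior_gap + sum(wi * fi for wi, fi in zip(w, feat_gap)))


def mean_loss(w, pairs, labels):
    # Average logistic loss over the labeled preference pairs.
    eps = 1e-12
    total = 0.0
    for pair, y in zip(pairs, labels):
        p = prob_first_wins(w, pair)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(pairs)


def fit(pairs, labels, dim, lr=0.5, steps=200):
    # Full-batch gradient descent on the correction weights only;
    # the prior gap is treated as a fixed offset and never retrained.
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for pair, y in zip(pairs, labels):
            err = prob_first_wins(w, pair) - y
            for i, fi in enumerate(pair[1]):
                grad[i] += err * fi / len(pairs)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def make_synthetic(n, rng):
    # Synthetic "human" labels: the labeler agrees with the prior but also
    # cares about two features the prior ignores (assumed weights).
    hidden_w = [2.0, -1.0]
    pairs, labels = [], []
    for _ in range(n):
        prior_gap = rng.uniform(-1, 1)
        feat_gap = [rng.uniform(-2, 2), rng.uniform(-2, 2)]
        score = prior_gap + sum(h * f for h, f in zip(hidden_w, feat_gap))
        pairs.append((prior_gap, feat_gap))
        labels.append(1 if score > 0 else 0)
    return pairs, labels


rng = random.Random(0)
pairs, labels = make_synthetic(60, rng)  # "on the order of 50 to 100 examples"
loss_before = mean_loss([0.0, 0.0], pairs, labels)  # prior only
w = fit(pairs, labels, dim=2)
loss_after = mean_loss(w, pairs, labels)  # prior plus fitted nudge
```

Because only two correction weights are learned on top of a fixed prior, a few dozen pairs are enough to move the loss, which is the contrast with training a full reward model from scratch.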