Lex Fridman Podcast
#447 – Cursor Team: Future of Programming with AI
Aman Sanger
So like what people do in all these papers is they sample a bunch of outputs from the language model and then use the process reward models to grade all those generations alongside maybe some other heuristics and then use that to choose the best answer. The really interesting thing that people think might work and people want to work is tree search with these process reward models.
0
💬
0
Comments
Log in to comment.
There are no comments yet.