Accidental Tech Podcast
624: Do Less Math in Computers
Casey Liss
Humans are in the loop to help guide the model, navigate difficult choices where rewards weren't obvious, etc. RLHF, or reinforcement learning from human feedback, was the key innovation in transforming GPT-3 into ChatGPT, with well-formed paragraphs, answers that were concise and didn't trail off into gibberish, etc. R1-Zero, however, drops the HF, the human feedback part.