Accidental Tech Podcast

624: Do Less Math in Computers

4771.592 - 4791.867 Casey Liss

Humans are in the loop to help guide the model, navigate difficult choices where rewards weren't obvious, etc. RLHF, or reinforcement learning from human feedback, was the key innovation in transforming GPT-3 into ChatGPT, with well-formed paragraphs, answers that were concise and didn't trail off into gibberish, etc. R1-Zero, however, drops the HF, the human feedback part.

