Lex Fridman Podcast
#446 – Ed Barnhart: Maya, Aztec, Inca, and Lost Civilizations of South America
Arvid Lundmark
minimizing the KL divergence with the distribution of gamma 27B, right? So knowledge distillation there. And you're spending the compute of literally training this 27 billion model, billion parameter model on all these tokens just to get out this, I don't know, smaller model.
0
💬
0
Comments
Log in to comment.
There are no comments yet.