Accidental Tech Podcast
624: Do Less Math in Computers
Casey Liss
MoE splits the model into multiple quote-unquote experts and only activates the ones that are necessary. GPT-4 was an MoE model that was believed to have 16 experts with approximately 110 billion parameters each. DeepSeek MLA, multi-head latent attention is the MLA there, was an even bigger breakthrough. One of the biggest limitations on inference is the sheer amount of memory required.
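As an illustration of the routing idea Casey describes, here is a minimal sketch of a mixture-of-experts layer in PyTorch: a small gating network scores every expert, keeps only the top-k for each token, and runs just those, so only a fraction of the total parameters are active on any given forward pass. The layer sizes, expert count, and top-k value below are illustrative assumptions, not GPT-4's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: route each token to its top-k experts.
    Sizes are made up for illustration, not taken from any real model."""
    def __init__(self, d_model=64, d_hidden=256, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # gating network scores every expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize the kept scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([8, 64]); only 2 of 16 experts ran per token
```

A real MoE implementation batches the routing instead of looping over experts, but the memory point from the transcript still holds: all 16 experts' weights have to stay resident even though only a couple of them run for any given token.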