
Accidental Tech Podcast

624: Do Less Math in Computers

4301.815 - 4324.126 Casey Liss

MoE splits the model into multiple "experts" and only activates the ones that are necessary. GPT-4 was an MoE model that was believed to have 16 experts with approximately 110 billion parameters each. DeepSeek's MLA, multi-head latent attention is the MLA there, was an even bigger breakthrough. One of the biggest limitations on inference is the sheer amount of memory required.
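To make the MoE idea concrete, here is a minimal sketch of top-k expert routing: a small router scores every expert per token, and only the top-scoring experts actually run. All names, layer sizes, and the expert count are illustrative assumptions, not the actual GPT-4 or DeepSeek architecture.

```python
# Minimal mixture-of-experts (MoE) routing sketch.
# Sizes and names are hypothetical, chosen only to illustrate the idea.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, num_experts=16, top_k=2):
        super().__init__()
        # Each "expert" is just a small feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden),
                          nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                            # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the chosen experts run for each token; the rest stay idle,
        # which is where the compute savings come from.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([8, 64])
```

With top_k=2 out of 16 experts, only a fraction of the feed-forward parameters are exercised per token, even though the full parameter count is much larger.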
