Menu
Sign In Pricing Add Podcast

Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

2764.795 - 2783.762 Dylan Patel

innovations in DeepSeq's architecture is that they changed the routing mechanism in mixture of expert models. There's something called an auxiliary loss, which effectively means during training, you want to make sure that all of these experts are used across the tasks that the model sees. Why there can be failures in mixture of experts is that

0
💬 0

Comments

There are no comments yet.

Log in to comment.