Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Nathan Lambert
When people are training, they have all these various dashboards, but the simplest one is your loss, right? And it continues to go down. But in reality, especially with more complicated stuff like MoE, or FP8 training, which is another innovation, going to a lower-precision number format, i.e. less accurate, the biggest problem is that you end up with loss spikes.
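To make the idea of a loss spike concrete, here is a minimal sketch of how one might flag spikes in a logged loss curve. It is not from the episode: the function name `find_loss_spikes`, the `window` and `threshold` parameters, and the sample loss values are all assumptions chosen for illustration. Real training dashboards may use quite different heuristics.

```python
from statistics import median


def find_loss_spikes(losses, window=5, threshold=1.5):
    """Return indices where the loss jumps well above its recent baseline.

    Hypothetical heuristic: a step counts as a spike when its loss exceeds
    `threshold` times the median of the previous `window` losses.
    """
    spikes = []
    for i in range(window, len(losses)):
        baseline = median(losses[i - window:i])
        if losses[i] > threshold * baseline:
            spikes.append(i)
    return spikes


# Made-up loss curve: steadily decreasing, with one sudden jump at step 6.
losses = [2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 4.0, 1.45, 1.4, 1.35]
print(find_loss_spikes(losses))  # prints [6]
```

Using a median rather than a mean for the baseline keeps a single spike from inflating the baseline for the steps that follow it.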