Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Dylan Patel
So there are extremely long sequences of the letter M. And then the comments are like, "beep, beep," because it's when the microwave ends. But if you pass this into a model that's trained to produce normal text, it's extremely high loss, because normally when you see an M, you don't predict M's for a long time after. So this is something that causes a lot of loss spikes for us.
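[Editor's note: a minimal sketch of the effect described above, not anything from the episode. A toy unigram character model "trained" on ordinary English assigns very low probability to a long run of one rare letter, so the average loss on such a sequence spikes far above the loss on normal text. All names and the corpus here are illustrative assumptions.]

```python
import math
from collections import Counter

def train_unigram(corpus: str) -> dict:
    """Character frequencies with add-one smoothing over the seen alphabet."""
    counts = Counter(corpus)
    total = sum(counts.values()) + len(counts)
    return {ch: (n + 1) / total for ch, n in counts.items()}

def avg_loss(model: dict, text: str, floor: float = 1e-6) -> float:
    """Average negative log-likelihood in nats per character.

    Unseen characters get a tiny floor probability, so they incur a large
    per-character loss, mimicking a model surprised by out-of-distribution text.
    """
    return -sum(math.log(model.get(ch, floor)) for ch in text) / len(text)

# Toy "training data": ordinary English text (no capital M anywhere).
corpus = "the quick brown fox jumps over the lazy dog " * 200
model = train_unigram(corpus)

normal = "the brown fox jumps over the dog"
spiky = "M" * 32  # e.g. a comment that is just a long run of one letter

print(f"normal text loss: {avg_loss(model, normal):.2f}")
print(f"repeated-M loss:  {avg_loss(model, spiky):.2f}")
```

The gap is the point: a real LLM's loss behaves the same way in kind, so a web-scraped document full of one repeated character can produce a sudden spike in the training loss curve.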