Lex Fridman Podcast
#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Dylan Patel
So there are extremely long sequences of the letter M. And then the comments are like, "beep, beep," because it's when the microwave ends. But if you pass this into a model that's trained to produce normal text, it's extremely high loss, because normally when you see an M, you don't predict M's for a long time after. So this is something that causes a lot of loss spikes for us.
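[Editor's note: a minimal sketch of the effect described above, not anything from the episode. A toy unigram character model "trained" on ordinary English assigns very low probability to a long run of one rare letter, so the average loss on such a sequence spikes far above the loss on normal text. All names and the corpus here are illustrative assumptions.]

```python
import math
from collections import Counter

def train_unigram(corpus: str) -> dict:
    """Character frequencies with add-one smoothing over the seen alphabet."""
    counts = Counter(corpus)
    total = sum(counts.values()) + len(counts)
    return {ch: (n + 1) / total for ch, n in counts.items()}

def avg_loss(model: dict, text: str, floor: float = 1e-6) -> float:
    """Average negative log-likelihood in nats per character.

    Unseen characters get a tiny floor probability, so they incur a large
    per-character loss, mimicking a model surprised by out-of-distribution text.
    """
    return -sum(math.log(model.get(ch, floor)) for ch in text) / len(text)

# Toy "training data": ordinary English text (no capital M anywhere).
corpus = "the quick brown fox jumps over the lazy dog " * 200
model = train_unigram(corpus)

normal = "the brown fox jumps over the dog"
spiky = "M" * 32  # e.g. a comment that is just a long run of one letter

print(f"normal text loss: {avg_loss(model, normal):.2f}")
print(f"repeated-M loss:  {avg_loss(model, spiky):.2f}")
```

The gap is the point: a real LLM's loss behaves the same way in kind, so a web-scraped document full of one repeated character can produce a sudden spike in the training loss curve.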