Coder Radio
589: Blame the Tools using the Tools
Mark Zuckerberg
So that leads to this dynamic where, for Llama 3, we could train on 10,000 to 20,000 GPUs. For Llama 4, we could train on more than 100,000 GPUs. For Llama 5, we plan to scale even further. And there's just an interesting question of how far that goes. It's totally possible.