
Lex Fridman Podcast

#459 – DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters

2535.846 - 2557.183 Nathan Lambert

When you're training a model, you're going to have all these all-reduces and all-gathers. Between each layer, between the multi-layer perceptron or feed-forward network and the attention mechanism, you'll basically have the model synchronized, or you'll have an all-reduce and an all-gather. And this is a communication between all the GPUs in the network, whether it's in training or inference.
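A minimal sketch (not from the episode) of the two collectives mentioned here, using PyTorch's torch.distributed: all-reduce sums a tensor across every rank so each one ends up with the same result, and all-gather collects each rank's shard so every rank holds the full set. The gloo backend, the two-process world size, and the `worker` function are illustrative choices so the example runs on CPU; real training would use the nccl backend across GPUs.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

WORLD_SIZE = 2  # number of simulated "GPUs" (processes) -- illustrative

def worker(rank: int):
    # Rendezvous settings for a single-machine demo.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=WORLD_SIZE)

    # All-reduce: e.g. summing gradients so every rank sees the same update.
    grad = torch.full((4,), float(rank + 1))
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    print(f"rank {rank} all-reduce result: {grad.tolist()}")

    # All-gather: e.g. reassembling activations that were sharded across ranks.
    shard = torch.full((4,), float(rank))
    gathered = [torch.zeros(4) for _ in range(WORLD_SIZE)]
    dist.all_gather(gathered, shard)
    print(f"rank {rank} all-gather result: {[t.tolist() for t in gathered]}")

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, nprocs=WORLD_SIZE)
```

After both collectives, every process holds identical data, which is the synchronization between layers that the speaker is describing.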
