Dylan Patel
Podcast Appearances
I think we should make sure we want to go down the license rabbit hole before we get into specifics.
Yeah, especially in the DeepSeek V3 paper, which is their pre-training paper. They were very clear that they are doing interventions on the technical stack at many different levels. For example, to get highly efficient training, they're making modifications at or below the CUDA layer for NVIDIA chips.
I have never worked there myself, and there are only a few people in the world who do that very well. Some of them are at DeepSeek. These types of people are at DeepSeek and at leading American frontier labs, but there are not many places.
Yeah, so these weights that you can download from Hugging Face or other platforms are very big matrices of numbers. You can download them to a computer in your own house that has no internet, run the model, and be totally in control of your data.
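As a minimal sketch of what this looks like in practice, assuming the Hugging Face transformers library and weights you have already downloaded to disk (the model path here is a placeholder, not a real checkpoint name):

```python
# Minimal sketch: running downloaded open weights fully offline.
# Assumes the `transformers` library is installed and a model has already
# been saved to local disk; "./my-local-model" is a placeholder path.
from transformers import AutoModelForCausalLM, AutoTokenizer

# local_files_only=True ensures no network request is made: the weights
# (those big matrices of numbers) are read straight from your own disk.
tokenizer = AutoTokenizer.from_pretrained("./my-local-model", local_files_only=True)
model = AutoModelForCausalLM.from_pretrained("./my-local-model", local_files_only=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `local_files_only=True`, everything happens on your own machine; no prompt or output ever leaves it.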
That is different from how a lot of language model usage is actually done today, which is mostly through APIs, where you send your prompt to GPUs run by certain companies. And these companies will have different policies on how your data is stored, whether it is used to train future models, where it is stored, whether it is encrypted, and so on.
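A minimal sketch of that API pattern, where the prompt leaves your machine; the endpoint URL and JSON schema below are illustrative placeholders, not any specific vendor's API:

```python
# Minimal sketch of hosted API usage: your prompt is processed on the
# provider's GPUs. The URL and request fields are hypothetical placeholders.
import requests

resp = requests.post(
    "https://api.example.com/v1/completions",  # hypothetical provider endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"model": "some-hosted-model", "prompt": "Hello", "max_tokens": 50},
    timeout=30,
)
# What happens to the prompt after this point -- retention, training use,
# encryption at rest -- is governed by the provider's policies, not by you.
print(resp.json())
```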
So with open weights, you have the fate of your data in your own hands. And that is something that is deeply connected to the soul of open-source computing.
Yes. So for one, I am very understanding of many people being confused by these two model names. I would say the best way to think about this is that when training a language model, you have what is called pre-training, which is when you train on large amounts of mostly internet text. You're trying to predict the next token.
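A minimal sketch of that next-token prediction objective in PyTorch; the tiny embedding-plus-linear "model" and random token IDs are stand-ins for a real transformer and a web-scale text corpus:

```python
# Minimal sketch of the pre-training objective: predict the next token.
# The tiny model and random token IDs stand in for a real transformer
# and large amounts of internet text.
import torch
import torch.nn as nn

vocab_size, hidden = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, hidden), nn.Linear(hidden, vocab_size))

tokens = torch.randint(0, vocab_size, (1, 32))  # a batch of token IDs
logits = model(tokens[:, :-1])                  # predict from all but the last token
targets = tokens[:, 1:]                         # each position's label is the NEXT token

# Cross-entropy over the vocabulary at every position: the core pre-training loss.
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
print(loss.item())
```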