Zico Kolter
Podcast Appearances
This is actually a very nuanced question. Do we have AI products that workforces can use to their full potential? The answer to this right now is no. Clearly, there is a gap between what people could use these things for and what they're using them for right now.
I mean, I find this kind of interesting in a way because enterprises are all very happy to put their data in the cloud. They all use cloud services to store their data. But then, oh, train on this there? No, no, no. Can't do that. I think a lot of it comes, honestly, from kind of a misunderstanding about how this process works.
Also, frankly speaking, I think it has to do with the fact that the model of just taking all your internal data and dumping it into a large language model is not tenable. You can't do this for a number of reasons, the most obvious being that the data has access rights, right? Not everyone gets access to all the data.
And the default mode of language models is that if you train on some data, you can probably get it back out of the system if you try hard enough. And so this doesn't work with the sort of access controls people have on traditional data. I think these are the real concerns. Now, to be clear, there are very easy ways around this, right?
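To make that extraction point concrete, here is a toy sketch, not a real LLM: a character-level 4-gram lookup table "trains" on a single hypothetical secret string, and greedy generation from a short prefix walks straight back through the memorized text. The secret string and all names are invented for illustration, and the trick relies on every 4-character context in the string being unique.

```python
# Toy illustration of memorization/extraction (not a real language model).
K = 4
secret = "the Q3 acquisition target is Acme Corp"  # hypothetical sensitive record

# "Training": memorize which character follows each K-character context.
# (Assumes each context appears once; repeated contexts would overwrite.)
model = {secret[i:i + K]: secret[i + K] for i in range(len(secret) - K)}

def generate(prefix: str) -> str:
    out = prefix
    while out[-K:] in model:        # stop once the context is unseen
        out += model[out[-K:]]      # greedy: emit the memorized next character
    return out

print(generate("the "))  # -> reproduces the full secret string
```

Real models memorize far less literally than this cartoon, but the shape of the risk is the same: once data is baked into the weights, there is no per-user gate on what comes back out.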
So this is probably why RAG-based systems are so common here, and even with the advent of readily available fine-tuning, they're going to remain a useful paradigm. RAG, for those that maybe haven't heard the term, is retrieval-augmented generation. It basically means that you go out and fetch the data that you have access rights to and that is relevant to your question, you inject it all into the context of the model, and then you answer the question based upon that data. These RAG-based techniques are going to remain popular precisely because they respect normal data-access procedures.
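A minimal sketch of that pattern follows. The document shape, the keyword scoring, and the `call_llm` stub are all illustrative assumptions, not any particular product's API; a real system would use embeddings for relevance and a real chat/completions endpoint for the final step.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    # Stand-in for a real model API call.
    return f"[model response to {len(prompt)} chars of prompt]"

@dataclass
class Doc:
    text: str
    allowed_groups: set[str]  # access-control list attached to the document

def retrieve(question: str, user_groups: set[str], corpus: list[Doc], k: int = 3) -> list[Doc]:
    # 1. Enforce access rights first: the user only ever retrieves documents
    #    they could already read. This is why RAG composes with existing
    #    access controls, unlike baking data into model weights.
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    # 2. Rank by relevance; naive keyword overlap keeps the sketch self-contained.
    words = set(question.lower().split())
    visible.sort(key=lambda d: len(words & set(d.text.lower().split())), reverse=True)
    return visible[:k]

def answer(question: str, user_groups: set[str], corpus: list[Doc]) -> str:
    docs = retrieve(question, user_groups, corpus)
    # 3. Inject the retrieved text into the model's context and answer from it.
    context = "\n\n".join(d.text for d in docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

# Example: a finance user sees finance docs; other users would not.
corpus = [
    Doc("Q3 revenue grew 12% on cloud sales.", {"finance"}),
    Doc("The holiday party is on December 12.", {"all-staff"}),
]
print(answer("How did Q3 revenue look?", {"finance", "all-staff"}, corpus))
```

The key design point is that the access check happens at query time, per user, on the documents themselves, so nothing sensitive ever needs to enter the model's weights.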
I sort of feel like a lot of this hesitancy actually comes from a fundamental misunderstanding of how these models are working. People think that if you have ChatGPT answer a question about any of your data, that data is somehow being trained upon and merged into the model, whether it's an API call, a RAG-based call, or anything else. And it's just not true.