Zico Kolter
Podcast Appearances
The way we normally think about fixing this is, for the models we release, we would say: don't use the abilities you have inside of you to create, say, cyber attacks against certain infrastructure and things like this, right? Don't do that. But we can't actually make the models follow that instruction, right?
Someone with access to the model itself, certainly, but sometimes even someone with access only to a closed-source model, can jailbreak these things and oftentimes get at those capabilities, right? To be very clear, we are making immense progress on this problem of preventing jailbreaks and making sure models follow a spec.
But until we can solve this problem, all the other dangerous capabilities that AI could demonstrate become much, much more concerning. So this is kind of a multiplier effect on everything else bad these models can do, which is why I'm so concerned about it right now.
So that's a good lead-in, right? Because if jailbreaks and manipulation of models are the attack vector, what is the payoff? What are the things we can do? And here, what we're really trying to do is assess the core harmful capabilities of models, right? And people have thought a lot about this, right?
People think about things like creating chemical weapons, creating biological weapons, creating cyber attacks. Personally, I think cyber attacks are a much more clear and present threat than, for example, bio threats and things like this. At the same time, I don't want to dismiss any of these concerns, right?
I think people have looked at this much, much more than I have and are very concerned about these things. So I want to treat this with the respect it honestly deserves, because these are massive problems. There are a lot of potential harms of AI models. Some are associated primarily with scale, like the misinformation you mentioned.
But some are capabilities we think these models might enable that would lower the bar so much for some bad things, like, say, creating a zero-day exploit that takes down software across half the world. The concern, maybe initially, is not that they can do this autonomously, but