Simon Willison
Podcast Appearances
Like, all of the providers are in this race to the bottom in terms of how much they charge per token, but it's a race based on efficiency. I checked, and Google Gemini and Amazon Nova are two of the cheapest hosted models, and they're not doing it at a loss. They are at least charging you more than it costs them in electricity to run your prompt.
And that's very meaningful. Likewise, the models that run on my laptop: two years ago, I was running the first Llama model, and it was not quite as good as GPT-3.5. It just about worked. Same hardware today, I've not upgraded the memory or anything, and it's now running a GPT-4 class model.
There was so much low-hanging fruit for optimizing these things, and I think there's probably still quite a lot left. But it's pretty extraordinary. Oh, here's my favorite number for this: Gemini 1.5 Flash-8B, which is Google's cheapest Gemini model. It's still a vision and audio model: you can pipe audio and images into it and get responses.
If I were to run it against the 68,000 photographs in my personal photo collection to generate captions, it would cost me less than $2 for all 68,000 photos, which is completely nonsensical.
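As a rough back-of-the-envelope check on that number, the sketch below just divides the quoted total (under $2) by the photo count (68,000) to get the implied upper bound on average cost per caption. It is only arithmetic on the figures in this transcript: it does not model actual Gemini pricing, tokens per image, or caption length, none of which are given here, and the function name `max_cost_per_item` is hypothetical.

```python
# Figures quoted in the transcript: 68,000 photos captioned for under $2 total.
TOTAL_BUDGET_USD = 2.00
PHOTO_COUNT = 68_000


def max_cost_per_item(total_budget: float, item_count: int) -> float:
    """Hypothetical helper: upper bound on average cost per item for a total budget."""
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    return total_budget / item_count


per_photo = max_cost_per_item(TOTAL_BUDGET_USD, PHOTO_COUNT)
# How many photos one US cent covers at that bound.
photos_per_cent = PHOTO_COUNT / (TOTAL_BUDGET_USD * 100)

print(f"At most ${per_photo:.7f} per photo")              # At most $0.0000294 per photo
print(f"About {per_photo * 100:.4f} cents per photo")     # About 0.0029 cents per photo
print(f"Roughly {photos_per_cent:.0f} photos per cent")   # Roughly 340 photos per cent
```

In other words, each caption costs well under a hundredth of a cent on average, which is why the figure sounds so absurd.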
Like you, I'm not nearly brave enough to short NVIDIA, but at the same time, I don't understand how being able to do matrix multiplication at scale is a moat. I just don't. You're hardware people; I'm not, so maybe I'm missing something. But it feels like all of this stuff comes down to who can multiply matrices the fastest. Are NVIDIA really so far ahead of everybody else?
Cerebras and Groq have been doing incredible things recently. Apple Silicon can run matrix multiplications incredibly quickly. Where is NVIDIA's moat here, other than CUDA being really difficult to get away from?
So I've got a self-serving three-year prediction: I think somebody is going to produce a piece of Pulitzer Prize-winning investigative journalism using AI and LLMs as part of the tooling for that report. I wanted to raise this one partly because my self-assigned day job is building software to help journalists do this kind of work.
But more importantly, I think it illustrates the larger idea that AI assistance in that kind of information work will almost be expected, in this case work that combines research with journalism.
Hearing that an LLM was part of the mix for a great piece of work like that isn't even going to be surprising anymore.
More specifically, the point is that this is actually possible today. Investigative journalism, like any kind of deep research, often involves going through tens of thousands of sources of information and trying to make sense of them. That's a lot of work, right? That's a lot of trudging through documents.