Simon Willison
Podcast Appearances
I think it's a very productive way of getting these machines to solve any problem where you can have automated feedback and where the negative situation isn't it spending all of your money on flights to Brazil or whatever. That feels sensible to me.
That also ties into o1 and these new inference-scaling language models we're getting. The one that did well on the ARC-AGI test, o3, that was basically brute force, right? It tries loads and loads and loads of different potential strategies for solving a puzzle, figures out which one works, and it spends a million dollars on electricity to do it.
Okay, I've got one thing I do want to recommend for test-time compute. I've been calling it inference scaling; it's the same idea. There's an Alibaba model from Qwen, their Qwen research team, called QwQ, which you can run on your laptop. I've run it on my Mac, and it does the thing.
You give it a puzzle and it thinks out loud: it outputs sometimes dozens of paragraphs of text about how it's thinking before it gets to an answer. Watching it do that is incredibly entertaining. But the best thing about it is that occasionally it switches into Chinese. I've had my laptop think out loud in Chinese before it got to an answer.
So I asked it a question in English, it thought in Chinese for quite a while, and then it gave me an English answer. And that is just delightful.
Right. So what's not to love about seeing your laptop just do that on its own?
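As an aside (not from the podcast): if you want to try reproducing this on your own machine, here's a minimal sketch using the official Ollama Python client. The model name, prompt, and streaming setup are illustrative assumptions, not Simon's actual setup; it assumes Ollama is installed and the model has been pulled with `ollama pull qwq`.

```python
# Minimal sketch: run QwQ locally via Ollama and stream its output,
# so the long "thinking out loud" reasoning scrolls by as it generates.
# Requires: pip install ollama, plus `ollama pull qwq` beforehand.
import ollama

stream = ollama.chat(
    model="qwq",  # Qwen's QwQ reasoning model from the Ollama library
    messages=[{"role": "user", "content": "How many Rs are in 'strawberry'?"}],  # hypothetical puzzle prompt
    stream=True,
)
for chunk in stream:
    # Print each token as it arrives; expect many paragraphs of
    # visible reasoning before the final answer.
    print(chunk["message"]["content"], end="", flush=True)
```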
It is scoring higher than any of the other open weights models. It's also huge, like 685 billion parameters, so it's not easy to run; this needs data center hardware. But yeah, the benchmarks are all very impressive. It's beating the previous best, which I think was Meta's Llama 405B. This one's what, 685B or something? It's very good.
The thing that shocks me is the cost, because DeepSeek have a good reputation; they've released some good models in the past. The fact that they did it for $5.5 million, that's like an eleventh of the price of the closest Meta model that Meta have documented their spending on. It's just astonishing. Yeah.
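A rough back-of-envelope for that ratio, assuming the figures from the DeepSeek v3 technical report (about 2.788 million H800 GPU-hours, roughly $5.5M at an assumed $2 per GPU-hour) against Meta's reported ~30.84 million GPU-hours for Llama 3.1 405B:

$$\frac{30.84\text{M GPU-hours (Llama 3.1 405B)}}{2.788\text{M GPU-hours (DeepSeek v3)}} \approx 11$$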
I mean, one thing I do want to highlight is that last year was the year of inference compute efficiency. At the beginning of the year, the OpenAI models were already literally about 100 times less expensive to run a prompt through than they had been two and a half years earlier.
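To make that concrete with an assumed comparison (not spelled out in the podcast): GPT-3 davinci launched at $0.02 per 1K tokens, i.e. $20 per million, while gpt-4o mini was priced at $0.15 per million input tokens in late 2024:

$$\frac{\$20\ \text{per M tokens}}{\$0.15\ \text{per M tokens}} \approx 133\times$$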