Simon Willison

Oxide and Friends

And a friend of mine, the first thing they tried was they made a webpage that just said, download and run this executable. And that was all it took. It was malware, and Claude saw the web page, downloaded the executable, installed it, ran the malware, and added itself to a botnet. Just instantly.

Oxide and Friends

Predictions 2025

1021.237

Basically, basically. And it's like, I mean, come on, right? That's the single most obvious version of this, and it was the first thing this chap tried, and it just worked, you know? So...

Oxide and Friends

Predictions 2025

1031.986

Yeah, and every time I talk to people at AI labs about this, and I got to ask this question of some Anthropic people quite recently, they always talk about how, oh no, we're training it and we're going to get better through training and all of that. And that's just such a cop-out answer. That doesn't work when you're dealing with actual malicious hackers.

Oxide and Friends

Predictions 2025

1061.46

Exactly. So, you know, I feel like there is one aspect of agents that I do believe in for the most part. And that's the research assistant thing. You know, these ones where you say, for hours and hours and hours, find everything you can try and piece things together. I've got access to one. There are a few of those already.

Oxide and Friends

Predictions 2025

1076.217

Google Gemini have something called Deep Research that I've been playing with. That's pretty good, you know?

Oxide and Friends

Predictions 2025

1092.655

Okay, yeah, interesting. There's some kind of beta that I'm in. I can share one example of something it did for me. So I live in Half Moon Bay. We have lots of pelicans. I love pelicans. I use them in all of my examples and things. And I was curious: where are the most California brown pelicans in the world?

Oxide and Friends

Predictions 2025

1109.546

And I ran it through Google Deep Research and it figured out we're number two. We have the second largest mega roost of brown pelicans. And it gave me a PDF file from a bird group in 2009 who did the survey. Yeah, I'm convinced that it found me the right information. And that's really exciting. Alameda are number one.

Oxide and Friends

Predictions 2025

1132.202

They have the largest mega roost. Oh, my God.

Oxide and Friends

Predictions 2025

1152.536

Point being, the research assistant that goes away and digs up information and gives you back the citations and the quotes and everything, that already works to a certain extent right now. Over the course of the year, I expect that to get really, really good. I think we'll all be using those. The ones that go out and spend money on your behalf, that's ludicrous.

Oxide and Friends

Predictions 2025

1231.582

I hate that one so much that sometimes I call that digital twins, which is an abuse of a term that actually does exist, right? A digital twin is when you have like a simulation of your hydroelectric dam or whatever. But yeah, it's the biggest pile of bullshit I've ever heard.

Oxide and Friends

Predictions 2025

1247.456

The idea that you can get an LLM and give it access to all of your notes and your emails and stuff, and it can go and make decisions on your behalf in meetings, based on being this weird zombie simulation of you?

Oxide and Friends

Predictions 2025

1536.467

To be fair, I think we've had that exact kind of agent for two years almost. ChatGPT code interpreter was the very first version of a thing where ChatGPT writes code, runs it in the Python interpreter, gets the error message, reruns the code. They got that working in March of 2023. And it's kind of weird that other systems are just beginning to do what they've been doing for two years.
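The write, run, read-the-error, retry loop described here can be sketched in a few lines. This is only an illustration, not how ChatGPT Code Interpreter is implemented: `ask_model` and `fake_model` are hypothetical stand-ins for a real LLM call, and a plain subprocess is not a sandbox.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, Optional


def run_python(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run code in a separate Python process; return (succeeded, stdout or stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(code)
        path = handle.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
    finally:
        os.unlink(path)  # remove the temporary script whatever happened
    if result.returncode == 0:
        return True, result.stdout
    return False, result.stderr


def solve_with_feedback(
    ask_model: Callable[[str, Optional[str]], str], task: str, max_attempts: int = 3
) -> Optional[str]:
    """Ask for code, run it, and feed any error message back until it succeeds."""
    error = None
    for _ in range(max_attempts):
        code = ask_model(task, error)
        ok, output = run_python(code)
        if ok:
            return output
        error = output  # the traceback becomes part of the next request
    return None


def fake_model(task: str, error: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: buggy first, fixed once it sees an error."""
    if error is None:
        return "print(total)"  # fails with NameError because total is undefined
    return "total = sum(range(5))\nprint(total)"


print(solve_with_feedback(fake_model, "sum the numbers 0 to 4"))
```

The automated feedback is the point: the error text goes straight back into the next request, so nobody has to read the traceback by hand.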

Oxide and Friends

Predictions 2025

1562.421

Like some of those things that call themselves agents, the IDEs and so forth, they're getting to that point. And that pattern just works. And it's pretty safe. You know, you want to have it run the code in a sandbox so it can't accidentally delete everything on your computer. But sandboxing isn't that difficult these days. So yeah, that I do buy.

Oxide and Friends

Predictions 2025

1579.131

I think it's a very productive way of getting these machines to solve any problem where you can have automated feedback and where the negative situation isn't it spending all of your money on flights to Brazil or whatever. That feels sensible to me.

Oxide and Friends

Predictions 2025

1626.396

That does also tie into o1, these new inference scaling language models that we're getting. The one that did well in the ARC-AGI test, o3, that was basically brute force, right? It tries loads and loads and loads and loads of different potential strategies for solving a puzzle, and it figures out which one works, and it spends a million dollars on electricity to do it.

Oxide and Friends

Predictions 2025

1649.943

But it did kind of work, you know?

Oxide and Friends

Predictions 2025

1696.093

Okay. I've got one thing I do want to recommend for test time compute. I've been calling it inference scaling. It's the same idea. There is an Alibaba model from their Qwen research team, called QwQ, which you can run on your laptop. I've run it on my Mac, and it does the thing.

Oxide and Friends

Predictions 2025

1712.908

You give it a puzzle, and it outputs sometimes dozens of paragraphs of text about how it's thinking before it gets to an answer. And so watching it do that is incredibly entertaining. But the best thing about it is that occasionally it switches into Chinese. I've had my laptop think out loud in Chinese before it got to an answer.

Oxide and Friends

Predictions 2025

1733.579

So I asked it a question in English, it thought in Chinese for quite a while, and then it gave me an English answer. And that is just delightful.

Oxide and Friends

Predictions 2025

1749.424

Right. So what's not to love about seeing your laptop just do that on its own?

Oxide and Friends

Predictions 2025

1837.485

It is scoring higher than any of the other open weights models. It's also something like 685 billion parameters, so it's not easy to run. This needs data center hardware to run it. But yeah, the benchmarks are all very impressive. The previous best one, I think, was Meta's Llama 405B. This one's what, 685B or something? It's very good.

Oxide and Friends

Predictions 2025

1891.461

It's not a shock that it's good, because DeepSeek have a good reputation. They've released some good models in the past. The thing that shocks is that they did it for $5.5 million. That's like an 11th of the price of the closest Meta model that Meta have documented their spending on. It's just astonishing. Yeah.

Oxide and Friends

Predictions 2025

1957.548

I mean, one thing I do want to highlight is that last year was the year of inference compute efficiency. Like, the OpenAI models were about literally 100 times less expensive to run a prompt through than they were two and a half years ago.

Oxide and Friends

Predictions 2025

1974.992

Like, all of the providers, they're in this race to the bottom in terms of how much they charge per token, but it's a race based on efficiency. Like, I checked: Google Gemini and Amazon Nova are both the cheapest hosted models, or two of the cheapest, and they're not doing it at a loss. They are at least charging you more than it costs them in electricity to run your prompt.

Oxide and Friends

Predictions 2025

1995.083

And it's very meaningful that that's the case. Likewise, the ones that run on my laptop: two years ago, I was running the first Llama model, and it was not quite as good as GPT-3.5. It just about worked. Same hardware today. I've not upgraded the memory or anything. It's now running a GPT-4 class model.

Oxide and Friends

Predictions 2025

2015.402

There was so much low-hanging fruit for optimization for these things, and I think there's probably still quite a lot left. But it's pretty extraordinary. Oh, here's my favorite number for this. Google Gemini Flash 8B, which is Google's cheapest of the Gemini models. And it's still a vision audio model. You can pipe audio and images into it and get responses.

Oxide and Friends

Predictions 2025

2036.016

If I was to run that against 68,000 photographs in my personal photo collection to generate captions, it would cost me less than $2 to do 68,000 photos. Which is completely nonsensical.
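The arithmetic behind a figure like that is easy to check. The sketch below is a back-of-the-envelope estimate only: the per-photo token counts and the per-million-token prices are assumptions chosen for illustration, not numbers quoted in this conversation, so substitute the provider's current price list before relying on them.

```python
PHOTOS = 68_000

# Assumed values, for illustration only.
INPUT_TOKENS_PER_PHOTO = 260          # assumed tokens for one image plus a short prompt
OUTPUT_TOKENS_PER_PHOTO = 100         # assumed length of one generated caption
INPUT_PRICE_PER_MILLION_USD = 0.0375  # assumed input price per million tokens
OUTPUT_PRICE_PER_MILLION_USD = 0.15   # assumed output price per million tokens


def captioning_cost_usd(photos: int) -> float:
    """Estimate the total cost of captioning `photos` images under the assumptions above."""
    input_cost = photos * INPUT_TOKENS_PER_PHOTO / 1_000_000 * INPUT_PRICE_PER_MILLION_USD
    output_cost = photos * OUTPUT_TOKENS_PER_PHOTO / 1_000_000 * OUTPUT_PRICE_PER_MILLION_USD
    return input_cost + output_cost


total = captioning_cost_usd(PHOTOS)
print(f"Estimated cost for {PHOTOS:,} photos: ${total:.2f}")
```

Under these assumptions the total stays below two dollars, and it scales linearly, so doubling the photo count simply doubles the bill.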

Oxide and Friends

Predictions 2025

2153.689

Like you, I'm not nearly brave enough to short NVIDIA, but at the same time, I don't understand how being able to do matrix multiplication at scale is a moat. You know, I just don't. You're hardware people, I'm not. So maybe I'm missing something. But it feels like all of this stuff comes down to who can multiply matrices the fastest. Are NVIDIA really, like, so far ahead of everybody else?

Oxide and Friends

Predictions 2025

2175.444

Cerebras and Groq have been doing incredible things recently. Apple Silicon can run matrix multiplications incredibly quickly. Where is NVIDIA's moat here, other than CUDA being really difficult to get away from?

Oxide and Friends

Predictions 2025

266.087

Okay.

Oxide and Friends

Predictions 2025

2689.608

So I've got a self-serving three-year prediction. I think somebody is going to perform a piece of Pulitzer Prize-winning investigative journalism using AI and LLMs as part of the tooling that they used for that report. And I partly wanted to raise this one, partly because my day job that I have assigned myself is building software to help journalists do this kind of work.

Oxide and Friends

Predictions 2025

2710.94

But more importantly, I think it's illustrative of the larger concept that I think AI assistance in that kind of information work will almost be expected. Like, I think it won't be surprising when you hear that somebody achieved a great piece of like, in this case, it's sort of combining research with journalism and so forth.

Oxide and Friends

Predictions 2025

2730.736

Pieces of work done like that where an LLM was part of the mix feels like it's not even going to be surprising anymore.

Oxide and Friends

Predictions 2025

2780.527

And more specifically, the angle here is that this is actually possible today. Like, if you think about it, investigative journalism, or any kind of deep research, often involves going through tens of thousands of sources of information and trying to make sense of those. And that's a lot of work, right? That's a lot of trudging through documents.

Oxide and Friends

Predictions 2025

2798.543

You can use an LLM to review every page of 10,000 pages of police abuse reports to pull out vital details. It doesn't give you the story, but it gives you the leads. It gives you the leads to know, okay, which of these 10,000 reports should I go and spend my time investigating?

Oxide and Friends

Predictions 2025

2814.031

But the thing is, you could do that today, but I feel like the knowledge of how to do that is still not at all distributed. These things are very difficult to use, and people get very confused about what they're good at, what they're bad at, like will it just hallucinate details at me, all of that kind of thing.

Oxide and Friends

Predictions 2025

2830.496

I think three years is long enough that we can learn to use these things and broadcast that knowledge out effectively to the point that the kinds of reporters who are doing like investigative reporting will be able to confidently use this stuff without any of that fear and doubt over, is it appropriate to use it in this way?

Oxide and Friends

Predictions 2025

2847.483

So yeah, this is my sort of optimistic version of we're actually going to know how to use these tools properly, and we're going to be able to use them to take on interesting and notable projects.

Oxide and Friends

Predictions 2025

2884.707

And on top of that, if you want to do that kind of thing, you need to be able to do data analysis. Today, you still kind of need most of a computer science degree to be a data analyst. That goes away. Like LLMs are so good at helping build out, like they can write SQL queries for you that actually make sense. You know, they can do all of that kind of stuff.

Oxide and Friends

Predictions 2025

2900.857

So I think the level of technical ability of non-programmers goes up. And as a result, they can take on problems where normally you'd have had to tap a programmer on the shoulder and get them to come and collaborate with you.

Oxide and Friends

Predictions 2025

2923.183

It's not so much dystopian, but I think we're going to get privacy legislation with teeth in the next three years. Not from the federal government, because I don't expect that government to pass any laws at all, you know. But from states like California, things like that, because the privacy side of this stuff gets so dark so quickly.

Oxide and Friends

Predictions 2025

2943.375

The fact that we've now got universal facial recognition and all of this kind of stuff. And I feel like the legislation there needs to be on the way this stuff is used. In fact, the AI industry itself needs this because the greatest fear people have in working with these things right now is it's going to train the model on my data.

Oxide and Friends

Predictions 2025

2963.05

And it doesn't matter what you put in your terms and conditions saying we will not train a model on your data. Nobody believes them. I think that's where you need legislation, so you can say we are following California bill X, Y, Z, and as a result, we will not be training on your data. At that point, maybe people start trusting it.

Oxide and Friends

Predictions 2025

2979.357

And so if I was in a position to do so, I'd be lobbying on behalf of the AI companies for stricter rules on how the privacy stuff works just to help win that trust back.

Oxide and Friends

Predictions 2025

3.612

I'm very good, thanks.

Oxide and Friends

Predictions 2025

38.658

Yeah, I've never done predictions before. This is going to be interesting.

Oxide and Friends

Predictions 2025

3867.949

It's not that bad. The lowest hanging fruit of podcast search is you subscribe to all of them, you run all of them through Whisper to get transcripts, you make the transcripts searchable. Presumably, people have started building those things already. It feels like you're sat there waiting for someone to do it.
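The "make the transcripts searchable" step can be sketched as a tiny inverted index. This assumes the transcripts already exist as plain text (for example, produced by Whisper); the episode names and text below are made up for illustration, and a real system would more likely reach for a full-text search engine than a hand-rolled index.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of episodes whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for episode, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(episode)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the episodes that contain every word in the query, sorted by name."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical transcripts standing in for real speech-to-text output.
transcripts = {
    "episode-a": "We talked about pelicans in Half Moon Bay.",
    "episode-b": "Predictions about AI agents and sandboxes.",
    "episode-c": "More pelicans and more predictions.",
}

index = build_index(transcripts)
print(search(index, "pelicans"))
print(search(index, "pelicans predictions"))
```

This only covers exact keyword matching; it does nothing for the fuzzier, meaning-based queries that plain full-text search cannot answer.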

Oxide and Friends

Predictions 2025

3972.654

I'm going to check in a pricing observation. Again, Google Gemini 1.5 Flash 8B. These things all have the worst names. I used it for a straight-up transcription of an eight-minute-long audio clip, and it cost 0.08 cents. So less than a tenth of a cent to process eight minutes.

Oxide and Friends

Predictions 2025

3993.022

And, like, that was just a transcription, but I could absolutely ask questions about it, you know, give me the tags of the things they were talking about. Analyzing podcasts or audio is now so inexpensive.

Oxide and Friends

Predictions 2025

4061.634

Right. Give me a debate between credible professionals talking about subject X, exploring these things. You can't do that with full text search, but you can do it with weird vibe-based search.

Oxide and Friends

Predictions 2025

4108.278

I'll join you on that prediction. I'd be shocked if in three years' time we didn't have some form of really well-built...

Oxide and Friends

Predictions 2025

4140.54

I've got to put a shout out to Google's AI overviews for the most hilariously awful making shit up implementation I've ever seen. The other day I was talking to somebody about the plan for Half Moon Bay to have a gondola from Half Moon Bay over Highway 92 to the Caltrain station and they searched Google for Half Moon Bay gondola and it told them in the AI overview that it existed.

Oxide and Friends

Predictions 2025

4163.252

And it doesn't exist. It summarized the story about the plan and turned that into, yes, Half Moon Bay has a gondola system running from Crystal Springs Reservoir. Wow.

Oxide and Friends

Predictions 2025

4261.131

Honestly, it feels like all of the technology is aligned right now that you could build a really good version of this. And that means inevitably several people are going to try. So we'll see which one bubbles to the top.

Oxide and Friends

Predictions 2025

4294.625

I've got a utopian one and a dystopian one here. So utopian, I'm going to go with the art is going to be amazing. And this is basically generative. I have not seen a single piece of generative art, really, that's been actually interesting. So far, it's been mostly garbage, right?

Oxide and Friends

Predictions 2025

4313.335

But I feel like six years is long enough for the genuinely creative people to get over their initial hesitation about using these things, to poke at them, and for them to improve to the point that you can actually guide them. The problem with prompt-driven art right now is that it's rolling the dice and Lord only knows what you'll get. You don't get much control over it.

Oxide and Friends

Predictions 2025

4331.107

And the example I want to use here is the movie Everything Everywhere All at Once, which did not use AI stuff at all, but the VFX team on that were five people. I believe some of them were just following YouTube tutorials. An incredibly talented five, but they pulled off a movie which won most of the Oscars that year. You know, that movie is so creative.

Oxide and Friends

Predictions 2025

4353.478

It was done on a shoestring budget. The VFX were just five people. Imagine what a team like that could do with the versions of movie and image generation tools that we'll have in six years' time. I think we're going to see unbelievably wonderful TV and movies made by much smaller teams, much lower budgets, incredible creativity, and that I'm really excited about.

Oxide and Friends

Predictions 2025

4396.724

I think teams who have a very strong creative vision will have the tools that will let them achieve that vision without spending much money, which matters a lot right now because the entire film industry appears to be still completely collapsing. Netflix destroyed their business model, they've not figured out the new thing, everyone in Hollywood is out of work. It's all diabolical at the moment.

Oxide and Friends

Predictions 2025

4418.149

But maybe the dot-com crash back in the 2000s led to a whole bunch of great companies that sort of rose out of the ashes. I'd love to see that happening in the entertainment industry. I'd love to see a new wave of incredibly high-quality, independent film and cinema enabled by a new wave of tools. And I think the tools we have today are not those tools at all.

Oxide and Friends

Predictions 2025

4437.938

But I feel like six years is long enough for us to figure out the tools that actually do let that happen.

Oxide and Friends

Predictions 2025

4454.012

And I'll do the prediction. The prediction is that a film will win an Oscar in that year, and that film will have used generative AI tools as part of the production process. And it won't even be a big deal at all. It'll almost be expected. Like, nobody will be surprised that a film where one of the tools it used was based on generative AI was an Oscar winner.

Oxide and Friends

Predictions 2025

4481.288

Okay, I'm going to go straight up Butlerian jihad, right? So all of the dream of these big AI labs, the genuine dream really is AGI. They all talk about it. They all seem to be true believers. I absolutely cannot imagine a world in which

Oxide and Friends

Predictions 2025

4498.584

basically all forms of like knowledge work and large amounts of manual work and stuff as well are replaced by automations where the economy functions and people are happy. That just doesn't, I don't see the path to it. Like Sam Altman talks about UBI. This country can't even do universal healthcare. The idea of pulling off UBI in the next six years is a terrible joke.

Oxide and Friends

Predictions 2025

4519.43

So if we assume that these people manage to build this artificial superintelligence that can do anything that a human worker could do, that seems horrific to me. And I think that's full-blown Butlerian jihad, like set all of the computers on fire and go back to working without them.

Oxide and Friends

Predictions 2025

4548.955

These are parallel universes. I don't think anyone's making amazing art when nobody's got a job anymore. There was a post on Bluesky the other day where somebody said, what trillion dollar problem is AI trying to solve? It's wages. They're trying to use it to solve having to pay people wages. That's the dystopia for me.

Oxide and Friends

Predictions 2025

4571.47

I have no interest in the AI replacing people stuff at all. I'm all about the tools. I love the idea of giving, like the artist example, giving people tools that let them take on more ambitious things and do more stuff. The AGI-ASI thing feels like that's almost dystopia without any further details, you know?

Oxide and Friends

Predictions 2025

4601.391

I mean, I'm personally not really, no. But you asked me to predict six years in advance. And in this space, the way things are going right now. Who knows, right? So my thing is more that if we achieve AGI and ASI, I think it will go very poorly and everyone will be very, you know, I think there will be massive disruptions. There will be civil unrest.

Oxide and Friends

Predictions 2025

4622.098

I think the world will look pretty, pretty shoddy if we do manage to pull that off.

Oxide and Friends

Predictions 2025

4643.96

I think they might get to AGI there. I wouldn't rule against them managing to make, well, it's $100 billion in revenue, and then they've hit AGI, right? That's their...

Oxide and Friends

Predictions 2025

4667.234

What is funny about AGI is that OpenAI's structure as a non-profit means they've got a non-profit board, and the board's only job is to spot when they've got to AGI and then click a button, which means everyone's investments are now worthless. Yes.

Oxide and Friends

Predictions 2025

4708.472

Sorry, Microsoft. My dystopian prediction is the version of AGI which just means everyone's out of a job. That sucks. So yeah, that's my dystopian version.

Oxide and Friends

Predictions 2025

4798.164

I think in three years' time, they are greatly diminished as an influential player in the space. It's already happening now, to be honest. Like, six months ago, they were still in the lead. Today, they're in the top sort of four companies, but they don't have that same lead. They kind of pulled ahead again with the o3 stuff.

Oxide and Friends

Predictions 2025

48.97

I listened to last year's. Oh, interesting. Just to get an idea of how it goes. And I was very pleased to see that the goal is not to be accurate with the prediction.

Oxide and Friends

Predictions 2025

4816.949

But yeah, I don't see them holding on to their position as the leading entity in the whole of this space now.

Oxide and Friends

Predictions 2025

485.241

I'm going to push back on that one slightly, not on the Doomerism. I think the Doomerism's gone, but the AI skepticism, the argument that this whole thing is useless and it's all going to blow over, that's still very strong. Oh.

Oxide and Friends

Predictions 2025

4858.201

I mean, it'd be kind of interesting if they begin to tell you what it cost. Sam Altman said on the record the other day that they're losing money on the $200 a month plans they've got for o1 Pro. I don't know if I believe him or not, but that's what he said, you know?

Oxide and Friends

Predictions 2025

4886.542

It gives you unlimited o1, I think, or mostly unlimited o1. It gives you access to o1 Pro. It gives you Sora as well. And I think the indication he was giving was that the people who are paying for it are using it so heavily that they're blowing through the amount of money.

Oxide and Friends

Predictions 2025

4944.135

The way they implemented it, they just gave their members a credit card to go to the cinema with.

Oxide and Friends

Predictions 2025

5031.935

I'll say one more thing about OpenAI. They've lost so much talent. They keep on losing top researchers because if you're a top AI researcher, a VC will give you $100 million for your own thing. And they seem to have a retention problem. My favorite fact about Anthropic, the company that make Claude: they were formed by an OpenAI splinter group

Oxide and Friends

Predictions 2025

5056.252

who split off, it turns out, because they tried to get Sam Altman fired a year before that other incident where everyone tried to get Sam Altman fired, and that failed, and so they left and started Anthropic. Like, that seems to be a running pattern for that company now.

Oxide and Friends

Predictions 2025

568.455

I think my absolute favorite thing for the last two weeks was when DeepSeek in China dropped the best available open weights model on Christmas Day without any documentation. And it turns out they'd spent $5.5 million training it, and that was it. It was such a great microphone drop moment for the year.

Oxide and Friends

Predictions 2025

5715.676

I'm going to have to rave about Waymo for a moment, because if you're in San Francisco, the best tourist attraction in the city is an $11 Waymo ride. It's the ultimate living in the future. My wife's parents were visiting and we did the thing where you book a Waymo and don't tell them that it's going to be a Waymo.

Oxide and Friends

Predictions 2025

5736.112

And so you just go, oh, here's our car to take us to lunch and the self-driving car.

Oxide and Friends

Predictions 2025

5751.422

The Waymo moment is you sit in a Waymo and for the first two minutes, you're terrified and you're hypervigilant, looking at everything. And after about five minutes, you've forgotten. You're just relaxed and enjoying the fact that it's not swearing at people and swerving across lanes, and it's driving incredibly slowly and incredibly safely. Yeah, no, I'm impressed by them.

Oxide and Friends

Predictions 2025

59.717

Reassure. That's right.

Oxide and Friends

Predictions 2025

6115.424

That's a really interesting question. I mean, the big problem here is that what is the financial incentive to release an open model? You know, at the moment, it's all about effectively, like, you can use it to establish yourself as a force within the AI industry, and that's worth blowing some money on, but...

Oxide and Friends

Predictions 2025

6135.049

At what point do people want to get a return on the millions of dollars of training costs that they're spending to release these models? Yeah, I don't know. Some of the models are actually real open source licensed now. I think the Microsoft Phi models are MIT licensed. At least some of the Qwen models from China are under the Apache 2 license.

Oxide and Friends

Predictions 2025

6154.515

So we've actually got real open source licenses being used, at least for the weights. The other really interesting thing is the underlying training data. The criticism of these AI models has always been, how can it even call itself open source if you can't get at the source code, which is the training data? And because the source code is all ripped off, you can't slap an Apache license on that.

Oxide and Friends

Predictions 2025

6175.403

That just doesn't work.

Oxide and Friends

Predictions 2025

6176.744

There is at least one significant model now where the training data is at least open, as in you can download a copy of the training data. It includes stuff from the Common Crawl, so it includes a bunch of copyrighted websites that they've scraped. But there is at least one model now that has complete transparency on the training data itself, which is good, you know?

Oxide and Friends

Predictions 2025

6202.345

One of the other things that I've been tracking is, I love this idea of a vegan model, an LLM, which really was trained entirely on openly licensed material, such that all of the holdouts on ethical grounds over the training, which is a position I fully respect. If you're going to look at these things and say, I'm not using them, I don't agree with the ethics of how they were trained,

Oxide and Friends

Predictions 2025

6223.379

That's a perfectly rational decision for you to make. I want those people to be able to use this technology. So actually, one of my potential guesses for the next year was I think we will get to see a vegan model released. Somebody will put out an openly licensed model that was trained entirely on licensed or public domain work. I think when that happens, it will be a complete flop.

Oxide and Friends

Predictions 2025

6246.167

I think what will happen is it won't be as good as the... It'll be notably not as useful. But more importantly, I think a lot of the holdouts will reject it because we've already seen this. People saying, no, it's got GPL code in it. The GPL says that you have to attribute the... There's attribution requirements not being met, which is entirely true. That is, again, a rational position to take.

Oxide and Friends

Predictions 2025

6267.59

But I think that... It's both true and it makes sense to me, but it's also a case of moving the goalposts. So I think what would happen with a vegan model is the people who it was aimed at will find reasons not to use it. And I'm not going to say those are bad reasons, but I think that will happen.

Oxide and Friends

Predictions 2025

6284.278

In the meantime, it's just not going to be very good, because it won't know anything about modern culture or anything where it would have had to rip off a newspaper article to learn about something that happened.

Oxide and Friends

Predictions 2025

6514.232

I'm very sold on that with one sort of edge case. And that's the thing about writing. The most tedious part of learning is learning to write essays. That's the thing that people cheat on. And that's the thing where I don't see how you learn those writing skills without the miserable slog, without the tedium.

Oxide and Friends

Predictions 2025

6532.247

And so that's the one part of education I'm most nervous about is how do people learn the tedious slog of writing when they've got this tempting devil on their shoulder that will just write it for them.

Oxide and Friends

Predictions 2025

6704.588

I will say one thing about LLMs for feedback. They can't do spell checking. I only noticed this recently. Claude, amazing model, it can't spot spelling mistakes. If I ask it for spell checking, it hallucinates words that I didn't misspell, and it misses the words that I did. And it's because of the tokenization, presumably. But that was a bit of a surprise. It's like, it's a language model.

Oxide and Friends

Predictions 2025

6723.353

You would have thought that spell checking would work. Anything they output is spelled correctly, but they actually have difficulty spotting spelling mistakes, which I thought was interesting.

Oxide and Friends

Predictions 2025

6792.837

I ask it to look for logical inconsistencies or, you know, points that I made and didn't go back to, and it's great for that. But it's another one of those things where it's all about the prompting. It's quite difficult to come up with a really good prompt for the proofreading that it does. I'd love to see more people share their proofreading prompts that work.

Oxide and Friends

Predictions 2025

6820.65

I love that thing. Yeah.

Oxide and Friends

Predictions 2025

6835.794

I'll use that for proofreading. Yeah, I dumped my blog entries into that and I'm like, hey, do a podcast about this. And then you can tell which bits of the message came through. And that's kind of interesting. The other thing that's fun about that is you can give it custom instructions. So I say things like, you're banana slugs. Read this essay.

Oxide and Friends

Predictions 2025

6851.846

And they discuss it from the perspective of banana slugs and how it will affect your society. And they just go all in. And it is pricelessly funny.

Oxide and Friends

Predictions 2025

7015.461

My ultimate utopian version of this is it means that regular human beings can automate things in their lives with computers, which they can't do right now. Blowing that open feels like such an absolute win for our species. And we're most of the way there. We need to figure out what the tools and UIs on top of LLMs look like that let regular human beings automate things in their lives.

Oxide and Friends

Predictions 2025

7037.626

We're going to crack that, and it's going to be fantastic.

Oxide and Friends

Predictions 2025

7065.202

I use FFmpeg. I use FFmpeg multiple times. Oh, God, yeah.

Oxide and Friends

Predictions 2025

7312.662

Excellent. Thanks for having me. This has been really fun. All right. Thanks, everyone. Happy New Year.

Oxide and Friends

Predictions 2025

801.67

Absolutely. My original idea was going to go utopian and dystopian. And it turns out I'm just too optimistic. I had trouble coming up with dystopian things that sounded like they'd be more than just sort of bland sci-fi. But for the one year one, I've just got a really easy one. I think this whole idea of AI agents, I think is going to be a complete flop.

Oxide and Friends

Predictions 2025

823.087

Lots of people will lose their shirts on it. I don't think agents are going to happen. Yes, again, they didn't happen last year. I don't think they're going to happen this year either.

Oxide and Friends

Predictions 2025

842.708

I will start with... So my usual disclaimer, my thing about agents, I hate the term because whenever somebody says they're building agents or they like agents or they're excited about agents and then you ask them, oh, what's an agent? They give you a slightly different definition from everyone else.

Oxide and Friends

Predictions 2025

856.329

But everyone is convinced that their definition is the one true definition that everyone else understands already. So it's a completely information-free term. If you tell me, "I'm building agents," I am no more informed than I was beforehand, you know.

Oxide and Friends

Predictions 2025

873.623

In order to dismiss agents, I do need to define them, say which particular variety of agent I'm talking about. I'm talking about the idea of this assistant that does things on your behalf. I call this the travel agent version. Oh, God.

Oxide and Friends

Predictions 2025

894.918

Oh, God, they do, and it's such a terrible use case. I don't love that. It's a terrible use case. Yeah. So basically the idea, it's basically, it's the digital personal assistant kind of idea. And it's Her, right? It's the movie Her. It's the movie Her. It totally is. Everyone assumes that they really want this. And lots of people do want this.

Oxide and Friends

Predictions 2025

914.092

The problem is, and I always bang this drum, it comes back down to security and gullibility and reliability. Yes. If you have a personal assistant, they need to be reliable enough that you can give them something to do and they won't go and read a webpage that tells them to transfer your bank details to some Russian attacker and drain your bank account. And we can't build that.

Oxide and Friends

Predictions 2025

936.18

We still can't build that.

Oxide and Friends

Predictions 2025

982.092

Right. The best example of this, so Claude, so Anthropic released this thing called Claude Computer Use, which is this wonderful demo a few months ago where you run this Docker container and it fires up X Windows and now Claude can click on things and you can tell it what to do and it can use the operating system. It was a delight to play around with.

Appearances

Oxide and Friends
