Hello listeners. Today, I have an incredible follow-up episode from our friends at Turso. You may remember our episode with Glauber Costa in Season 8, where he told us the creation story of the platform. Today, I'm speaking with his co-founder, Pekka, to hear the update on Turso and what the team has been building over the past year. Now with Turso, you can not only have embedded replicas on your device or browser, with multi-tenancy and syncing to Turso's edge network, but now the tool powers vector search from the device itself, leading to natively serverless, low-latency SQLite production workloads. Turso continues to push the envelope with their product and expand use cases for developers. If you would like to learn more about Turso, go to turso.tech. If you'd like to learn more specifically about vector search, go to turso.tech/vector.

Sponsors
Speakeasy

Links
https://turso.tech/
https://turso.tech/vector
https://codestory.co/podcast/bonus-glauber-costa-turso/
https://codestory.co/podcast/bonus-dor-laor-scylladb/
This episode is sponsored by Speakeasy. Grow your API user adoption and improve engineering velocity with friction-free integration experiences. With Speakeasy's platform, you can now automatically generate SDKs in 10 languages and Terraform providers in minutes. Visit speakeasy.com slash codestory and generate your first SDK for free. This message is sponsored by QA Wolf.
QA Wolf gets engineering teams to 80% automated end-to-end test coverage and helps them ship five times faster by reducing QA cycles from hours to minutes. With over 100 five-star reviews on G2 and customer testimonials from SalesLoft, Drata, and Autotrader, you're in good hands. Join the Wolf Pack at qawolf.com. Hello, listeners.
Today, I have an incredible follow-up episode from our friends at Turso. You may remember our episode with Glauber Costa in Season 8, where he told us the creation story of the platform. Today, I'm speaking with his co-founder, Pekka, to hear the update on Turso and what the team has been building over the past year. Have a listen.
Well, today I have another special guest on the Code Story podcast, Pekka Enberg of Turso. Pekka, thank you for being on the show today. Thank you.
Thank you for having me.
Absolutely. You know, recently we had your partner Glauber on the podcast to tell us about the creation, the inception story of Turso. But we're going to dive into a bit of an update of the things you've been working on. But before we do, tell me a little bit about you.
My story is tied with Glauber's as well. We both worked on the Linux kernel. It feels like it was just yesterday, but it was more than a decade ago that we were working with the Linux kernel. That's when we met. We joined a company to do an operating systems product, which then pivoted to something completely different, a database product. So me and Glauber worked at ScyllaDB.
I think you've had their founder here as a guest as well. I also did real backend programming before switching to working on databases themselves. Java was the hot technology at the time. That's the general introduction I usually give.
Remind my audience what Turso is. So you gave the high-level story of being a founder of Turso and bringing more to SQLite. But tell me and the audience a bit of a reminder of what Turso really is.
The tagline is SQLite for production. And basically what we're trying to do is bring production capabilities to SQLite. And for the audience that doesn't know what SQLite is, SQLite is an embedded database. It basically runs everywhere already, on embedded systems and elsewhere. The claim is that it's the most deployed database on the planet, which is probably true.
But if you look at more modern workloads, people doing serverless, even on mobile to some extent, there are things that are just missing from SQLite. And actually, I don't know how much you went through this with Glauber in the inception story, but Turso actually was a company doing something completely different in the beginning.
And we were heavy users of SQLite for a local development toolkit, essentially. And we wanted to do something like that. The idea was to build something like a managed service, essentially. And we were always thinking that we were going to get a proper SQL database to do it in production.
We quickly discovered that SQLite itself is pretty agile and worked really well, but it was just missing some features like replication, which we needed at the time. So that's the thing that we do. We built those features into SQLite to make it really awesome for modern production workloads.
Tell me about maybe some of the success stories of the customers. What are they doing with Turso and how have they found success with bringing that into production?
So actually, really early on, one of the things we focused on is this ability to bring the database essentially close to the user. For people doing web apps where latency is essential for a great customer experience, that is something that really resonates.
And if you combine it with something like Cloudflare Workers, for example, you can actually cut down a lot of the overall latency and just have a better response time, essentially. But something else that we did, which is maybe slightly counterintuitive: SQLite is an in-process database.
So it's a library that you put into your application, and now the database lives inside your application. But we also added a mode where you can access SQLite remotely, turning it into something similar to Postgres. And the thing with SQLite is that it's so lightweight that you can have lots and lots of databases.
So one interesting thing that people are using Turso for is essentially having a database per tenant. So a database-per-user architecture, for example, which is great for SaaS applications and things like that.
Okay, so tell me the update. We talked about a few things before recording the episode, but I'm curious about what has gone on with Turso. What have you built? What have you shipped to the world? And what have been the big changes with the product?
Two things to update on. And maybe the first one is actually, let's say, the more boring, incremental part. Because actually, when we first started to work on Turso, as I mentioned, this edge part was a key thing for us. So even if it's infrastructure, you can't really be that agile in the infrastructure space.
But you can still find ways to do proof of concepts and really aggressively validate and iterate. Around the time Glauber was on this podcast, we had just announced something called embedded replicas, and we were actually working on the multi-tenancy thing that I mentioned. And those were actually things that we had already had on the roadmap since the first six months.
And basically, once you start to work on it, it's a different way of using the database. You can have the database inside your application, but you can still have replication. So you can have durability by basically offloading to the cloud, but also this multi-tenancy thing and all of the schema migration and all that stuff. And it is still ongoing work.
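As a rough illustration of what an embedded replica looks like from the application side, here is a minimal sketch using the @libsql/client JavaScript SDK; the database URL, auth token, and the todos table are placeholders, and exact option names may differ between SDK versions.

```typescript
import { createClient } from "@libsql/client";

// Placeholder URL and token: in practice these come from your Turso account.
const db = createClient({
  url: "file:local.db",                       // the embedded replica, a plain SQLite file
  syncUrl: "libsql://my-app-myorg.turso.io",  // hypothetical remote primary in the cloud
  authToken: process.env.TURSO_AUTH_TOKEN,
});

await db.sync(); // pull the latest changes from the cloud primary

// Reads and writes hit the local file, so queries stay in-process and fast.
const result = await db.execute("SELECT * FROM todos WHERE done = 0");
console.log(result.rows);
```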
So there's lots and lots of work to do to get that into production shape. So that's the more boring, incremental part. But the really interesting thing that we recently did, recently released, and it's actually not GA yet, is basically bringing a new capability to SQLite: vector search. Probably people are already getting a little bit bored with all the...
LLM and AI stuff. But basically, it's a super interesting thing that happened over the past 18 months: you have a completely new type of workload, new types of applications, which need this capability. And that's something that I personally find super exciting, because this was the first time we really had to dig deep and change a core part of SQLite. Very cool.
Okay, so I want to dig into the boring part before I get into the not-boring part with the vector search. So from an engineering standpoint, I couldn't imagine that multi-tenancy and all the things you're working on, or worked on, there were boring. But tell me why that's important. Why is multi-tenancy important for your customers?
There are many benefits to, as I mentioned, the database-per-tenant model. So if you zoom out a little bit, what people typically have to do when they're building whatever application is take a big database and start with that.
All your user data is in the same database, and all the product catalogs, if you're doing an online store or things like that, are all shared within this one database. And then problems pop up. As you start to scale, you need to start thinking about sharding and all of those things. But maybe more importantly, it's really hard to keep the data isolated.
So that's one of the reasons why the multi-tenancy story essentially resonates so well, because now you can have these databases which are isolated from each other. So just imagine managing your user data or even some more confidential data. And it really extends all the way from the backend to mobile as well, right?
So you can imagine having your application data essentially sharded per user and having that replicated on the mobile device, for example. So it's partly about scaling, but it's also about privacy and data isolation.
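A minimal sketch of how a database-per-tenant setup might look from the application's point of view, assuming each tenant has its own Turso database; the URL naming scheme, the orders table, and the clientForTenant helper are all hypothetical:

```typescript
import { createClient, type Client } from "@libsql/client";

const clients = new Map<string, Client>();

// Hypothetical helper: look up (or lazily create) the client for one tenant's
// isolated database. The URL scheme below is a placeholder.
function clientForTenant(tenantId: string): Client {
  let client = clients.get(tenantId);
  if (!client) {
    client = createClient({
      url: `libsql://tenant-${tenantId}-myorg.turso.io`,
      authToken: process.env.TURSO_AUTH_TOKEN,
    });
    clients.set(tenantId, client);
  }
  return client;
}

// Every query runs against that tenant's own database, so tenants never share
// tables, which is the isolation benefit described above.
const orders = await clientForTenant("acme").execute("SELECT * FROM orders");
console.log(orders.rows);
```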
This episode is sponsored by Speakeasy. Whether you're growing the user adoption of your public API or streamlining internal development, SDKs can turn the chore of API integration into effortless implementation. Unburden your API users from guessing their way around your API while keeping your team focused on your product.
Shorten the time to live integration and provide a delightful experience for your customers. With Speakeasy's platform, you can now automatically generate up-to-date, robust, idiomatic SDKs in 10 languages and Terraform providers in just a matter of minutes. SDKs are feature-rich with type safety, auto-retries, and pagination.
Everything you need to give your API the developer experience it deserves. Deliver a premium API experience without the premium price tag. Visit speakeasy.com slash codestory to get started and generate your first SDK for free. This message is sponsored by SnapTrade. Link end-user brokerage accounts and build world-class investing experiences with SnapTrade's unified brokerage API.
With over $12 billion in connected assets and over 300,000 connected accounts, SnapTrade's API quality and developer experience are second to none. SnapTrade is SOC 2 certified and uses industry-leading security practices. Developers can use the company's official client SDKs to build investing experiences in minutes without the limitations of traditional aggregators.
Get started for free today by visiting snaptrade.com slash codestory. Certainly. Okay, that makes sense. That gives me a good idea there of what people would use that for. Okay, so tell me about vector search. So people are, you know, getting bored, sure, with the LLM stuff, because it's all in the buzz and things, but it's super useful. And it's really valuable.
Give me some use cases of the vector search. What drove you to that?
When there was this first wave of ChatGPT 3.5, I think that was, at least for me, the turning point. Then all of a sudden we were in this situation where everybody wanted to apply these large language models to their applications. And basically the models themselves are super useful, but there's this problem called hallucination, because they just make up stuff.
So these large language models are essentially limited to whatever they saw during training. And these things get trained by reading essentially through the whole internet. But there's always a cutoff date, right? So you train it, and after that, it doesn't really know about the new things that appear. But also, for enterprises, these models don't really know your company-specific information. And that's why people came up with this retrieval-augmented generation, which is essentially just retrieving data for the model. And this is where the vector search part comes in. Imagine an interface where you have a customer typing a question. The way it essentially works is that you take that question, you run it through a large language model, and generate an embedding, which is a vector. And then you use this vector, this embedding, to find relevant information. And that relevant information is found through vector search, which is managed in some database. For me, the really interesting thing is that initially what happened was that there was this explosion of different special-purpose databases, vector databases.
At some point they were called embedding databases, but then I think everybody converged on vector databases. And these are special-purpose things that do just the retrieval part. But quickly people also discovered that, hey, we still have this traditional data that we want to access, but also lots of different databases and data sources. So, like, how can we simplify this thing?
And then you had Postgres adding an extension for this, and so forth. But with SQLite, what is really interesting, because it is such a lightweight thing and you can run it on mobile devices, for example, is that when you actually bring this vector capability to SQLite, you can do all of this model-related processing and all of that searching within the device itself.
And you can imagine you have the latency advantage, but also increasingly people are super interested in the privacy aspect, right? Because now you can have the private information on the device. It doesn't necessarily have to leave the device. So I think that's the cool part in combining the old traditional SQL database with vector search.
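To make the retrieval step concrete, here is a rough sketch of what vector search inside SQLite/libSQL can look like. The F32_BLOB column type, vector() constructor, and vector_distance_cos() function are assumptions based on Turso's vector search preview; since the feature is not GA yet, treat the exact names and the toy four-dimensional embeddings below as illustrative only.

```typescript
import { createClient } from "@libsql/client";

const db = createClient({ url: "file:docs.db" });

// Assumed schema: an F32_BLOB(4) column holds a 4-dimensional embedding per row.
await db.execute(
  "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT, embedding F32_BLOB(4))"
);
await db.execute(
  "INSERT INTO docs (body, embedding) VALUES ('hello from turso', vector('[0.1, 0.2, 0.3, 0.4]'))"
);

// Retrieval step of RAG: order rows by cosine distance to the (placeholder)
// embedding of the user's question, then feed the top matches to the model.
const nearest = await db.execute(
  "SELECT body FROM docs " +
  "ORDER BY vector_distance_cos(embedding, vector('[0.1, 0.2, 0.3, 0.4]')) LIMIT 3"
);
console.log(nearest.rows);
```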
Yeah, wow. So that's really interesting. You can pull data essentially from within the client itself, whether it be a mobile device, and we're talking about mobile devices, but eventually maybe it could be browsers or something like that. But a client, you don't have to go to a server itself to do the heavy computing. How did you figure that out? Because that's not a small thing.
Usually, you're reliant on the power of the device, right? So if you're using a weak device, it's not going to work very well. Tell me about that.
Zooming out a little bit and going back to the large language models: for us, we actually initially didn't do anything. When the first vector databases came out, we decided, okay, we don't really understand this space. We're just going to wait it out and see what happens in the market. But then
Six months later, you could see the Postgres community, for example, really stepping up and doing this. And then we started thinking, this kind of becomes like an existential thing. It's one workload, but it still becomes a super important workload. So what can we do? And then we basically just started exploring how we could implement it.
And actually, it probably wasn't us who first pointed out the mobile aspect. It was just that we were going to do this feature, and then, through our design partners, people were like, hey, this is perfect, I can use this for my LLM-powered application and all that stuff. But you could also see the trend of large language models basically splitting into two different directions.
You have the really large models. So Llama 3-something just got released, and it's one of the biggest open source models out there. But you also have the smaller ones, which actually fit on devices. We could see Apple, for example, doing some research in that area. I don't know if they rolled it out.
But basically, you could see that if you have powerful retrieval-augmented generation, so you have this ability to search for data, then you can probably get pretty far with a simpler model. So for us, it started to make sense that, okay, this is something that is probably useful in the mobile space.
No doubt. It's a really smart way to solve that problem, right, of the local replica, being able to vector search on the device while still, you know, eventually syncing to your edge network, to your remote databases. This message is sponsored by QA Wolf. If slow QA processes bottleneck your software engineering team and you're releasing slower because of it, you need a solution.
You need QA Wolf. QA Wolf gets engineering teams to 80% automated end-to-end test coverage and helps them ship five times faster by reducing QA cycles from hours to minutes. With over 100 five-star reviews on G2 and customer testimonials from SalesLoft, Drata, Autotrader, and many more, you're in good hands. Ready to ship faster with fewer bugs?
Join the Wolfpack at QAwolf.com to see if they can help you squash the QA bottleneck. This message is sponsored by SnapTrade. Link end-user brokerage accounts and build world-class investing experiences with SnapTrade's unified brokerage API. With over $12 billion in connected assets and over 300,000 connected accounts, SnapTrade's API quality and developer experience are second to none.
SnapTrade is SOC 2 certified and uses industry-leading security practices. Developers can use the company's official client SDKs to build investing experiences in minutes without the limitations of traditional aggregators. Get started for free today by visiting snaptrade.com slash codestory. So then, you've got Vector Search, right? You already have an amazing SQLite product.
Now you have Vector Search. You're fueling this local replica process. Where do you take this next? This is already an incredibly powerful platform, but where do you take this next? Where do you see this going?
There are basically two main things. The mobile space, in terms of platforms: you basically have two platforms, right? So you have iOS and Android, but then you also have things like React Native and all that stuff. So it's actually pretty fragmented. So one of the main things we're focusing on is basically making sure that our SDKs for those platforms are top notch, so that you really get a great developer experience. Because for us, it's always this combination of trying to get the best possible developer experience, but combining that with robust infrastructure. So that's one thing, which is actually a surprisingly big investment.
You really have to go and do the work for every fragment of the ecosystem separately. So that's one. The other part is basically, as you also hinted towards, even if you do this stuff on mobile or in the browser, you want to offload to the cloud. And a big part of what we build is the stuff that basically runs on the cloud and getting all of that right and doing the scaling there.
Disaggregated storage is something that is super interesting. And basically, we're keeping SQLite in the client, but we're also doing this server-side SQLite in a sense, which is also a big part of what we do.
Awesome. I appreciate that. I want to dive into the SDK portion because you're right. It's super fragmented. There's all kinds of things that you would need to support. How do you choose what is the most important SDK to go make sure is perfect? And how do you go about deciding if you're going to add new ones?
That's a great question. We struggled with that as well a little bit. So as I mentioned, two main platforms, and then you have React Native, you have Flutter, and other things as well. But I think those are the four main ones to consider.
For us, actually starting with React Native just turned out to be pretty natural because our JavaScript SDK for the server side thing is something that is by far the most popular one. And unsurprisingly, JavaScript is such a huge ecosystem. That is the main one we want to tackle first because there's this upside, of course, that you get this portability between the platforms.
But I think down the road, the reality of things is that it really depends on the types of applications. But if you look at the big ones, they will go native. So React Native is what we can start with, but then you probably have to do both Android and iOS. You can't really make a decision to go with just one of them.
Then the rest of it, Flutter and frameworks like that, I think for us, it's just going to be demand-based. We're going to see if we have enough people basically wanting to do something beyond those three.
Makes sense, totally makes sense. And it is a struggle. There's so much out there. Okay, so I'm a developer. I'm building this really cool product and I wanna be able to use SQLite. I wanna be able to use something like Turso where I can remotely update my databases, but on the device itself, I want to be able to do the vector search. I wanna be able to have the embedded replica.
I wanna essentially be able to process data in a fast way on my local device. How do I get started using Turso? What do I need to do?
So right now, unfortunately, it's not production grade yet. So you just come to our Discord and ask for the beta of the React Native client, essentially. That's basically just the starting point. The way we're integrating is by trying to find existing open source ecosystem libraries, for example, and integrating with them. So the one we use today is a library called op-sqlite.
And it will have Turso support out of the box. And basically, then you sign up to our service, get your databases on the cloud, and off you go. Because once you have a managed database on the cloud, it's just a configuration thing on the client to connect to your remotely managed database, and it will do all the sync and all those things.
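As a hedged sketch of that "just a configuration thing" point, here is roughly how the same client API can be pointed at a local file, a remotely managed database, or an embedded replica that syncs with the cloud. The URLs and token are placeholders from a hypothetical Turso account, and this uses the server-side @libsql/client for illustration rather than the op-sqlite React Native binding, whose exact API may differ.

```typescript
import { createClient } from "@libsql/client";

// 1. Local-only SQLite file, no cloud involved.
const local = createClient({ url: "file:app.db" });

// 2. Purely remote: talk directly to the managed database over the network.
const remote = createClient({
  url: "libsql://my-app-myorg.turso.io", // placeholder managed database URL
  authToken: process.env.TURSO_AUTH_TOKEN,
});

// 3. Embedded replica: read and write locally, sync with the managed database.
const replica = createClient({
  url: "file:app.db",
  syncUrl: "libsql://my-app-myorg.turso.io",
  authToken: process.env.TURSO_AUTH_TOKEN,
});
await replica.sync();

// The three clients share the same query interface.
console.log((await local.execute("SELECT 1 AS ok")).rows);
console.log((await remote.execute("SELECT 1 AS ok")).rows);
console.log((await replica.execute("SELECT 1 AS ok")).rows);
```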
Fantastic. Well, Pekka, I really appreciate you being on the show today and giving the update on Turso. You've done some really amazing things with the things that you're calling, quote unquote, boring, with multi-tenancy and embedded replicas.
But the vector search capabilities that are on device and being able to power that fast data retrieval and remote syncing with your edge network is really, really fantastic and fascinating. Really appreciate you being on the show, giving the update. Cool. Thank you for having me. Incredible.
Now with Turso, you can not only have embedded replicas on your device with multi-tenancy and syncing to Turso's edge network, but now the tool powers vector search from the device itself, leading to natively serverless, low-latency SQLite production workloads. Turso continues to push the envelope forward with their product and expanding use cases for developers.
If you'd like to learn more about Turso, go to turso.tech. If you'd like to learn more specifically about vector search, go to turso.tech slash vector or sign up for their Discord. And thanks again for listening.