The Changelog: Software Development, Open Source
Reinventing Kafka on object storage (Interview)
Thu, 29 Aug 2024
Ryan Worl, Co-founder and CTO at WarpStream, joins us to talk about the world of Kafka and data streaming and how WarpStream redesigned the idea of Kafka to run in modern cloud environments directly on top of object storage. Last year they posted a blog titled, "Kafka is dead, long live Kafka" that hit the top of Hacker News to put WarpStream on the map. We get the backstory on Kafka and why it's so widely used, who created it and for what purpose, and the behind the scenes on all things WarpStream.
What's up, friends? Welcome back. This is The Changelog. We feature the hackers, the leaders, and those who are building data streaming platforms inspired by Kafka. Yes, today's conversation revolves around Kafka and data streaming. We're joined by Ryan Worl, co-founder and CTO at WarpStream. Last year, they posted a blog post titled "Kafka is dead, long live Kafka."
And that, of course, hit the top of Hacker News and put WarpStream on the map. Today, we get the backstory on why Kafka is so widely used and who created it and for what purpose, and more importantly, the story of WarpStream.
And the question they asked themselves was this, what would Kafka look like if it was redesigned from the ground up today to run in modern cloud environments directly on top of object storage with no local disks to manage, but still had to support the existing Kafka protocol? Well, that's just the premise for today's conversation. A massive thank you to our friends and partners over at Fly.io.
More than 3 million apps have launched on Fly, and we're one of them. Scalable full stack without the cortisol. No stress. Learn more at Fly.io. Okay, let's Kafka. What's up, friends? I'm here with a new friend of mine, Sagar Bachu, co-founder and CEO at Speakeasy.
You know, I've had the pleasure of meeting several people behind the scenes at Speakeasy, and I'm very impressed with what they're doing to help teams to create idiomatic SDKs, enterprise-grade SDKs in nine languages, and it's just awesome. So, Sagar, walk me through the process of how Speakeasy helps teams to create enterprise-grade idiomatic SDKs at scale.
You know, APIs are tough things to manage. For a company, the OpenAPI spec, this great, widely adopted standard to describe and document APIs, is the best chance the company has at documenting it, understanding at a point in time what the API is, and also ownership. What are the APIs? How are they grouped? Which teams are on them? What services do they get deployed to?
There's a lot of questions there that often we see teams and companies kind of struggling to answer. So Speakeasy is a forcing function for them to invest in making that OpenAPI spec as great as possible. Completely descriptive, fully enriched. Speakeasy helps with those gaps. We have deterministic and AI tools to kind of fill in the gaps for them.
And so the better and better that OpenAPI spec gets, the better chance you have at serving your community. The end value is always to the end user who is actually integrating with your API. So if your OpenAPI spec has gaps in it, the more likely they are to run into errors. They don't understand what they're implementing.
It gets tough to maintain, because it becomes institutionalized knowledge as opposed to described in the document. So there's a lot of great reasons why you invest in that OpenAPI spec. Any artifact like that that you're going to invest a ton of time into needs tooling to manage. And that's what Speakeasy is at its core.
It's tooling to manage that OpenAPI spec, give you very clear change management principles around it, version it, understand exactly what versions are used for what SDKs. If you invest in that spec and use Speakeasy, you'll have a good document. The moment you have a good document, you can have good or great SDKs, which make integration easy.
The way Speakeasy works there is you point us at your document, wherever it lives, in GitHub or maybe some other file storage or somewhere else. We detect changes as it evolves, as different people contribute to it, and we send you new updated code every time that happens. The moment we send you code, there's an opportunity for you to review that and say, you know, yes or no.
Like, this is new code we want to ship to our customers. We do that heavy lifting of generating that code, giving you provenance back to the spec it came from. But we leave you as the human in the loop to decide: okay, am I going to serve my ecosystem with a new version of the spec and SDK? So that's the kind of core workflow that we're built around.
And that's really the point of collaboration between us and companies that we work with.
Okay, friends, the next step is to go to speakeasy.com. Try today for free. You get one free SDK in your language of choice on them. Enjoy it. Robust, idiomatic SDKs in nine plus languages. Your API, the open API spec available everywhere. Again, go to speakeasy.com. Once again, speakeasy.com.
All right, today we are joined by Ryan Worl from WarpStream. Ryan, welcome to The Changelog.
Thanks, it's great to be here.
Great to have you. Shout out to listener Vladimir for requesting this episode. Also, shout out to your co-founder, Richard, who unfortunately couldn't be here today, but hey, Richard. What's up, Richard? Yeah, but you're here, so let's talk to you and not to Vladimir and to Richard. That being said, Vladimir requested this episode. You too, listener, can request episodes.
Head to changelog.fm slash request. Let us know what you would like to hear about on the pod, and we might just fulfill your every desire. Vlad wanted to hear about WarpStream, and so that's why Ryan is here. Just so happens that Adam and I both would also like to hear about WarpStream.
Yeah.
Here we are. Let's start with Kafka, though, because it sounds like WarpStream's story starts with Kafka's story. Okay. What is Kafka besides an author from the early 1900s? But the open source thing, what is that thing all about?
Yeah, Kafka is both a very interesting and a very boring system. The easiest way to think about it is it lets you create topics and you can have producers that write messages into these topics and consumers that consume messages out of the topics. It's kind of like a publish and subscribe type deal.
But the thing that makes it interesting is the fact that once you consume those messages, they're not deleted. So they're still stored inside the system and another consumer can go and read them again for a different purpose. Like if you have two different applications that are consuming the same data set, they can both equally consume those messages.
Let's say that you have one application that does machine learning training and another that does alerting based on the same messages. You want to process the same data, but in two different applications. Kafka is a useful tool for that.
It also provides ordering for those messages so that if you need to implement an application where you send messages in a certain order and you want that order to be retained on the other side, Kafka also does that for you. Each message is assigned a unique offset within a partition of that topic, which is kind of like a shard.
And within that shard, if you process the messages in that partition again, you'll get them back in the same order every time. So you can implement something like state machine replication, or that type of thing where the ordering matters.
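To make that concrete, here's a minimal produce-and-consume sketch using the community segmentio/kafka-go client. The broker address, topic, keys, and group ID are all placeholders; the Hash balancer is what keeps records with the same key on the same partition, and hence in order.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()

	// Producer: the Hash balancer routes messages by key, so everything
	// with the same key lands on the same partition and stays ordered.
	w := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"),
		Topic:    "warehouse-events",
		Balancer: &kafka.Hash{},
	}
	defer w.Close()
	err := w.WriteMessages(ctx,
		kafka.Message{Key: []byte("shelf-42"), Value: []byte("stock +100")},
		kafka.Message{Key: []byte("shelf-42"), Value: []byte("stock -3")},
	)
	if err != nil {
		log.Fatal(err)
	}

	// Consumer: reads are non-destructive. A second group (say "ml-training")
	// could replay the exact same messages for a different purpose.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "warehouse-events",
		GroupID: "alerting",
	})
	defer r.Close()
	for i := 0; i < 2; i++ {
		m, err := r.ReadMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("partition %d offset %d: %s\n", m.Partition, m.Offset, m.Value)
	}
}
```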
Okay, so what are some use cases for this? Lots of, it sounds like, well-funded companies use it, larger companies. Yeah. And I think that some of that is because of the operational complexities and the love-hate relationship with it. But why are people grabbing this particular tool often?
Yeah, the reason why it's useful is there just isn't a lot out there that fulfills those, you know, the two main things. It's like a publish and subscribe mechanism that's scalable, right? And then also, that lets you have different consumers process the same set of messages without one of the consumers deleting it.
There's a lot of queuing systems where, once you consume a message, it's just gone forever at that point. The purpose is to consume the message and then have it go away, not to reprocess it again in the future. There are a lot of use cases for it. I'd say that the most broadly popular one is for moving data from point A to point B, kind of like a dumb pipe.
It's used a lot in observability and security-related workloads, where you have a lot of application servers that are generating logs, and you want to temporarily put those logs somewhere before you put them in something else, like you say you want to put them in Elasticsearch or something like that. Elasticsearch can be a little finicky.
So you want to have Kafka, which is a much simpler system, in place as a temporary buffer to hold those log messages that you want to write to Elasticsearch, in case that Elasticsearch cluster is down or you're doing an upgrade or something like that. There's a lot of different reasons for it, but Kafka is pretty much the de facto standard for those kinds of workloads.
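Here's a rough sketch of that buffering pattern, again with segmentio/kafka-go. The Elasticsearch endpoint, index, topic, and group ID are assumptions, and the log records are assumed to be JSON documents; the key idea is committing offsets only after the index call succeeds, so Kafka keeps holding anything Elasticsearch hasn't accepted yet.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"strings"

	"github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "app-logs",
		GroupID: "es-indexer",
	})
	defer r.Close()

	for {
		m, err := r.FetchMessage(ctx) // fetch without committing the offset
		if err != nil {
			log.Fatal(err)
		}
		// Elasticsearch bulk format: an action line, then the document line.
		body := fmt.Sprintf("{\"index\":{\"_index\":\"logs\"}}\n%s\n", m.Value)
		resp, err := http.Post("http://localhost:9200/_bulk",
			"application/x-ndjson", strings.NewReader(body))
		if err != nil {
			// Elasticsearch is down or mid-upgrade: exit without committing.
			// On restart the group resumes from the last committed offset,
			// so nothing is lost -- Kafka is holding the buffer.
			log.Fatalf("elasticsearch unavailable: %v", err)
		}
		resp.Body.Close()
		if err := r.CommitMessages(ctx, m); err != nil {
			log.Fatal(err)
		}
	}
}
```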
And then when you get outside of observability and security, there's a lot of people that are building custom applications on top of Kafka, like an inventory management system for a warehouse, where you want to keep track of the real-time status of everything going on in the warehouse. So you might want to send messages to say, oh, this new batch of inventory has been added onto the shelves of the warehouse, or I'm taking things out. And then you're computing some type of live view based on that inventory data, so you know that you need to replenish the stock when it goes below a certain amount. But you want to do that in real time, so that you can react faster than just doing this once a day.
So Vladimir pointed us to a post which I think Adam and I had both already read, because it was last year. Last summer, I believe. Kafka is dead. Long live Kafka. This was your big coming out party, it seems. A great way to introduce WarpStream. And in that post, you said that Kafka is one of the most polarizing technologies in the data space.
And then, whether it was you or Richard who wrote that, you just moved on and kept going, assuming that we all just knew why, or how, or agreed that that was just true. I assume it's true. It's probably polarizing, but why is it polarizing? My guess is because it's useful, but difficult to use. And so people love it and hate it, but maybe there's more to it than that.
So I think that there are probably two main criticisms that people have of Kafka. The first is that it's hard to run. Like, as the operator, you have to have a lot of knowledge about how to run and use the open source project appropriately. And the second major issue is the cost.
I'm sure we'll get into this, but the cost of running open source Kafka in the cloud is pretty high compared to what people expect it to be. If you think of it as a dumb pipe, you would expect to pay dumb pipe type rates for it.
But given the fact that it requires triply replicating the data onto local disks, and most of the cloud providers charge you for inter-zone replication, you end up paying a lot more than you expect, even if you're just storing the data temporarily.
If you're using open source Kafka in AWS, for example, the minimum cost for a highly available three-AZ setup for the cluster is 5.3 cents per compressed gigabyte written into the cluster. That's just to do the replication part. The storage part is another story entirely; it depends on how long you want to store the data for. But if you're just starting out, that's your baseline cost.
It can get pretty expensive pretty quickly.
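For the curious, here's a back-of-the-envelope reconstruction of where a figure like 5.3 cents per gigabyte can come from. It's a sketch, assuming AWS's typical $0.01/GB charge in each direction for cross-AZ traffic and a replication factor of 3 across three AZs; the constants are illustrative, not quotes from AWS's price list.

```go
package main

import "fmt"

func main() {
	const crossAZ = 0.02 // $/GB: assumed $0.01 egress + $0.01 ingress per cross-AZ hop

	// With clients and brokers spread over 3 AZs, the partition leader is
	// in a different AZ than the producer about 2/3 of the time.
	producerToLeader := 2.0 / 3.0 * crossAZ

	// Replication factor 3: the leader forwards every byte to two
	// followers, each in one of the other AZs.
	leaderToFollowers := 2 * crossAZ

	fmt.Printf("replication networking ≈ $%.4f per GB written\n",
		producerToLeader+leaderToFollowers) // ≈ $0.0533, i.e. ~5.3 cents
}
```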
Is anyone building or using Kafka, open source Kafka, as you said, in a scenario where they're not on public cloud, where they're building out their own infrastructure? Where it's probably even harder, because you're literally managing the disks: you're ordering the disks, you're replacing the disks. Is that a scenario that happens, or is it less likely?
That's definitely a thing that happens. I know of companies that do that, but just as the migration to public cloud over the last 10 years has only increased in velocity, essentially, that is becoming less and less common, because it is indeed hard.
And it's even harder when it's in your own data center, as opposed to the cloud, where you can just ask for more disks, and you get them right away. The cost situation is a little different there, too, because typically, the way that you're provisioning network in your own data center would not end up with a per-gigabyte cost.
You amortize everything over how much data you're transferring inside your data center, but you're buying capacity in terms of hardware, so your per-gigabyte rate, as your traffic goes up, doesn't climb linearly the way it does with Amazon. But it's definitely still a thing people do; it's just less and less popular every day.
Continue with the polarizing. What else is polarizing about Kafka?
Some people have strong opinions about the actual developer programming model of Kafka, and that it's a little hard to use sometimes. I think that's less of a big deal these days as more tools have integrated with Kafka. That makes it easier to use Kafka than some other systems that might have a theoretically easier-to-use programming model. Everything speaks Kafka now.
So those concerns are mostly trumped by the fact that it's the de facto standard. I think really what most people are concerned about, like if you don't use Kafka today and you're thinking about bringing it into your company, the two things that you're going to be concerned about are how hard is it to run and how much is it going to cost. Those are typically people's two big blockers. It doesn't have anything to do with the fact that conceptually they have an issue with Kafka. It's those more practical things.
What makes it so difficult to run? Is it the SSDs? I think that post also called it finicky. Is it poorly architected? Like, why is it finicky?
It's a number of different things. I think the first one is, yes, being responsible for anything that stores data on local disks, if you want to achieve high availability and high durability of your data, is challenging. It requires experienced SREs to handle those types of failures when they do occur.
But that, I think, can be dealt with because people do that with other systems all the time. But I think that most people's problems with Kafka come when they want to scale up and scale down the cluster in response to load. The open source project doesn't really give you much tooling when it comes to helping you manage that process.
Like, for example, in the open source project, there's no automated tool to rebalance the data among the machines when you add or remove machines. That's kind of a table-stakes feature in a lot of systems. If you're thinking about a distributed relational database, it would seem kind of silly if you had to run a script to move data between the nodes in the database.
But that is true of open source Kafka. There are now other tools that you can use alongside it that can take some of this work off of you, but they're not always the easiest to use either. It's not a self-balancing, self-managing thing like a lot of the distributed relational databases are. It's something that takes a little more hands-on work.
And another thing that goes along with that is storing data for a long period of time: the open source project didn't add a tiered storage feature until very recently.
And the time that it takes just to copy the data around from machine to machine when you're scaling the cluster up or down can be hours or days, depending on how densely you're running the machines. Some of that is alleviated with the new tiered storage stuff, where the older data is moved to object storage, but that doesn't alleviate the inter-AZ networking costs.
And there's another post on our blog about tiered storage and Kafka if people are interested in learning more about that topic.
It is open source though, right?
Apache Kafka?
Yeah.
Yeah. The project is managed by the Apache Foundation and has a variety of contributors across a ton of companies. And I would say it's a fairly healthy example of an open source project in terms of having a big community.
There's a margin of haters, let's just say, towards Kafka. And it is open source. And I'm just curious, you know, you may be in that bucket of margin of haters because you've created WarpStream, right? So you're kind of not for, you're kind of against, at least from an economic standpoint and maybe a DX standpoint and many other standpoints. The point I'm getting to is why not just improve Kafka?
So there are a lot of practical challenges with improving a large open source project with a lot of users and a lot of dependent parties, I should say. Not even necessarily just users, but stakeholders of all kinds. Making large, sweeping changes is essentially impossible. The amount of code churn required to take open source Kafka and get it to something resembling the architecture of WarpStream is just not going to happen in any reasonable amount of time. That's the first part. Even if you approached it abstractly, no financial interests involved, and asked how you would do this, it would be very hard, practically.
The second reason is that WarpStream makes a pretty different set of trade-offs than the open source project does in terms of the environment that we expect users to run in. Now, I think those trade-offs are correct for the world that exists today, but in the abstract, it is different than the open source project. So WarpStream stores data only in object storage. That's step one.
You need an environment that has object storage. And then step two is that we run a control plane for the cluster. The comparison in open source would be kind of like if somebody was running ZooKeeper, or KRaft, which is the replacement for ZooKeeper inside of the open source project.
It's kind of as if we're running that for you remotely, and then you're running the agents, as we call them, which are the replacement for the Kafka broker, inside your cloud account. So there's a very specific topology that we're prescribing to our customers as well. That's different.
It probably wouldn't fly in an open source environment, or at least would make it even more challenging to run, potentially. I think those are probably the two biggest reasons why we couldn't just improve Kafka: it would be too hard, practically, to make those improvements. And then also, we're making trade-offs based on how we see the world existing today and how we think it's going to continue to exist in the future, and a lot of the stakeholders of the open source project may not agree with our assessment there, basically.
Good answer. I was expecting a version of that. I was not suggesting that you should never have started WarpStream and by all means just go contribute to Kafka instead. But it's always good to get that perspective, because Kafka's got history. It's 13-ish years old. It was developed inside of LinkedIn for different purposes. That's why I started off with the question about their own infrastructure: LinkedIn designed this for a different purpose than what everybody uses it for today. It was not designed to be used in a cloud environment where there are a lot of egress fees and a lot of fees for moving data around. So it was not really designed for the usage space that it's actually in now.
And LinkedIn did not get charged those transaction fees. I don't know LinkedIn's infrastructure history, but I assume that's because they had far more control over their own environment, and didn't have to deal with those costs the way everyone else who's become a Kafka user has had to.
Yeah, the way that I like to explain the networking cost side is that when you're renting space in a colo or you have your own data center, you're implicitly paying for a fixed-capacity resource. It has a very high fixed capacity, but it is fixed, short of doing a bunch of capital improvements to your data center.
Whereas if you're in the public cloud, you can show up and put a credit card down and start moving gigabytes a second across the network without asking anybody for permission, nothing. So you're paying kind of a tax for that flexibility of being able to show up without asking anybody, all of a sudden start moving a ton of data.
And especially in terms of how spiky you can be: you can write 100 gigabytes a second for one minute and then never pay Amazon any money again. They have to do some capacity planning on their end, just like they do for every other service, and it's why they charge higher on-demand rates for EC2 instances than if you go and buy a random server off the internet and put it in your house.
The cost looks very different. Now, whether that cost is right, whether that reflects real economic realities, I don't think anybody can say without being inside of Amazon, but I think there's a pretty logical rationale for why it exists that way, because there are people that will consume bandwidth in a very different way.
You have to think about the worst-case-scenario users of your service, basically, the people that you might even call abusers of your service in terms of your cost profile. So I think that's why, as you're saying, you're correct that LinkedIn can just decide to use Kafka in a different way internally to match their ability to provision infrastructure.
And Amazon can't really force you to do that in any way other than just charging you more money for it. So that's why they do it.
So you and Richard, did you guys meet at Datadog? Is that where you guys connected or was he at Datadog? Tell us a little bit of the history of you two.
Yeah. So Richie and I met a little over five years ago now at a conference. We met at Percona Live. I think it was 2019 in Austin. Okay. And he was working at Uber at the time. Okay. And yeah, so we did eventually both end up joining Datadog, but that was a little later.
Gotcha. Yeah. And while you were there, you had put some sort of Datadog infrastructure on S3 or on object storage. Husky, I think. I'm going from memory now.
Yeah, so my co-founder, Richie, and I, after he left Uber, we started working on a prototype of a system. The idea was basically a Snowflake for observability data. That was the elevator pitch. And we were going around pitching that to investors at the time, and that's how we got to know some of our investors in WarpStream today; we met them back in those days.
That eventually caught Datadog's attention, and we ended up joining Datadog together to build that system, Husky. Some of our current colleagues at WarpStream were also there at Datadog building that system with us. Basically, the idea was to replace the legacy system inside of Datadog for basically anything that you can think of that's not pre-aggregated time series metrics. The idea was to think of it as timestamp plus JSON. That was the data model, basically. And we wanted to move all that data to object storage for a ton of different reasons, similar to the reasons why WarpStream is useful.
Yeah, over the three and a half years that my co-founder and I were there, we migrated all of the products that were using the legacy system over to Husky.
Yeah, I mean, that's why I ask about it, because it seems like it's a precursor to this very similar move with Kafka, right? Like, what if we took Kafka, ripped out the local storage aspect of it? Sounds easy enough. And built something, I mean, by ripped out, conceptually ripping out, right? You didn't fork Kafka and write this, right? You started over.
Yeah, we started from scratch and wrote it in Go.
Right, so conceptually, rip it out, but actually rewrite something that's Kafka compatible in terms of features and API, I assume, and all that kind of stuff. But no local storage, object storage.
And your success with what happened at Datadog probably led the way for you to say, well, if we did that, it would be a lot cheaper, basically, and way easier to operate, because, hello, Amazon Web Services, right? Like, it's their problem now.
Yeah, there's definitely a lot of high-level conceptual overlap. The systems are extremely different, because one looks more like an OLAP database, and the other, I mean, Kafka is more like a log. So there's some very high-level conceptual similarity. And I think the thing that we really got the most experience with there was learning about object storage.
That's about where the similarities stop. The deep experience of understanding how object storage works at scale in all of the major public clouds was a hugely valuable learning experience, so that when we left and we were doing the back-of-the-envelope math on whether we could make this thing work, the experience with object storage that we gained there was pretty helpful. Now, people talk a lot about object storage nowadays, so understanding the characteristics of working with it is not such an unknown thing anymore. But I'd say in 2019, that was a fairly different story.
I think the only people that would know a lot about building high-performance systems on top of object storage, they were probably all either inside the public cloud providers themselves, or they were working at Snowflake or a similar company. The knowledge was not super well distributed at that time. Most people, when they think of object storage, they think of something that's super slow.
They're thinking about it in terms of seconds of latency to do anything, and they just assume they'd have to rework everything around that. But the numbers are very different than what people might think off the top of their head. And that opens up a lot of design possibilities that you don't think of immediately.
Okay, friends, here are the top 10 launches from Supabase's launch week number 12. Read all the details about this launch at supabase.com slash launch week. Okay, here we go. Number 10, Snaplet is now open source. The company Snaplet is shutting down, but their source code is open.
They're releasing three tools under the MIT license for copying data, seeding databases, and taking database snapshots. Number nine, you can use pg_replicate to copy data, full table copies, and CDC from Postgres to any other data system. Today it supports BigQuery, DuckDB, and MotherDuck, with more sinks to be added in the future.
Number eight, vec2pg, a new CLI utility for migrating data from vector databases to Supabase, or any Postgres instance with pgvector. You can use it today with Pinecone and Qdrant. More will be added in the future. Number seven, the official Supabase extension for VS Code and GitHub Copilot is here. And it's here to make your development with Supabase and VS Code even more delightful.
Number six, official Python support is here. As Supabase has grown, the AI and ML community have just blown up Supabase, and many of these folks are Pythonistas. So Python support expands. Number five, they released log drains so you can export logs generated by your Supabase products to external destinations like Datadog or custom endpoints.
Number four, authorization for real-time broadcast and presence is now public beta. You can now convert a real-time channel into an authorized channel using RLS policies in two steps. Number three, bring your own Auth0, Cognito, or Firebase.
This is actually a few different announcements, support for third-party auth providers, phone-based multi-factor authentication, that's SMS and WhatsApp, and new auth hooks for SMS and email. Number two, build Postgres wrappers with Wasm. They released support for Wasm, WebAssembly, Foreign Data Wrapper. With this feature, anyone can create an FDW and share it with the Supabase community.
You can build Postgres interfaces to anything on the internet. And number one, Postgres.new. Yes, Postgres.new is an in-browser Postgres with an AI interface. With Postgres.new, you can instantly spin up an unlimited number of Postgres databases that run directly in your browser and soon deploy them to S3. Okay, one more thing. There is now an entire book written about Supabase.
David Lorenz spent a year working on this book, and it's awesome. Level up your Supabase skills, support David, and purchase the book. Links are in the show notes. That's it. Supabase Launch Week number 12 was massive. So much to cover. I hope you enjoyed it.
Go to supabase.com slash launch week to get all the details on this launch, or go to supabase.com slash changelogpod for one month of Supabase Pro for free. That's S-U-P-A-B-A-S-E dot com slash changelogpod. What are some lesser known things about object stores that you know that we don't know? Or maybe nobody knows besides you.
Yeah, it's not really one secret trick. I think it's a conceptual framing: you have to think of it as if you had access to a very large, oversubscribed array of spinning disks. If you think about it like that, then how you design a system around it will make a lot more sense. So there's a couple different pieces of that.
Really large, like way bigger than your individual application. It's like you have the world's biggest RAID 0 of all the disks ever. It's effectively unlimited, so think about it that way. But also oversubscribed: the latency characteristics of it are highly variable. One request might take 10 milliseconds, and another takes 50, and there's no reason discernible to you why that is the case.
It's just how it works. So you have to design around that a little bit in terms of retrying requests speculatively and that type of thing. But if you have that framing, that it's very large, cheap storage with variable latency characteristics, and you rework your application to think about how to make it work on top of that, then you've got the right framing.
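One common way to design around that variable latency is the speculative, or "hedged", request Ryan mentions: fire the read, and if it hasn't returned within the typical fast-path latency, race a second copy of it against the first. A minimal generic sketch, where the 50ms threshold and the get callback are placeholders for a real object storage read:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// hedgedGet races a primary read against a delayed backup request and
// returns whichever finishes first. `get` stands in for any idempotent
// object storage read, e.g. an S3 GetObject call.
func hedgedGet(ctx context.Context, get func(context.Context) ([]byte, error)) ([]byte, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // the losing request is cancelled once we have a winner

	type result struct {
		data []byte
		err  error
	}
	results := make(chan result, 2)
	launch := func() {
		data, err := get(ctx)
		results <- result{data, err}
	}

	go launch()                                // primary attempt
	hedge := time.After(50 * time.Millisecond) // speculate past the typical fast path

	var lastErr error
	for outstanding := 1; outstanding > 0; {
		select {
		case <-hedge:
			go launch() // speculative second attempt
			outstanding++
			hedge = nil // a nil channel blocks forever, so we hedge at most once
		case r := <-results:
			outstanding--
			if r.err == nil {
				return r.data, nil
			}
			lastErr = r.err
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	return nil, lastErr
}

func main() {
	data, err := hedgedGet(context.Background(), func(ctx context.Context) ([]byte, error) {
		time.Sleep(30 * time.Millisecond) // pretend this is an object storage read
		return []byte("segment bytes"), nil
	})
	fmt.Println(len(data), err)
}
```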
The reason why it's so challenging for people today is that they spend all their time thinking about the fastest storage that's available today. They spend a lot of time thinking about persistent memory or NVMe SSDs, stuff like that. They think about that first when they're designing their application. How do I get the lowest possible latency?
Making your application work on that first and then trying to add object storage on top is a very popular thing that people try to do. They always call it tiered storage. Basically, every system that has that calls it tiered storage. And it's very hard to match the characteristics of those two things together going top down.
Whereas going bottom up the other direction, starting with object storage and then layering stuff on top, it seems like it should be the same, but it's not. You don't end up making the same design decisions along the way. And that has a big influence on the overall characteristics of the system. And I can explain specifically what that means for Kafka in terms of tiered storage.
So they were thinking about disks first, like local NVMe SSDs, because that's usually what people are running on these days in the cloud. The way that influences the design is that the way they implement tiered storage is they just take those log files on disk that have all the records in them and copy them over to object storage. That solves a cost problem.
If you never want to read that data again, you're good. That's cool. It's much cheaper now. But when you want to come back and read it, let's say you wanted to read all of it, all of the data you've ever tiered off into object storage, the way that works in the open source project is that all of that data has to be pulled back through one of the brokers.
There's no way for you to parallelize that processing, because they just view it as a bunch of log files that were put into object storage. And with WarpStream, we've decoupled the idea of the local storage being owned by one machine; now there's a metadata layer that says, these are all the files that exist.
And then we have all these stateless agent things that can actually pull the data out of object storage for you. So you can scale up and down as quickly as you need to in order to read all that data out of object storage. Say you wanted to pull it all out: you can scale up temporarily for the hour that you want to run some big batch job, and then scale back down at the end.
With the open source tiered storage in Kafka, that's a lot harder, because they started with the local disk part, which makes sense, because that's what existed before. It just means that when you're adding stuff on afterwards, the tiered, lower layer of storage is a secondary concern. It doesn't get as much love and attention as the primary storage gets, and you end up with a very different system at the end.
For us laymen, can you describe how the brokers work and contrast that again with these stateless agents? I understand that you can scale the agents horizontally because they are stateless versus a broker, which seems to have kind of a lock on some data. But what do Kafka brokers do exactly?
Yeah. So let's start with topics. Topics are basically just a name for mapping consumers and producers together. They agree on the name of a topic for where they're going to send the data and where they're going to consume the data from. And within a topic, there are partitions. And a partition is basically just a shard to make that topic scalable.
There are a lot of different ways to decide which shard you're going to write the data to. But let's just say, for now, you do it by hashing the key of the message and then routing it to the shard based on the hash of that key. So records with the same key end up going to the same broker, the one that owns that partition, every time. That's how it works in the open source product.
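As a sketch of that routing rule: hash the key, mod by the partition count. (Kafka's default Java partitioner actually uses murmur2; the FNV-1a hash below is just for illustration.)

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor hashes the record key and maps it onto one of the topic's
// partitions, so every record with the same key lands on the same shard.
func partitionFor(key []byte, numPartitions int) int {
	h := fnv.New32a()
	h.Write(key)
	return int(h.Sum32()) % numPartitions
}

func main() {
	// Same key, same partition, every time.
	fmt.Println(partitionFor([]byte("user-1234"), 12))
	fmt.Println(partitionFor([]byte("user-1234"), 12))
}
```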
The brokers own some set of partitions from a leadership perspective. And then there are also replicas of those partitions, which are just other brokers copying the data. So the broker will write the data that it receives from a producer client down to the local disk and replicate it out to the followers. And then a consumer can come along and read the data that the producer wrote, either from a replica or from the leader. But they're all coordinating around the fact that one of those brokers owns the specific partition that I'm interested in reading. So that's how it works in the open source product. And in WarpStream, we've decoupled the idea of ownership of a partition from the broker itself.
We have a metadata store that runs inside our control plane that has a mapping of: here are all the files in object storage, and within those files, the data for this partition at this offset is here, in some section of a file in object storage. So any of our agents, which are like stateless brokers that speak the Kafka protocol to your clients, can consult the metadata store and ask: I want to read this topic partition at offset X; where do I have to go in object storage, potentially multiple places, to read that data?
But because the metadata store inside the control plane is handling the ordering aspect of it, essentially, you get the same guarantees as Kafka in terms of I have this message with this key that's routed to this topic partition, and I want them to stay in the same order because I'm writing them in a specific order. That ordering part is enforced by the metadata store inside the control plane.
But the data plane part of actually moving all of those messages around is only inside the agents and object storage. So it lets you do that thing I was saying before, where if you want to scale up and down, it's very easy, because you don't have to rebalance those partitions, which take up space on the local disk, amongst the brokers in order to facilitate that.
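To make the decoupling concrete, here's a hypothetical sketch of the shape of that metadata. None of these names are WarpStream's actual schema; they just illustrate how a topic-partition offset can resolve to a byte range inside a shared file in object storage.

```go
package main

// FileRange says: the records for one topic partition, covering a span of
// offsets, live at this byte range inside a shared file in object storage.
type FileRange struct {
	ObjectKey   string // key of the file in the object storage bucket
	ByteOffset  int64  // where this partition's chunk starts inside the file
	ByteLength  int64  // how many bytes the chunk occupies
	StartOffset int64  // first Kafka offset covered by the chunk
	EndOffset   int64  // last Kafka offset covered by the chunk
}

// MetadataStore is what the control plane answers queries against. Any
// stateless agent can serve a fetch once it has this answer, because the
// bytes live in object storage, not on the agent.
type MetadataStore interface {
	// Lookup: "I want to read this topic partition at offset X; where in
	// object storage (potentially multiple places) do I go?"
	Lookup(topic string, partition int32, offset int64) ([]FileRange, error)
}

func main() {}
```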
So you're reading metadata versus reading the real data, basically. And that's what makes it faster.
In terms of being faster, it's faster in the sense that there is no rebalancing that happens, because the data is always just in object storage somewhere. You don't have to do any rebalancing for it. That part of it is faster. There's obviously a trade-off when you do this, in that the latency of writing to object storage is higher than writing to a local disk.
So if you want your data to be durable, you have to wait for the data to be written to object storage first. So that's the primary trade-off somebody using WarpStream would be making: they have to be comfortable with around 500 milliseconds of latency at the P99 to write data to the system.
And then the end-to-end latency of like a producer sends data and then it's consumed by a consumer is somewhere between one to one and a half seconds again at the P99.
What percentage of the Kafka population does that cut out? Because it seems like many of them are highly real-time oriented.
So it's interesting that you use that word real-time, because we've talked to a ton of different Kafka users. And when you ask them, what is your end-to-end latency of your system today, a lot of them don't know the answer. They think that they know the answer: well, it's real-time. Yeah, they're either not measuring it, or they're measuring it in a weird and incorrect way.
There's a lot of different ways that that can happen. But typically, the way that we've experienced is that if you ask an executive at the company that uses Kafka heavily, ask them, is your application latency sensitive? They'll say, of course. We're an extremely high performance organization. We love high performance systems.
Obviously, the end-to-end latency couldn't be anything more than 50 milliseconds. That would be crazy if it were anything more than that. And then you make it a little bit further down the chain in the organization. You ask the application developer or the SRE who's actually on call for the thing or wrote the code. You ask them and they're like, I don't know.
I hope that it's fast, but I'm not really sure. Or you ask them and you get an explicit answer that's very different than the answer that the executive gave you. Yeah. Realistically, there are a few applications that we come across that do need that low latency.
And the primary example of that, I mean, there's a lot of this kind of application out there in different domains, but the good example that demonstrates it is credit card fraud detection. There are people out in the real world using credit cards, and you want to make a determination about whether a charge is fraudulent at the point in time that they're swiping the card.
So that is necessarily a real-time thing. There's a user who's waiting out in the real world. And if Kafka is in the critical path, especially multiple hops through Kafka in the critical path, then a system that has higher latency, like WarpStream, would be harder to adopt. And there are other applications that meet this criteria.
But basically, if the user is in the critical path of the request, then WarpStream is harder to adopt in the abstract. Obviously, some specific applications might be OK with higher latency than others, but that's the one that we see from time to time. When you strip all those out, though, the things that you have left are the more analytical type applications.
Like the example I was talking about before, moving application logs around. Developers are pretty used to some delay between the log print statement running inside their application and being searchable inside wherever they're consuming their logs from. So the additional one second of latency there is typically a non-issue.
And the reason why that's useful for us as a company at WarpStream is that those workloads are typically really high volume, and they cost the user a lot of money. So our solution being more cost effective really resonates with them, because usually there's also a curve where the more data you're generating, the less valuable that data is per byte.
So there's budget pressure to process that data efficiently, and Kafka sticks out like a sore thumb in terms of that processing cost.
So we can come in and say: hey, because the cloud providers don't charge you for bandwidth between VMs and object storage, and we store all the data in object storage, you're going to save hundreds of thousands of dollars a year on sending the dumb application logs that you're generating into the eventual downstream storage. That makes a lot of sense to them.
So while we understand that we can't hit every possible application in the market with the shape that WarpStream is today, we're pretty happy with the set of use cases and workloads that we can target, because there are just so many of them out there, and they happen to align with the budget-sensitive ones.
Those reads and writes, can you restate those? Did you say writes are at most 500 milliseconds at the P99, and reads are one to two seconds at the P99? Is that correct?
So the writes are around 500 milliseconds at the P99. That's tunable. By default, we have the agents buffer the records that your clients are sending in memory for 250 milliseconds before writing them to object storage, so that you write fewer files to object storage, which is the primary determinant of the cost of the object storage component of the system if you're not retaining the data for very long. But you can shrink that down all the way to 50 milliseconds, in which case the produce latency would probably be ballpark 300 milliseconds at the P99.
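A minimal sketch of that buffering behavior, assuming a hypothetical putObject helper standing in for a real cloud SDK call; the point is that each flush interval produces one object, no matter how many records arrived during it.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"
	"sync"
	"time"
)

// putObject is a stand-in for an object storage write (e.g. S3 PutObject).
func putObject(ctx context.Context, key string, data []byte) error {
	_, _, _ = ctx, key, data
	return nil // a real agent would call the cloud SDK here
}

type batcher struct {
	mu      sync.Mutex
	pending [][]byte
}

// Add buffers one produced record in memory. In the real system the
// producer's ack is only sent once the batch is durably in object storage.
func (b *batcher) Add(record []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending = append(b.pending, record)
}

// Run flushes everything buffered so far as one object per interval:
// fewer, larger files, which is the main driver of object storage cost.
func (b *batcher) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			b.mu.Lock()
			batch := b.pending
			b.pending = nil
			b.mu.Unlock()
			if len(batch) == 0 {
				continue
			}
			key := fmt.Sprintf("segments/%d", time.Now().UnixNano())
			if err := putObject(ctx, key, bytes.Join(batch, nil)); err != nil {
				log.Printf("flush failed, records must be retried: %v", err)
			}
		case <-ctx.Done():
			return
		}
	}
}

func main() {
	b := &batcher{}
	b.Add([]byte("record 1"))
	b.Add([]byte("record 2"))
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	b.Run(ctx, 250*time.Millisecond) // the default buffer window described above
}
```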
I said end-to-end instead of read because that's typically what people talk about in Kafka terms. They want to know, when a producer sends a message, how long does it take until a consumer can consume that message successfully? So that's what I mean by end-to-end, and that is one to one and a half seconds at the P99 for most of our users.
So latency aside, what are the other downsides of this approach?
So there really aren't that many downsides other than the latency. The latency trade-off is what actually enables all of the benefits of WarpStream, basically; the object storage is what enables a lot of the benefits. We have a couple of interesting features that are based on the fact that all of the data is in object storage. One of them we call agent groups.
And agent groups let you take one logical cluster and split it up physically amongst a bunch of different domains. They could be different VPCs within the same cloud account, different cloud accounts, or the same cloud account but across regions, all by just sharing the IAM role for the object storage bucket between those different accounts.
The alternative to this with open source Kafka is like setting up something crazy like VPC peering, which is extremely hard to do. And your security team will probably not be super happy if you try to ask them to peer a bunch of VPCs together because it introduces more security risks.
So we have customers in production using this feature today. The example that we usually give is a games company that splits their production games account, where all the game servers run, from the analytics account, where they run a bunch of Flink jobs to process the data generated from the production account.
And they run agents that just do produce, so just writes, in the production account. And they run agents that just do fetch inside their analytics account. So they've kind of flexed the cluster across those two different environments. And all they had to do to set that up was share the IAM role on the object storage bucket instead of peering the VPCs together.
So the fact that everything is in object storage opens up a ton of new possibilities, actually. Basically, the only downside of WarpStream is the fact that the latency is higher. Now, obviously, we're a new company. The product does not have the 13-year maturity of the open source Kafka project. But just to speak to the operational stuff and the cost stuff, WarpStream is a huge win on both of those.
Does it have any of the hosting flexibility? I suppose you're putting everything in object storage anyway, so there's probably people running their own object storage clusters, but that might be crazy. I don't know.
Yeah, so there are a number of projects and products out there that you can buy to give you an object storage interface in essentially any environment. Like there's the open source project, MinIO, and then basically every storage vendor on the market will sell you something with an S3 compatible interface if you're running in a data center environment.
And because we work with S3, GCS, and Azure Blob Storage, we can connect to essentially, well, I shouldn't say anything. If you had an NFS server, we could even make it work on that too. We don't have anyone in production doing that, and I wouldn't recommend it. I would recommend using the object storage interfaces. But we're pretty flexible in terms of the deployment topology.
What about R2? Would you have even more savings, or would that not matter because nothing's going outbound from the virtual network there?
So I think it would depend on where you're running the compute. If you were storing the data in R2 but running the compute in AWS, you would get charged a lot for internet data transfer as part of that. If you're running your compute in one of the providers that has free peering with R2, then yeah, you would get a nice savings there. You'd be able to move data reliably across, let's say, multiple regions of whatever providers have peered for free with R2, using WarpStream.
I was thinking about getting started, really, or just trying it out. I do like your curl demo. I did play with it. I had no idea what I was doing. But it was cool. The command is on your homepage. It's curl and a URL to an install script. I did not review that script prior to running it. I just trusted you.
You're admitting that to everybody?
Well, you know, it was a VM on Proxmox, so I didn't care that I could just throw away. It wasn't my own machine. I was safe. That's a good layer. It did spin up, and then it gives you a URL you can go to to log in. And next thing you know, you're looking at a cluster. So I like that aspect about it. Whose idea was it to come up with that demo? I mean, it's very hacker. It's very developer.
No pain whatsoever. If you've got a VM or you want to spin up a VM or you have Proxmox, then you can do it safely like I've done. Or you can spin up a droplet on DigitalOcean, or pick your own if you've got a VPS, whatever. You could do it in a more safe manner and have some fun. What do you expect people to do with that? What are people saying about that? And whose idea was it to produce that demo?
This is very hacker. I like it.
Yeah, I think the demo was Richie's idea. It basically just starts up a producer and a consumer so that you can just see something happening in the console. Like, yeah, it provides you a link. If you would have run that locally on your laptop, we would have opened the link automatically in your browser for you.
It said it had a problem and I had to click it, so yeah.
Yeah, so we even designed the little niceties like that. But the idea behind the demo is basically just to show people that it does something. Kafka is not an exciting technology to demo, so we're kind of limited there. It's even more boring than doing a demo for a relational database or something. But there is another mode that you can run that's called Playground.
And Playground will let you start a cluster that doesn't have a fake producer and consumer running on it as a demo. It just starts a cluster for you temporarily and makes an account that expires in 24 hours. And you can take that Playground link and you can start...
multiple nodes, like say one on my laptop and one on yours, and point it at R2, and we can have a cluster that spans our two laptops together. My co-founder and I did that before and posted a video of it on Twitter or something like that. But because the data is all in object storage and the compute part is stateless, it's actually not that complicated to do.
It's basically the thing we were talking about a second ago with R2, just connecting two laptops instead of two different regions or something like that.
So to get to the Playground version of it, is it like dash dash Playground? How do I get there?
Yeah, so there are three different commands, primarily, that people would run: warpstream demo, warpstream playground, and then warpstream agent. The agent one is what you would run in production to start an agent. And the playground one is how you start a playground.
I think the playground even spits out in the output the command that you would copy and send to somebody else to start it in another terminal. It's been a long time since I've played with it, so I may be remembering wrong. The reason why people like the demo, or I should say the playground, is that it makes it easy, if you're a developer, to just start a cluster and use it for local development. If you use WarpStream in production, you want to use the same thing in your development environment, just to ensure consistency.
You can use Playground mode to create a cluster, and it will just go away when you stop using it, and there's no cost.
Yeah, I dig it. I kind of wish there was more documentation. If there is, I would go find it, or maybe a video or something like that, because that's kind of cool. I like this demo, because it's for those who just want to tinker without having to spin it up in EC2 or whatever, without going the extra mile.
I love that you can just sort of do this on your own, but I had no idea the playground version was there, or the agent version, to go a little further. There's room for you to make some content around that to give people more guidance. And you should do that.
Yeah, totally. A lot of people have found a lot of joy in the playground and the demo, because they're just cool.
We also have a serverless version of the product that basically just gives you a URL that you can connect to over the internet, to fulfill a similar purpose, basically, for people who want to try it out without actually running anything locally on their machine. I think we give new accounts $400 of credit when they sign up.
So you can do a lot with that if you just want to play around without actually starting any of the infrastructure.
And I guess while I'm on your homepage perusing, just under this demo that is so cool, there is a mention of plug and play. Part of your angst, I suppose, to get to where you're at was: let's rethink what this might look like in a modern time, which is what you've done. But then also to be a straight swap-out. So one thing it says is there's no need to rewrite your application to use a proprietary SDK.
You just literally change a URL. How did you get there? It's fine to not want to contribute to Kafka and to make your own way, and I'm totally cool with that, with WarpStream reinventing or rethinking this model.
But how do you get to the point where you say, let's make this as frictionless as possible and focus on the DX? Like Jared said earlier, there's that subset of folks who maybe aren't doing credit card transactions and fraud detection, where it needs to be literally real-time and the latency cannot be absorbed.
In the scenarios where it can be absorbed, and that's a large population of Kafka users, you can say: listen, we're here, and this is how easy it is to swap. How did you get to that design, that idea?
We got there by just talking to people, basically. The number of developers out there who are using Kafka is really high, and we talked to a lot of them. When we asked them, basically, what do you not like about Kafka, they would give us a bunch of different answers. But when we would ask them, if we could fix those problems for you, but it would involve essentially rewriting large parts of your application, would you want to do that? It's a non-starter for people. And there are a bunch of other things out there in the world that integrate with Kafka, like Spark and Flink, and a bazillion open source tools. We have no influence on any of those things either, really.
So it was kind of a choice that was forced upon us. Kafka has so much momentum behind it that it's pretty much impossible to get broad adoption of something that would be a replacement for it without having the exact same wire protocol, so you can use the exact same clients and stuff like that. It's a lot of work to maintain that compatibility.
Thankfully, a lot of that work is front-loaded. You do it once, and Kafka is not a particularly fast-moving open source project, so they're not changing the protocol every day. Backwards compatibility is very good with Kafka, so thankfully it was mostly a one-time cost. But it's opened up a lot of opportunities because we are compatible, down to even just doing basic stuff for the company, like being able to do co-marketing with other vendors of products that are compatible with Kafka. If we weren't compatible with Kafka, we wouldn't be able to do that. And a lot of the open source tools that we would want to integrate with, like the OpenTelemetry Collector or Vector, these kinds of observability agent tools, they can all write data to Kafka, and we inherit that benefit right out of the box. So it's been super important for us to have that compatibility.
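In client code, that wire compatibility means a migration really is just a connection string. A sketch with the segmentio/kafka-go client, where both hostnames and the topic are placeholders, not real endpoints:

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	w := &kafka.Writer{
		// Addr: kafka.TCP("kafka-broker-1.internal:9092"), // before
		Addr:  kafka.TCP("warpstream-agent.internal:9092"), // after: the only line that changes
		Topic: "app-logs",
	}
	defer w.Close()

	// Everything else -- client library, message format, application code,
	// and tools like the OpenTelemetry Collector or Vector -- is untouched.
	err := w.WriteMessages(context.Background(),
		kafka.Message{Value: []byte("hello from the same old client")},
	)
	if err != nil {
		log.Fatal(err)
	}
}
```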
And do you think, I know you're sort of youngish, but, I suppose, how are you winning? Are you winning the market? That's what I'm trying to get to: are you truly absorbing a lot of the Kafka user base? Is there major demand for WarpStream? What's the state of product-market fit, and are you winning?
Yeah, so we have a number of large use cases in production today. I can't talk about very many of them, unfortunately, but there are WarpStream clusters out in the world processing multiple gigabytes a second of traffic, and not just one of them; there's a decent number of them at this point. And where we're having success in the market is basically the large open source users who feel like the open source product is a bit too challenging for them to run. And there's budget pressure all over the industry today, especially in the corners that we're interested in, like the observability and security areas. On the analytics side, there's a lot of budget pressure.
So we're a pretty natural fit for those folks who are both tired of running the open source project and getting budget pressure to decrease their costs. We're having a lot of success there.
What about greenfield? Is there anybody saying, okay, we need to adopt Kafka or something like it, so what's out there before we go write a lot of code or flesh out our infrastructure model or make any plans? What about those who are not migrating? What's the inbound of those folks, and what's the path to the DX? Because one of the things you mentioned is that you solve a few problems. You solve cloud economics. You solve operational overhead. And one thing you mentioned, at least in the article from last year, was a major problem with Kafka: developer user experience. That's what I'm trying to get at for those coming in green, brand new. What is that user experience like, and what is the path like for them?
Yeah, so I think for greenfield projects there are two different branches. There are projects that are only greenfield in the sense that they're adopting Kafka for some goal; the application existed before, but they're new users of Kafka. And then there are truly greenfield projects, where the project itself is new and the choice to use Kafka is new. Usually those don't have a super high volume of data. It's the existing initiatives or applications within a company that process a lot of data but are not using Kafka for cost reasons where we are having more success.
There's a product that I would love to talk about that won't quite be public by the time this episode is posted, but they're in that first category: a large existing workload that was not using Kafka for a bunch of different reasons, cost being one of them. And they're now a big WarpStream customer, because they saw that there are benefits to using Kafka for their application, but they just couldn't use the open source project for cost reasons. Now, essentially, they can. There's a lot of cool stuff they can do now that they couldn't do before, and WarpStream is their Kafka-compatible product of choice for those cost reasons. They're starting to get some benefits from it now.
So I guess the obvious question to me at this point is, Kafka is not dead. It is alive. It is open source. To my knowledge, WarpStream is not open source. Was there a conversation about licensing? Was there a conversation about being a commercial open source company? Just to follow in the footsteps of the predecessors that, at least from a conceptual standpoint, you copied and improved upon, right? You stood on the shoulders of giants. Where are you at with that? What have you thought about in terms of licensing and open source, and what's your stance on open source as your core, or not?
Yeah. So we had a lot of back and forth initially when we were thinking about this specific issue. The conclusion we came to is that in order to be successful commercially, we cannot release our product as open source. And we did not want to pull the kind of bait-and-switch intellectual dishonesty move that a lot of commercial open source projects have pulled over the last five or ten years, either relicensing or drastically changing the focus of the project to benefit the primary commercial backer. We're providing a lot of value by offering a solution that is dramatically lower cost and also compatible with the existing ecosystem. And the way that works in practice means you can switch away from WarpStream, because you're not locked in from an application perspective or a protocol perspective. We're not locking you into something proprietary from an interface perspective.
So it's actually relatively easy to switch away from WarpStream if you decided to in the future because you didn't like something that we did. But we're hopeful that the fact that we provide something that's dramatically lower cost and easier to use means that you won't switch away.
And you'll continue to have the best of both worlds, so to speak, where there is an open source thing out there that obviously is going to continue to exist because it has a ton of users. But if you want to use our product to save money and have something easier to use, you can as well. And we will be able to continue to invest in making that product better and better over time, because we're not stuck in the kind of middle-of-the-road outcomes a lot of commercial open source companies face, where they're forced a few years down the line to cash in all of their brand goodwill on a relicense in order to gain the commercial success they wanted.
By sticking to this model, we're hopeful that we'll be able to be a good citizen of the Kafka ecosystem, in terms of not making something incompatible and proprietary that steers everybody away. And we do put a lot of effort into testing clients. We find bugs in Kafka clients, which are typically open source, and make improvements there.
But the core part of the product is not going to be open source.
What's interesting about those relicenses is that they were all commercially successful companies, even at the time of the relicense. They had arrived, and at a certain size and scale, it seems the growth curve has to keep going vertical to satisfy investors, or to satisfy public shareholders in the case of HashiCorp. I actually don't know the state of Redis Labs or the commercial success of Redis, but many of them were large, successful commercial companies, bigger than most companies ever get, before they went ahead and did that not-cool rug pull.
But I wonder if the pressure on them comes from it being other people's money, which is similar to your situation: you have VC behind you. I'm just curious about that decision from your perspective. You're a small team, probably well-funded, and you're highly successful software people, so you're probably making good money.
Runway well into the next decade. Yeah, so why not bootstrap?
Why not bootstrap, and then not have any of that VC pressure that you currently have?
That's a really good question. Let me take a step back from it for a second and talk about the commercial open source stuff. This is obviously a little bit inside baseball, but as part of going through that decision process, we talked to the founders of a lot of commercial open source companies. And we asked them, let's say you were starting our company today. What would you do? And without hesitation, the answer we got was, I would not start it as a commercial open source company today. They gave a lot of different reasons for that, and I can't really share some of them without potentially identifying who those people are, and I don't want to do that.
But the challenges of a commercial open source company today... It's not even just the hyperscaler cloud providers anymore taking your stuff and running it. That's obviously a concern, but you can get around it; the AGPL does a decent job of preventing some flavors of that.
The other issue is that the competition within the category they're building their product in is extremely high. Having your source code out there in the wild, letting everybody know your secrets about how you made your product better, you lose a lot of the juice behind why you have these huge staffs of developers working on interesting things. It's not to say you can't protect that otherwise, with software patents and things like that, but there's no appetite for software patents. It would do a lot of damage to brand reputation, I think, if these commercial open source companies created a bunch of software patents and started enforcing them against each other, for example. It's a very challenging situation today. A lot of the companies that you might view as successful commercial open source projects might be successful in the iteration they exist in today,
or existed in yesterday, in the case of the relicenses, where they have good adoption in the developer community and good success in the VC-funded startup segment of the world. But there is an inevitable push to go upmarket and go after larger and larger customers, because it's effectively the only way to support growth. If your customers are all small startups, even medium-sized startups, and developers playing around in their personal capacity, the revenue opportunity is just really small, unfortunately, for a lot of these businesses.
It's much easier to sell a million-dollar-a-year contract to an enterprise than it is to get a million dollars of revenue out of a bunch of small and medium-sized businesses. So the temptation when growth starts to slow down is, I need to go do that now. That's the first thing your investors are going to tell you: you need to go upmarket and get enterprise customers.
But if the product you're selling them is support, or a couple of features on top of an open source project, your ability to exert pricing pressure on that enterprise buyer, to get them to pay a higher price or to pay at all, is limited. With a lot of these open source projects, the company spent so much time making the project good that the enterprise can just hire one person to maintain it internally, move on with their life, run the open source forever, and maybe pay you a peanuts support contract that's not actually enough to support the business. It's just really hard.
I completely understand where you're coming from, and it might have felt as if these companies were successful from the outside. Some of them definitely were. But there is that inevitable pressure to keep the growth rate up, and the only way to do that is to go upmarket. And when you're going upmarket, you need to provide something that looks valuable. If your project is open source and the alternative is hiring a developer or two to maintain it internally, you kind of have a cap on how much you can charge. It's the same thing if you're offering a cloud version of an open source project, for example.
The premium someone will pay for your cloud version may be lower than you expect if they can self-host, because they're always looking at both sides of the coin: how much will it cost me to self-host this versus how much does it cost to use your cloud-hosted version? That calculus does not always come out in your favor as a vendor. You may have to charge significantly more to make the numbers work on your side than what they think they can run it for internally. It's really challenging stuff, and we wanted to provide the best product possible with the best product experience possible. We didn't feel like the shape of a commercial open source company was the right way to do that without a lot of the distractions I'm describing coming up along the way. And we didn't feel it would be right to do the bait-and-switch thing that people are doing these days. We wanted to be honest, basically, from day one.
That makes sense to some degree. I don't fully agree with all of your sentiment, though that's a very deep and lengthy conversation that doesn't quite fit here. But what I can appreciate, given that I don't fully agree with all of your reasons, the most positive thing is that you've made it easy to get in and get out.
So if for some reason WarpStream is of great benefit, and let's just say a year down the road somebody does "warp-not-stream," and it's commercial open source, and they eat your lunch because they decided to be open source first, and people can get into that just as easily as they can get out of you, then that's a whole different story. I'm not suggesting that's going to happen, but it's...
Possible.
It's totally possible. Yeah. And you're exactly right. If one of our competitors came up with a better implementation tomorrow and it was...
The exact implementation, even. They could literally copy everything you do, and the world would be okay with that because they made it open source. That's a version, or at least a subset, of a conversation we had at length on this podcast a few weeks back with JJ, Joseph Jacks. He was like, yeah, I'm totally cool with somebody, a founder, going out there and literally copying X and saying, this is now X as open source. He was totally cool with that. I'm not saying that completely makes sense to me, but the world now believes that's an okay thing. And it's an okay thing because, at the core, it is meant to be an open source commons good.
Yeah. I would not harbor any ill will towards someone who decided to do that.
I would be like, come on, man, don't do that. Well, someone's going to do it. As you guys have success, whether or not they can actually pull it off is the question, right? But at some point, as WarpStream continues to grow, there will be a Hacker News number one story: X is like WarpStream, only it's open source and self-hosted. And it'll get 500 to 1,000 upvotes. And maybe it gets adoption, maybe it doesn't. Maybe by then you guys are so far ahead it doesn't matter. There's tons of what-ifs, but it will happen from somewhere in the world if you're successful.
And the reason it doesn't bother me so much, basically, is that the portion of the Kafka market that has been commercialized, meaning somebody is paying a licensing fee or some other fee to use the product, not just hiring somebody to run it for them, is very small, even though we obviously have commercial competitors. So there is so much greenfield market out there for us to commercialize, along with this constant, ever-increasing trend of things becoming more real-time, and these other tailwinds of more observability and security data being generated in the world. This market is just going to be so big in the future that I think it's unlikely to have a winner-takes-all dynamic, similar to the way there are multiple large public cloud hyperscalers that exist and are very profitable. There's so much of this market out there that we're not super concerned about any particular competitor, even if one were open source. There are a lot of other dimensions that we would hopefully be better at competing on, which you don't get just from the fact that a product is open source. Combined with the fact that the market is so huge, we're pretty happy with our position as it is today.
Hey friends, I'm here with Brandon Fu, co-founder and CEO of Paragon. Paragon lets B2B SaaS companies ship native integrations to production in days, with more than 130 pre-built connectors, or configure your own custom integrations. So Brandon, talk to me about the friction developers feel with integrations: SSO, dealing with rate limits, retries, auth, all the things.
Yeah, so there's a lot here, and there are a lot of aspects to the different problems you have to solve in the integration story: building these integrations and also providing them in a user-friendly way for your customers to self-serve, onboard, and consume.
So part of what the Paragon SDK provides is that embedded user experience, again, what we call our connect portal. That's going to provide the authentication for your users to connect their accounts. That's going to be the initial onboarding. But in addition to that, your users may also want to configure different options or settings for their integrations.
A common example that we see for Salesforce or for CRM integrations in general is that your users may want to select some type of custom object mapping. Every CRM can be configured differently, so your users might want to map objects to some different type of record in their Salesforce or different fields in their Salesforce.
Typically, that's what developers would have to build on their own: the UI for your users to configure these different settings for every single integration.
That's also provided by the Paragon SDK: not just the initial onboarding and authentication experience, but also the configuration end-user UX for different settings, like custom field mapping and selecting which features of your integration your users might want to configure. All of that comes fully out of the box.
With integrations, different APIs might have different rate limits, they might have different policies that you have to conform with, and your developers typically have to learn these different nuances for every API and write code individually to conform to those different nuances.
With Paragon, because we build and maintain the connector for each of the integrations we support in our catalog, we automatically handle things like retries and rate limits.
And so we look at this as the backend or infrastructure layer of the integration problem. We've spent the last five years building and optimizing Paragon to act as the integration infrastructure for your application.
Okay. Paragon is built for product management. It's built for engineering. It's built for everybody. Ship hundreds of native integrations into your SaaS application in days. Or build your own custom connector with any API. Learn more at useparagon.com slash changelog. Again, useparagon.com slash changelog. That's U-S-E-P-A-R-A-G-O-N dot com slash changelog.
And I'm also here with Dennis Pilarinos, founder and CEO of Unblocked. Check them out at getunblocked.com. It's for all the hows, whys, and WTFs. Unblocked helps developers to find the answers they need to get their jobs done. So Dennis, you know we speak to developers. Who is Unblocked best for? Who needs to use it?
I think if you are a team that works with a lot of coworkers, if you have like 40, 50, 60, 100, 200, 500 coworkers, engineers, and you're working on a code base that's old and large, I think Unblocked is going to be a tool that you're going to love.
Typically, the way that works is you can try it with one of your side projects, but the best outcomes are when you get comfortable with the security requirements that we have. You connect your source code, you connect a form of documentation, be that Slack or Notion or Confluence. And when you get those two systems together, it will blow your mind.
Actually, every single person that I've seen on board with the product does the same thing. They always ask a question that they're an expert in. They want to get a sense for how good is this thing? So I'm going to ask a question that I know the answer to, and people are generally blown away by the caliber of the response.
And that starts to build a relationship of trust where they're like, no, this thing actually can give me the answer that I'm looking for. And instead of interrupting a coworker or spending 30 minutes in a meeting, I can just ask a question, get the response in a few seconds and reclaim that time.
The next step to get unblocked for you and your team is to go to getunblocked.com. You and your team can now find the answers you need to get your jobs done, and not have to bother anyone else on the team, take a meeting, or waste any time whatsoever. Again, getunblocked.com. That's G-E-T-U-N-B-L-O-C-K-E-D dot com. And get unblocked.
So let's go back to bootstrapping then. It seems like the kind of thing you could bootstrap. It's just you and Richie coding it up on nights and weekends, getting it rocking and rolling, keeping all that equity. No one to answer to. You're going to get customers pretty quick, and then you can start hiring based off of your customers. So why the decision to raise?
So, speaking only for myself, the right reason to raise money is that you want to go faster. That's basically why someone should raise venture capital: they have something that's working and they want it to go faster.
My co-founder and I had so much conviction in what we were doing in terms of it being commercially successful that we knew on day one we would be able to go much faster if we raised money. So that's why we did it. There was never a period of time where we were guessing like, oh, do people need this? It was like very obvious to us from day one that we wanted to go as fast as possible.
And raising money is the way to do that, because we were able to hire, relative to the two of us, many more people, pay them very well, and make them happy. Hiring people who are good at distributed systems is very expensive, and those types of people also really appreciate job security. So being able to have a bunch of cash in the bank, even if we're not spending it, is very important to those folks. For our internal stakeholders, as employees and founders, it's very comfortable to have that cushion, and it allows us to hire people who will make things go faster. And then, on the complete other side of the coin, if you want to sell products to enterprise buyers as two people without having raised any money, it's going to raise a lot of eyebrows if they want to put that in production as the backbone of their multi-billion-dollar business.
That makes a lot of sense.
It's really hard. Whereas if we can walk into a meeting and say, hey, we've raised roughly $20 million from Greylock and Amplify Partners, our Series A and seed investors respectively, that sidesteps a lot of really awkward conversations, like what's going to happen if you founders get hit by a bus tomorrow? Obviously that would be very bad for the company, but there is at least somebody else who cares and would like to see their investment succeed. So the dilution point is a good one, but you have to ask: are the odds of success higher, and will the eventual outcome be bigger, if I raise VC? If that is true, then I think it's worth doing.
But if you're in a position where you don't know if your product is going to be commercially successful, raising VC closes a lot of doors. Every further round you raise makes it harder and harder to explore different kinds of exit opportunities that you might personally view as a success, but your venture investors may not. So it's definitely a balancing act. You just have to understand the game you're playing, basically, and walk into it with your eyes open.
Had you played this game before?
Yes. Very briefly, a long time ago, unsuccessfully, I did. And in between that and starting WarpStream, my co-founder and I were considering raising money for the thing we were doing before we joined Datadog. That's how we got to know our seed investors at Amplify Partners. But we didn't have the conviction at the time to say, let's go raise money, this is going to be huge. In hindsight, we probably would have done very well had we chosen to raise VC and remained independent instead of joining Datadog. But because we didn't have that conviction, we took the quote-unquote exit opportunities available to us at that moment. We hadn't yet raised money, so we were very flexible. We were able to join Datadog, and it worked out super well. We got to meet a bunch of interesting people, and the project we were on was successful and super fun. But because we did have that conviction this time around, and we wanted to go as fast as possible, we chose to raise money.
I think your reasons are sound. I don't disagree, and I will not argue. Good answer. I'll give it to you. While we love open source, I can see how going the route of venture capital let you avoid, as you had said, some of the burden of open source in terms of distraction (that was your actual word). I can understand that. And that's your prerogative, right? Bobby Brown is dated in terms of an artist.
Nobody knows Bobby Brown anymore.
But "it's my prerogative" is still a true phrase. Ryan, do you know Bobby Brown?
It's been a long time since I've heard any Bobby Brown, but I do indeed know a little bit.
I grew up on Bobby Brown, so I can't help but bring it up. It's my prerogative. Yeah, great song. So it's your prerogative. And Richie, slash Richard, is a great name, by the way: Silicon Valley. I had to bring it up. His name was Richard Hendricks, but he was called Richie by his attorney.
I don't disagree with the reasoning for your direction. I hope it works out for you; it seems like it's going to. But I do agree with what Jerod said: if you hit critical mass and enough scale, there is probably going to be somebody who copies what you've done, literally copies it, makes it open source, and is okay with that. I don't think you should operate in a state of fear of that and make choices because of it, because that's the free market, man; that's going to happen. But good on you for being able to answer these hard questions. I think you did well on that front. I don't have any argument, really. That's all I'll say.
And that's only because we spent a lot of time thinking about it, and a lot of time talking to folks who are building commercial open source businesses day to day, which really brought our perspective to where it is today. It's not to say there are no opportunities to start a successful commercial open source company today. There obviously are.
It's just that for our particular market and the strategy we were pursuing, it wasn't going to work. I can put it a little more crisply: the segment of the market we're going after is already price and cost sensitive. If we offered them the opportunity to run our product for free, the odds that we would be able to charge them almost any money would be pretty low.
There are other markets out there with completely different dynamics, especially if you're not trying to provide the low-cost solution. So I didn't mean to denigrate commercial open source companies. I'm just saying that when we explained our strategy to these other commercial open source founders, they said, that's going to be hard, very hard for you. So you should think about it before you choose to go down that path. We chose this path because we think it's most likely to be successful for us. Also, I would be personally very upset if I had to do one of those license-change rug pulls. It would make me very sad, because I know it causes a lot of consternation and heartburn for people when those things happen.
So we just wanted to be straight up with people from day one.
I also think that you are a particularly easy target for the hyperscalers to clone, host, and offer, because of the nature of what you're doing.
Yeah. It's a general-purpose infrastructure building block, and Amazon has MSK as a competing product with WarpStream. So they very directly could just offer a new SKU of MSK that is the WarpStream one, if it were open source. That would be very challenging for us.
Ride your coattails. Are there other competitors out there? Are there other people that are putting Kafka on object storage?
Yeah, there are a number of companies out there that have talked about doing this. The most notable is probably Confluent's announcement of their Freight product. That's probably the splashiest announcement of any of them, where they're taking a similar direct-to-S3 approach as WarpStream does. The product isn't available today for anybody to just go sign up for and do a comparison, but they've made an announcement, and I'm sure that's going to progress in the future. And I'm sure essentially every one of our competitors, if they haven't already started working on a similar storage engine, will. So I have no doubts that the cat is out of the bag, so to speak, on the idea.
Yeah, fair enough. Well, that does make sense then, why you went venture capital, so you can go fast. And I think you've done well from a visual and brand standpoint; your marketing site is pretty awesome. There's obviously always room for improvement, but it's pretty solid.
I do want to bring up pricing, because I don't disagree there either. There are large corporations, enterprises, Fortune 500s, where if you're not charging them $10,000, $20,000 a year, they're like, what's wrong with you? We can't use you. We literally need to give you a lot of money to trust you. That's just the nature of the beast there. But when you land on your pricing page, right out of the gate the TCO, total cost of ownership, at least with the default numbers, is $2,295 per month. So you're not even scaring people away. You're literally putting your fist in their face and saying, it costs a lot, y'all. And that's the cheap version. These people are probably used to paying more than that, right, Ryan?
Yeah, there's a little slider that lets you turn on the breakdown mode of the comparison to open source Kafka running in three AZs or one AZ, or compare to AWS MSK. And we didn't even put a particularly big workload as the default on the pricing calculator. I think it's a pretty standard workload.
And people are used to looking at big numbers when it comes to running Kafka for these kinds of observability and telemetry workloads. They just cost a lot. If you look a little further down the pipeline, if they're sending the data to Elasticsearch or Snowflake or ClickHouse, they're probably paying significantly more for those things.
So Kafka looks cheap in comparison, and then WarpStream looks cheap compared to Kafka. So we're very open about the fact that our product is designed to be more cost-effective.
But we do offer additional account tiers, as we call them. The things enterprises want from you, the reason they want to pay you $10,000 a month, is to be able to file a support ticket and have somebody reply extremely quickly. That's what they're basically paying you for. That's the stuff that doesn't scale as you get bigger or your product gets better. Obviously you might have fewer support tickets, but you still need humans who can respond quickly when somebody does file one.
So our Pro and Enterprise account tiers give customers a support response time SLA they can count on, which today is backed by the engineering team. If you're an enterprise customer and you file a priority-zero support ticket, meaning my production cluster is down, I need help right away, that pages the engineering on-call rotation and gets you help as quickly as somebody can respond to the page. That's the type of stuff people are paying for on top, and that's how we make enterprises trust us. Another reason to raise venture capital: you can hire people so you can run a 24/7 follow-the-sun on-call rotation to back those support response time SLAs.
So if you needed five gigabits of write throughput, which I imagine is quite high, and let's say you do 14-day retention, so that's two weeks. Not that much. We're talking 97 grand per month going to WarpStream and $1.76 million a month using Kafka. These are numbers that blow my little mind.
Sorry, I didn't hear the first part, your throughput number.
It was the highest. It was five gigabits.
Five gigabits. Yeah. As you get up into these larger and larger scales... Well, first of all, 14 days is pretty long retention for most Kafka users. Usually, because it's transitory data, three to seven days is pretty typical.
And if you're at these kinds of scales, you're probably not paying your cloud provider retail price for cross-AZ networking anymore. If Kafka were a big part of your bill, that would probably be one of the items you'd want to negotiate with your cloud provider. So the comparison doesn't look nearly as rosy if you've negotiated discounts. The way you can estimate what those would be is to switch the calculator from Kafka with three AZs to Kafka with one AZ, which dramatically reduces the cross-zone networking, and turn on the single-zone consumers flag. Then the comparison doesn't look quite as good anymore.
Still 10X. Still looking pretty good. Yeah.
Then turn it to one day of retention, and it goes to 86% savings versus 60% savings. So it's still big. We understand there are a lot of big Kafka workloads out there, and they don't always come out at 90% savings like that example does. But we're confident that if we can deliver 75 to 80% savings, it's a compelling enough reason for someone to switch. There's a little bit of activation energy it takes to get people to do anything, and we're confident that 75 to 80% cheaper is enough of that activation energy to get people to at least give us a shot.
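As an editorial aside, the arithmetic behind comparisons like this can be sketched in a few lines. Everything below is an assumption for illustration (rough AWS retail rates and simple traffic factors), not WarpStream's actual calculator; it omits compute and operations entirely, so it shows the shape of the comparison rather than reproducing the quoted figures.

```python
# Back-of-envelope model of where self-hosted Kafka costs come from at
# 5 Gbps of writes with 14-day retention. All rates are assumed retail
# prices; check your own bill before trusting any of these numbers.

write_gbps = 5                # gigabits/sec of produce traffic
retention_days = 14
replication_factor = 3

gb_per_sec = write_gbps / 8                  # gigabits -> gigabytes
gb_per_month = gb_per_sec * 86_400 * 30      # ~30-day month

cross_az_per_gb = 0.02        # $/GB cross-AZ, both directions (assumed)
ebs_per_gb_month = 0.08       # $/GB-month provisioned disk (assumed)
s3_per_gb_month = 0.023       # $/GB-month S3 standard (assumed)

# Kafka spread across 3 AZs: the two follower replicas typically live in
# other zones, and roughly 2/3 of producer/consumer connections cross a
# zone boundary on average.
replication_traffic = gb_per_month * (replication_factor - 1)
client_traffic = gb_per_month * 2 * (2 / 3)  # produce leg + consume leg
kafka_network = (replication_traffic + client_traffic) * cross_az_per_gb

stored_gb = gb_per_sec * 86_400 * retention_days
kafka_storage = stored_gb * replication_factor * ebs_per_gb_month

# Direct-to-object-storage design: no broker-to-broker replication
# traffic, and one logical copy (S3 replicates internally for free).
s3_storage = stored_gb * s3_per_gb_month

print(f"Data produced per month:  {gb_per_month:>12,.0f} GB")
print(f"Kafka cross-AZ transfer: ${kafka_network:>12,.0f} / month")
print(f"Kafka replicated disks:  ${kafka_storage:>12,.0f} / month")
print(f"Object storage instead:  ${s3_storage:>12,.0f} / month")
```

Note how the two knobs discussed above act on different terms: shrinking retention shrinks the storage line, while moving to one AZ or single-zone consumers shrinks the network line, which is why the savings percentage moves with both.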
I want to point out that these are just dollars, too. This is not developer friction or operational burden or enhanced developer experience, which are the hallmarks of any conversation about dev tools today, right? You could be a 13-year-old tool like Kafka and get away with... And I have no idea, so no skin in the game; I've never used Kafka personally. So if there are some haters out there, those marginal haters I mentioned earlier, don't hate on me. But there may be some warts and blemishes and burdens within the Kafka ecosystem that make it challenging to operate, to stand up. Obviously there are costs; we've already talked about that literally at length.
But I think there's something to be said for a modern take, given today's cloud infrastructure, with some of the dev user experience attributes I've seen you already put in place. So cost is one thing, but happy developers means retained developers, morale boosts, maybe freedom on weekends, less PagerDuty, less whatever from anybody who might be competing with PagerDuty.
That's a good thing.
Yeah, at WarpStream we know that's a very important part of what we do. But it's always easier to walk into a sales conversation with hard numbers. A lot of vendors attribute a lot of savings to exactly those attributes, which is probably true, but they feel a little more wishy-washy compared to hard numbers. So that's why we lead with the numbers in our pricing calculator. Those other things are still things we highlight when we're talking to potential customers, to help them understand the value of the product, but we like to think of them as the icing on the cake. The cost savings is what we're promising them, basically. Everything else is icing on the cake.
Icing on the cake. What's a good next step? I feel like we've really gone through all of it. Jerod, you got anything else? I think we have. We've covered it all, man. We've covered every ounce of WarpStream. Ryan, thank you for being patient with our questions, going through everything, and filling in all the blanks. I think you did a great job with this conversation.
I'm happy. I'm impressed. I can see a lot of quality in you as a person and in the thing you're trying to do. I think you guys have led with wisdom. I like a lot that you went out and talked to folks rather than shooting from the hip, so to speak, with your choices and letting them be opinion-based. You seem to have leaned into the wisdom of those who've come before you in your particular target market, which I think is key to your choices. So I'm stoked you were able to answer the questions we asked. Thank you.
Yeah, this has been very fun. I was not expecting to talk about raising money at all during this conversation, but that was something we spent a lot of time on. When you're building a company, you have to spend a lot of time thinking about strategic stuff that's not just writing code, and that one involved a lot of back and forth between my co-founder and me about how we were going to do things. We're very happy with our direction now, but it took the input of a lot of people to arrive at this conclusion. And we're very thankful to the people who made themselves available to help us learn about commercial open source, because we had never really even considered it before, and it was super important to learn along the way.
Very cool. Well, warpstream.com is where you can go. We'll obviously put links in the show notes. Ryan, thank you. It's been awesome. Thanks, man. Thanks. Okay, so WarpStream seems to be what Kafka would look like if it were redesigned from the ground up to run in modern cloud environments. They didn't opt into open source, and I think Ryan had a pretty solid argument for why not. But time will tell if an open source copycat comes along to sniff out their lunch and eat it. Until then, good for them for putting in the work to gain the conviction they have for their choices and their position. Later this week, our game show Pound Define is back on Changelog & Friends, and it was epic.
This is the closest I've come to winning, and I was still pretty far off. That's this Friday. Okay, big thanks to our sponsors this week. Speakeasy, love them: new domain, speakeasy.com. Also our friends over at Supabase, celebrating launch week number 12: supabase.com. And our friends over at Paragon, all the B2B SaaS integrations you want in a single platform: useparagon.com.
And also our friends over at Unblocked, for all those whys, hows, and WTFs. Check them out at getunblocked.com. And of course, our partners over at Fly.io, the home of changelog.com. Check them out at fly.io. And to the Beatmaster in residence, Breakmaster Cylinder, bringing those beats. Okay, that's it. This show's done. We'll see you on Friday.