Bryan and Adam were joined by Oxide colleague Ben Naecker to talk about OxQL -- the Oxide Query Language we've developed for interacting with our metrics system. Yes, another query language, and, yes, we're DSL maximalists, but listen in before you accuse us of simple NIH!

In addition to Bryan Cantrill and Adam Leventhal, our special guest was Oxide colleague Ben Naecker.

Some of the topics we hit on, in the order that we hit them:

- RFD 463: The Oxide Query Language
- GenAI podcast on the OxQL RFD
- RFD 125: Telemetry requirements and building blocks
- InfluxDB
- ClickHouse
- Simon Willison: SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL
- Oxide CLI timeseries docs
- Oxide CLI timeseries dashboard code
- OxQL source code
- Rust peg crate
- Gorilla
- ClickHouse paper
- OxF: Whither CockroachDB?
- ANTLR
- ACM Queue 2009: Purpose Built Languages

If we got something wrong or missed something, please file a PR! Our next show will likely be on Monday at 5p Pacific Time on our Discord server; stay tuned to our Mastodon feeds for details, or subscribe to this calendar. We'd love to have you join us, as we always love to hear from new speakers!
I'm sorry I'm being so weird.
With your pseudo podcast hosts?
With the auto-generated podcasts, it's mesmerizing. I can't stop. It's really weird. So Ben, I'm sorry. I was also sending Ben all these, obviously. Uh, you have now lost your gravitas voice. I don't know, I'm taking you a little less seriously right now. It's kind of back to clown college.
It sounds like that's right. Yeah. I need to, uh, hang out with a bunch of germy kids again to, to get it back.
So to get it back. Um, so, just to give the context before the context: friend of the pod Simon Willison had this blog entry that was on Hacker News describing this new feature from Google, from their NotebookLM, where they can generate podcasts from arbitrary material. And I've been entertaining myself by sending it RFDs and creating single podcast episodes on RFDs.
And they've just been super...
Weird. It's the only thing -- I mean, that's my one-word synopsis of them. It's very uncanny valley. I don't know, Ben, what do you think? They get some things very right and other things very wrong, and then there are those things in the middle which are wrong but you can't quite figure out why. It's so weird, or they just, like, slur the words in some strange way. It's very, very uncanny, I agree.
And they're very cheerful and they are very interested in promoting whatever document it is that you've put in front of them. They definitely believe whatever they've just read.
Lots of soundbites. Real, like, paid promotion kind of vibes. Like, you know, one of those paid promotions where, like, clearly the folks have, like, spent, like, half an hour figuring out what it's about, but not, like, three hours figuring out what it's about.
Right, because, I mean, they can't figure out three hours because they've got another one of these to do in another half an hour. They're just kind of rolling them through. But anyway, it was weird. And the RFDs, of course, that I was feeding it were all the RFDs that we were going to be talking about today.
So I was feeding it RFD 463, getting the podcast before the podcast, which I dropped into the chat there, if you want to hear synthetic hosts describe it -- with also these weird taglines. Did you notice that? I've got, like, "stay curious until next time." It's like, stay curious. Okay. Stay. Anyway, it was very weird. But Ben, welcome. Thank you.
So I feel we've kind of stepped on an exposed nerve ending for the internet here. I got a lot of people reacting to this -- am I wrong? To describe it as upsetting seems a bit too strong, but a lot of people have issues with this.
Severe skepticism is the word I would use. Yeah. They seem skeptical of, of the premise basically. Yeah.
Definitely skeptical. So we're talking about RFD 463, which is OXQL, which our synthetic podcast pronounced Oxquill over and over and over again. Ben, we've never pronounced it Oxquill, I think, right? I think it's OXQL, but now I can't unhear Oxquill, unfortunately.
Yeah, please do not do that. It sounds like Dayquil or something is immediately what I think of.
It does sound... It sounds like bovine Dayquil. Our development of a DSL, maybe this is in the way I'm phrasing it. Maybe I was too clickbaity with the way I phrased it. Is that a problem? Or something? What do you mean?
By calling it a DSL?
No, you know, it's like, you know, I'm kind of throwing out a little bait for the podcast. So my tweet is, when is a new query language necessary? And clearly there's a decent portion of the internet for whom the answer to that question is actually never.
Thank you for asking. 18 query languages ago. 18 query languages.
Exactly. Or it's just like, listen, if you're going to invent a new query language, that's fine, but you need to get rid of one of these others. And in particular, there was a quote tweet from Andy Pavlo, a database researcher at CMU. The quote tweet starts, "Bryan is brilliant, comma." You're like, oh, no. No, that's not good. No, that's a bad start.
Basically, if someone is insulting me, I know they may be warming up to agree with me. It's the old "actually, I agree with Bryan" quote. They're kind of trying to establish their bona fides before they... They don't want to imply that they would agree with anything I say, but they are... Listen, Bryan is often... It's Kelsey's line, right? I love Kelsey's line.
"Bryan is wrong some of the time, but not this time." I love that. But this is like, "Bryan is brilliant, comma" -- this is the end of this. This is going straight into the ditch. "But it seems misguided for a hardware company to create a custom query language that no other tool supports. If you don't want SQL, you could have used PromQL."
Yeah.
The "just" in there is doing a lot of heavy lifting, as usual, when that enters an engineering discussion, a technical discussion. "Just" is doing a bunch of work.
Yeah, of course. This is an important observation. When "just" enters an engineering discussion, yeah, you're right. You're like, okay, cocked eyebrow. "Just" often does do a lot of heavy lifting.
There was another sick burn, Bryan. I just want to make sure you didn't miss this one, which was on Hacker News a week ago, talking about the Oxide query language.
It says, when a small hardware company is not only making its own full hardware and software stack, but brings that all the way down to telemetry query language, I get a lot of NIH vibes and question if any of these elements will get the attention they deserve.
Do people think we're creating our own instruction set architecture? I was like... I just get confused why this is engendering this kind of reaction, especially for like all of the stuff we've done. We've done like so much of our own. I mean, we've gone our own way so many times over. It's like, this is the one that is a bridge too far. It's like, listen, you guys.
Yeah. Andy, have you... like, we have our own embedded operating system. You know this.
We have our own switch. We developed our own switch. Yeah. And that's fine, no, that makes sense. You know, your own embedded operating system, our own host operating system.
Our own VMM user land. Like, do you want us to, I mean, it's going to be a long tweet if we get into all of these. It is.
It is. And it's like, but query language, no. No, now you've gone too far.
Yeah.
That is his area of research, so I feel like if he's going to slot in on anything and talk about why not just use this, I mean, he's got several papers that are about why SQL is king and always will be. So I feel like it's, you know, a demo.
Yeah, and also, like, we're also not advocating for the elimination of any other query language. That's the other thing. People can use whatever query language -- we kind of encourage it. It's just a DSL that we developed for our own use, really. So, Ben, do you want to describe kind of the origin of how we got here? And I thought...
Maybe we would take half a step back, because we've tried to open up a bunch of the surrounding RFDs that get us to 463, or why 463 is relevant. Do you want to talk a little bit about 125 and how we got to ClickHouse? Because I think that's a big part of why not PromQL.
Yes, that's a good point. So sure, I can talk about 125 and the background for that. So this was a while ago now. Most of the work I did on this was basically right when I joined Oxide. And lots of other areas had been de-risked. We had decided on CockroachDB for the control plane database. We had made many other large technical choices.
But we didn't really have a lot of depth on the story behind metrics. And so I came in to take over RFD 125 from Dave Pacheco and other people and really carry it through to the end. So the big goal there was really picking the key technologies that we were going to use for the telemetry system.
There were a bunch of things floating around, but mostly we kind of focused on the database in that RFD. And there were, I think, half a dozen or so candidates. Things like InfluxDB, which is sort of an old standard time series database, widely used in corporate environments like banks and other financial industry companies.
Then there are a couple of other systems that have a lot of wide usage too, like Prometheus. And then there were, I think, two alternatives that really came to the front after the sort of initial read-through of what the systems were designed to do, and those were VictoriaMetrics and ClickHouse.
And the real reason that we picked these two was ultimately because of the story around replication. We need to replicate the data, and those are the only two that really had any kind of replication story at all. Influx, at the time at least, really was not... did not have anything compelling here. And we could build our own system.
We definitely looked at that, using something like some sort of message bus to distribute the data to a bunch of databases. But that, I think, was a big bite to chew. And so we ended up kind of going for this one called ClickHouse, ultimately on the strength of the replication story. And then when it came down to it, I did a bunch of analysis comparing it to VictoriaMetrics.
It just sort of handily beat it in terms of resource consumption, performance, flexibility. I mean, it's just really a rock-solid system. And, you know, funnily enough, one of the main reasons we picked it was because it does support just SQL out of the box. Yes, we are not anti-SQL, just to be clear. Right.
Right.
And I think I did a bunch of experimentation basically asking what happens when you do things like kill a node in a cluster, when you do that while you're submitting a bunch of queries, while you're also inserting a bunch of data. And I think, you know, it basically never skipped a beat. And so it was pretty impressive as a piece of technology. And it's only gotten better, I would say.
They've done a really good job of open sourcing things in the last few years -- I mean, it's always been open source, but they've become a more open organization since they spun ClickHouse off of Yandex, which is where it started. It's now its own organization. They've published several papers about the internals of the system. They really are extremely responsive on GitHub, for example.
We had a number of issues. We asked them to float a number of patches for us, and they did it. I mean, it's been nothing but good things, basically, with ClickHouse. I think I've been extremely impressed with the database as a whole.
And just to be clear, what data are we storing here? Because I think people have kind of a natural question of what ramifications does this have for the user of the rack? The decision around ClickHouse is really an implementation decision.
It is an implementation decision in a lot of ways. The data that we're talking about is mostly numeric. It's mostly scalars, although importantly, not all scalar values. We have histograms, and histograms are sort of first-class citizens in Oximeter. And most of the data is basically... well, just sort of backing up a second.
Really, what you're querying is a bunch of key-value pairs, which are the fields. So these are just names and then typed values for them. And these identify the stream of data points that you're actually interested in. And then a bunch of (timestamp, value) pairs. The fields are the identifiers, really describing the context for the data.
There are things like the sled that a particular piece of data came from, a compute sled that it came from, or the project of the user-visible resource, like an instance or something like that. And then the timestamps and the values are the raw data that are actually generated by whichever component is producing the data at the very first layer.
So most of them, like I said, are scalar values like integers, floats. We also support things like strings -- which is one of the reasons for not using something like PromQL, where basically everything is a string or a float. We support a number of other types. UUIDs are extremely common in the control plane because that's basically how we identify anything.
And so being able to support UUIDs, IP addresses as sort of first-class typed objects is really valuable. And then, like I said, we also have the support for histograms, which is very experimental in most systems, including Prometheus, by the way. It's basically -- they exist, they have support for, you know, basic forms of histograms, but it's not very well tested.
I couldn't find a lot of examples for it, especially at the time I was deciding in RFD 125 on which system to use. It just did not seem up to snuff with the rest of the system as a whole, which is generally quite good. But the support for histograms, which we knew we would need kind of all over the place, is not quite there, I would say.
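(To make the shape of what Ben is describing concrete, here's a rough Rust sketch of a sample: typed fields that identify a timeseries, plus a measurement that can be a scalar or a histogram. This is only a sketch -- the type, field, and timeseries names are illustrative, not the actual oximeter definitions.)

```rust
use std::net::IpAddr;

// Illustrative only -- not the real oximeter types. A "field" is a named,
// typed value identifying which stream of data a sample belongs to.
#[derive(Debug, Clone)]
enum FieldValue {
    String(String),
    Uuid(u128), // stand-in for a real UUID type
    IpAddr(IpAddr),
    I64(i64),
    Bool(bool),
}

// A measurement is the actual datum: a scalar, or a histogram whose bins
// are first-class rather than bolted on.
#[derive(Debug, Clone)]
enum Measurement {
    I64(i64),
    F64(f64),
    Histogram { bins: Vec<f64>, counts: Vec<u64> },
}

// One sample: the identifying fields plus a (timestamp, value) pair.
#[derive(Debug, Clone)]
struct Sample {
    timeseries_name: String,           // hypothetical name, e.g. "virtual_disk:bytes_written"
    fields: Vec<(String, FieldValue)>, // e.g. ("disk_id", Uuid(..)), ("project_id", Uuid(..))
    timestamp_ns: u64,
    measurement: Measurement,
}

fn main() {
    let sample = Sample {
        timeseries_name: "virtual_disk:bytes_written".into(),
        fields: vec![("disk_id".into(), FieldValue::Uuid(0x1234))],
        timestamp_ns: 1_700_000_000_000_000_000,
        measurement: Measurement::I64(4096),
    };
    println!("{sample:?}");
}
```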
And what is the data that we're storing in this? I'll say that again. What kind of data are we storing in it?
I mean, what is this going to hold, basically?
Yeah, so we are really storing kind of two, or really three, kinds of data. We're storing things that customers will see. So think instance CPU usage is a good example, or instance disk usage -- you know, when they actually write to a disk, we record the number of bytes that they've written, we bump the counter that tracks the total number of writes.
And these are all user-visible things. We have similar metrics for user-visible instances' network data, so things like the number of packets in and out, for example. Right now we have layer-two data, so packets and bytes in and out, and errors. So that's one thing. The other thing is our own data, which, you know, encompasses things like...
At one point, we had an issue with retrying transactions to CockroachDB, and so one of our colleagues, Sean, put together a time series that keeps track of the number of times we retry any query and the duration of that retry. We have things like power and temperature and current and all the key environmentals for the entire rack as well, which are collected from the service processors.
And then we have sort of kind of server level metrics as well. So things like, for example, for Nexus, which is the main control plane service that people interact with through the front door of the API, there are things like histograms for request latencies broken out by, say, the operation that you're performing or the status code of the response.
So, you know, I think there's kind of those three big pieces, service level metadata or, you know, service level data, kind of physical environmental statistics and then user visible stuff as well.
And we're not precluding anyone from slurping this data out and shoving it into some other system that they might want that has some other query language that they... No.
For sure. When we wrote RFD 125, I think we all basically agreed that the first thing anyone is going to want to do with the data is just pull the raw data -- you know, unprocessed, unfiltered to the extent possible, just getting the raw data. And I think I made an allusion to it at the beginning, but, you know, the argument
that just use SQL or just use some existing system, I think misses the fact that I tried that, right? We tried basically a very, very big, this is the third iteration of query systems that I've built on top of the data that we have. The first one was basically something just to prove that I got back the data I put in. So you could fetch the raw data and that was it.
There's no analysis, there's no nothing. The second version was actually a SQL prototype where you would literally write SQL and I would translate it into a massive SQL query against ClickHouse, which ClickHouse dutifully did, but it would obviously take a long time. And then this is the third.
And I think it was definitely a key part or a key aspect that we needed to support just pulling raw data, which you can still do.
And this is someone in the chat has asked about open telemetry in particular. I know we spent some time looking at that. What was your take on open telemetry?
So I think it's a good idea. I think that it's never quite felt there to me, to be honest. I feel like it's – yeah, so this last comment I think is right. I think it kind of sucks, but it is a standard, which is true. Yeah. it feels a little bit like the lowest common denominator for telemetry data. And I think we would need to spend a lot of time to build a way to translate
our data model, the way it's actually stored, into something like OpenTelemetry. And it's never really felt like a lot of value, with the obvious caveat that people expect it. And I think that's a very good point. And one big criticism I'll admit against something like OXQL is that it is a custom DSL. And so something like OpenTelemetry, for all of its flaws, would allow you to interoperate.
I think I've been okay paying that cost so far, because in my experience there are sort of two things. I have yet to come across a customer who says, no, it must be OpenTelemetry and there's no way around it. If we show them an HTTP API, they basically go, okay, that sounds fine. There's my raw data.
I can do some sort of processing on that to put it into the system that I have now. I think that's basically expected for almost any type of telemetry system that there will be some amount of translation between an existing data format and the one you actually store it in. And I think that's...
To me, that suggested that we should build something that works for customers where they can get raw data. I think that's extremely important, and it also serves our own needs, which we do enumerate in RFD 125 around things like product iteration, diagnosing active problems, all of these things that we've talked about before. And those do not, I would say, rely on raw data.
They almost always rely on things like aggregations. I mean, our experience with something like DTrace has just shown again and again that the ability to actually ask questions of the system is invaluable. And I think we knew we needed something like that. And it was not clear with something like OpenTelemetry that you could get that.
So I think that was a big reason for me to kind of dive into it.
As you point out, with the amount of work, it's not clear who we're saving work for by doing something that's the lowest common denominator. And especially -- I also think that even "DSL"... yes, I mean, this is a DSL, but this is a very little language, OXQL. It's not like... I mean, you're not learning Haskell or something here.
I mean, this is just like -- I don't know. I get the concern -- we're obviously careful about that. But I think Ben made an important point, that this was not our first conclusion. It wasn't like, hey, we should do a query language -- great, because there is nothing else out there, we will invent our own. It was more like --
Let's try to make everything else work. And coming to the conclusion that we're just having to contort ourselves too much. And it's actually very liberating to be able to do our own DSL.
And it's not like, I mean, this was not, I mean, not to downplay the amount of work involved, but this is also like you're using a bunch of tooling that makes it really much easier to develop a DSL than maybe it has been historically. Yeah.
Yeah, so that's definitely true. I mean, I am not trying to downplay the amount of work involved. A query language is an enormous undertaking, right? I mean, we've got a parser and a query planner and an optimizer. It's a lot of work to do all of this. And so I do think that starting small -- this is actually part of the reason that I
quite liked where we started with OXQL -- the piped nature of it does make it fairly straightforward to add incremental features, which I feel like is a notorious problem with something like SQL, because of the fact that you add a small operator or some other kind of layer on top of your query and suddenly your query changes from a simple select to -- oh, you either write it with this massive subquery or a CTE or some other
complicated syntactic construct, and it feels like it muddies the interpretation just by looking at the query. I think you can look at OXQL queries and basically interpret what they're going to do in English pretty easily, which is very difficult to do with the syntax of something like SQL.
And then also, just in terms of implementing it ourselves and adding new features, we're putting piped operators together in such a way that I can add a new one. And today, the way you would do that is by implementing the syntax and then implementing basically a function in Rust that takes in one or more tables and spits out one or more tables.
And we do pay for doing that processing in Rust today, but the whole point of implementing the query language in the way we have is that we can push more things into the database as they become important first-class operations. And I think...
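(A minimal sketch, in Rust, of that "function from tables to tables" shape -- this is not the actual oximeter_db code, and the example query string is only loosely modeled on the CLI timeseries docs, so treat the operation names as assumptions. The point is just that each piped operator is a self-contained transformation, and a new one is another small impl.)

```rust
use std::collections::BTreeMap;

// Illustrative sketch only. A "table" here is a set of timeseries,
// each a list of (timestamp, value) points, keyed by an identifier.
type Points = Vec<(u64, f64)>;
type Table = BTreeMap<String, Points>;

// Each piped operator -- e.g. the stages of something like
// `get sled_data_link:bytes_sent | filter timestamp > @now() - 5m` --
// is a function from one or more tables to one or more tables.
trait TableOp {
    fn apply(&self, inputs: Vec<Table>) -> Vec<Table>;
}

// Example operator: keep only points at or after a cutoff timestamp.
struct FilterAfter {
    cutoff: u64,
}

impl TableOp for FilterAfter {
    fn apply(&self, inputs: Vec<Table>) -> Vec<Table> {
        inputs
            .into_iter()
            .map(|table| {
                table
                    .into_iter()
                    .map(|(name, pts)| {
                        let kept: Points =
                            pts.into_iter().filter(|(t, _)| *t >= self.cutoff).collect();
                        (name, kept)
                    })
                    .collect()
            })
            .collect()
    }
}

// The pipeline itself is just a fold over the list of operators, so adding
// a new table operation means adding another TableOp implementation.
fn run_pipeline(ops: &[Box<dyn TableOp>], input: Vec<Table>) -> Vec<Table> {
    ops.iter().fold(input, |tables, op| op.apply(tables))
}

fn main() {
    let mut t = Table::new();
    t.insert("sled_data_link:bytes_sent".into(), vec![(1, 10.0), (5, 20.0), (9, 30.0)]);
    let ops: Vec<Box<dyn TableOp>> = vec![Box::new(FilterAfter { cutoff: 5 })];
    println!("{:?}", run_pipeline(&ops, vec![t]));
}
```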
So somebody earlier had mentioned PRQL, the pipelined relational query language, which is a language that compiles, or transpiles, I guess, to SQL, but is written in a much more fluent syntax. And we definitely looked at that initially.
And we ultimately decided not to go with that for all the same problems -- that you're basically building a DSL that very few people have experience with, and you kind of need to choose which subset of the language to support.
But one of the key things that I did like from that and took from it for OXQL is that pipeline nature that you can pass in new data in this relatively self-contained way so that adding features is pretty cheap for us.
It just feels like also -- and again, I don't know how much of this is just Unix having seeped into my own DNA, or is Unix kind of an outgrowth of the DNA that exists in all of us, right? I've got no way of actually differentiating those two -- but it does feel like the pipe syntax to me feels pretty clear about intent.
And it feels like we can also then do a lot of things on the back end to optimize that. Because you're being so clear about your intent and you're not having to do unholy things, we can actually make sure that we can optimize those use cases. You can use an entirely different ClickHouse feature. Just some of the things that we were brainstorming about.
You could do a whole bunch of different things.
Yes. I mean, we don't even need to hit ClickHouse, or we can hit different tables. We can decide. And this is true, to be clear, if you have any sort of front-end language that you compile down into something you run against the database. But the nice thing for us is that it's much easier to understand and look at the query
to decide which database table or tables to look at, if any. We can decide to implement things by going to look at some materialized view rather than the original tables. And I think that would be much easier to do when you have a relatively small, simple kind of operator-based language where you pipe things in and out of each other.
I think it becomes much more practical to do that kind of thing than it does if you're carrying many years of features in SQL. Or you have to pick -- and this is the other big thing, I think. We could implement SQL as sort of a front end. That's the language people would query with.
I think it's pretty clear that you basically have to throw away 98% of the language if you do that to turn it into useful data analysis tools against the data that we have. And it felt very weird to me to start from something where we care about almost none of it. We obviously don't care about anything other than select because nobody can write data using this path. So updates are totally out.
Then inserts are totally out, right? Deletes are out, transactions are out -- for simplicity, to start. When I wrote this SQL prototype, basically the only thing you could do was a straight select statement, and that was it. You could do joins, but no subqueries.
Things like window functions, which I think are extremely useful for understanding time series data, become impractical to implement using this method. So I just think it sort of became a pretty stark question of how much of the baggage of SQL did we want to carry around if I was not going to use any of it anyway.
What is the point? You're not actually getting SQL compatibility when you're not doing all these things. All these things have no relevance in this specific domain. There's a reason we have domain-specific languages. I just cannot emphasize this term enough, because I think it is a great strength that we can create little languages easily.
I think we should not be resistant to that, because I think the kind of compatibility that you get by doing, let's say, SQL, as you're mentioning, is a false compatibility. It gives you the wrong intuition for the system. And it's like, sorry, this is not what the system is actually going to do underneath, you know, what we're trying to do.
And the other -- another question that kind of came up in the chat is like, wait a minute. Okay, so you said that all anyone's going to do with this is, like, slurp it out, so why wouldn't you just use some other protocol that people already know? It's like, well, that's all a customer might want to do with it. We want to do a lot more than just slurp it out.
We want to actually go and use it dynamically, and be able to actually look at the rack and ask questions of it. And so for us, we want something that is much more tightly tailored to that.
Yes, I think that's a great point. I think there are two or three important features that Just pulling raw data does not support. One of them is debugging active problems, figuring out why is the system behaving the way it is. And one really useful way to answer that question is to figure out where it's come from, what state it was in before you walked up to it, the recent history.
The other big thing that we haven't really talked about is the idea of alerting and making those alerts configurable
in the same language that you would use to query them is a strength that, you know, we got from something like Prometheus, right, which does do that. I think it's a very, very useful way to basically just say, hey, here's this condition on which I would like to generate alerts, and here's what you do once that happens. Here's the threshold, here's the -- you know, PromQL in that case, in our case OXQL -- expression that would trigger the alert.
I think these are really valuable. And then the other really important thing that we didn't talk too much about is a much longer iteration cycle. In RFD 125, there's a section on it which is basically product iteration. We can look at it and understand things like, over this year of historical data that we have, how often did some component fail?
Or how often did the power fluctuate outside of our tolerances for that system? And I think being able to do that, you know, really means you need a language to be able to understand that because you can't possibly sift through, let alone graph or, you know, display millions of points. You need to be able to kind of ask questions like, how many times did they exceed this threshold?
What was the, you know, 99th percentile behavior? And, you know, you just can't do that if the...
Only thing you can support is pulling out raw data. Right. We want to actually be able to query those things in the rack effectively. Right.
It's worth mentioning that we do have these grand ambitions, I think, Bryan, that you alluded to about what is possible. And we've already mentioned this earlier, but we don't want to be constrained by the lowest common denominator. We didn't want the query language that customers would use to explore the data to inform the kind of data we bothered to collect.
And then histograms, Ben, as you were saying, I feel like it's an area where we are total zealots. Maybe everyone's a total zealot. I just don't know it. But we feel like real zealots in that regard. Is everyone a zealot?
I mean, if you base it off of the support in the telemetry systems that I was describing at the beginning, it doesn't seem like it. I mean, I don't think... Influx, at least when I looked at it, it was not clear they had the concept of an array or of a histogram in general.
I'm reminded a little bit about how Cliff accuses us of being really into postmortem debugging. I feel like we are similarly really into histograms, I think just because we've seen their utility in so many domains.
Okay, with post-mortem debugging, it's taken me quite literally decades, but I'm willing to acknowledge, okay, I am somehow an outlier with respect to society. This is some sort of software kink that I have with respect to being willing to debug a system from a static state, its static in-memory state. So, okay, fine, but weird. But histograms, really? Are we histogram radicals?
I just didn't realize. Are we...
I mean, I feel like with both of them, everyone should feel this way -- and maybe people already do on histograms -- but, I mean, as Ben's saying, these other query systems are not necessarily embracing them as a first-class primitive. Whereas, I mean, I don't know if this is fair to say, but Bryan, a bunch of my thinking on this was informed by the Fishworks system, the ZFS storage appliance, and the analytics system that you built there.
Yeah, for sure. Most of it, much of it grounded on histograms, on these distributions to visualize what's going on in the system. I mean, that really strongly informed it here. And we knew we wanted to build a system where that was front and center.
Yeah, for sure. And now that I know that this is apparently a strange idea, around histograms, you can chase this through to Bonwick and lockstat using histograms for looking at lock times, spin times, block times, and looking at that actual distribution of data.
And so, honestly, part of the reason we have aggregation as a first-class operation in DTrace was because of our eye on lockstat and replacing it. It's like, okay, this is important, this idea of getting what the distribution of data looks like.
And so, you know, I never really thought about that -- that felt very commonsensical, but maybe I'm not giving Bonwick enough credit. Maybe that was very iconoclastic, to be thinking in terms of the distribution of data. I mean, he was a stats concentrator. He was a stats grad student.
I mean, I feel like it's only been 10, 15 years since people all recognized that the average was not a number you really wanted to talk about in polite company. But that's fairly recent.
I do feel that you're right. That is recent. And I guess we've always just kind of run with sets of people for whom understanding the distribution of data has always been really, really important. And it just feels very natural that that would be a first-class operation.
I think the usefulness of the average -- or the lack of usefulness of the average -- is probably pretty obvious if you've actually been in and debugged systems where there is large variance, or these distributions are just not...
you know, not normal, or not even, you know, single modes or anything like that, right? Or they have these really, really heavy tails, where other measures of central tendency are useful -- or even none of them are, and you actually just care about things like the max or the min or some other sort of extreme value. I think the reason -- I mean, the focus on the mean is because it's useful computationally, right? You can easily compute it, it's very easy to understand, you know, it's
linear in that if I add more data, I can just keep track of the running mean. I don't have to keep the whole history of the data where something like the standard deviation or the median is not really possible to do that. We teach third graders how to compute it. So how bad can it be? Yeah.
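(A tiny illustration of that point -- not Oxide code, just the arithmetic: the mean folds into constant state, while an exact median needs every value you've ever seen.)

```rust
// A running mean needs only constant state: the count and the current mean.
struct RunningMean {
    count: u64,
    mean: f64,
}

impl RunningMean {
    fn new() -> Self {
        Self { count: 0, mean: 0.0 }
    }
    fn push(&mut self, x: f64) {
        self.count += 1;
        // Incremental update: mean_n = mean_{n-1} + (x - mean_{n-1}) / n
        self.mean += (x - self.mean) / self.count as f64;
    }
}

// An exact median, by contrast, needs the whole history of the data.
fn exact_median(mut data: Vec<f64>) -> f64 {
    data.sort_by(|a, b| a.partial_cmp(b).unwrap());
    data[data.len() / 2]
}

fn main() {
    let samples = [1.0, 3.0, 100.0, 4.0, 2.0];
    let mut rm = RunningMean::new();
    for &x in &samples {
        rm.push(x);
    }
    println!("running mean = {}, exact median = {}", rm.mean, exact_median(samples.to_vec()));
}
```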
And, you know, in sort of statistics -- obviously less so in research, but the kind of statistics that most people have been exposed to -- the idea of computing the mean is really, really natural. And you kind of think, well, it's just very easy to do, and so I'm going to do it.
Obviously, you sort of forget all the assumptions that, well, maybe your data isn't normal. And it just becomes, I think, as a practitioner, you become more ingrained in understanding why it fails to really give you a useful answer.
Yeah, when also if you look at the average, you don't actually understand that much about your data. And you think you do. It kind of gives you the sense of like, oh, this is what my data looks like. And it's like, well, you know, maybe.
But you may want to get just a little more fidelity in what that actual, what the distribution looks like before you conclude that that's what your data looks like.
And so somebody asked about these other, you know, second and third and fourth moments -- the variance, skew, and kurtosis, for example. And these are really useful, but they actually are pretty computationally intensive to compute. And histograms are very easy, and you can see those well enough, I would say. Those actually give you...
I don't know if it's more information, but they give you different information, right? So there's this idea, something called Kullback-Leibler divergence, which is basically a statistical measure that tells you the difference between two distributions.
And it's very easy to see when you plot these things visually -- you just sort of see the amount of overlap in your histograms, right? If you plot them with bars that are transparent so you can see them, or a grouped bar chart or something, it's very easy to see. But the numbers are pretty tricky either to compute or to sort of
give you a useful measure of that divergence a priori, right? So what I mean by that is I may know that the kurtosis is useful, but only after looking at the data. And so I need to keep track of this whole distribution and a very cheap, compact...
constant-memory, constant-computation way to do that is a histogram. So it's extremely, extremely useful when you have potentially unbounded sets of data and you really can't pay that cost, and you want to limit the resource consumption and maximize the kind of understandability of your distribution. So I think it's extremely useful for those types of examples.
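(A sketch of why the histogram is that cheap, constant-memory answer -- illustrative, not the oximeter histogram type: the bins are fixed up front, and recording a sample is just bumping a counter, no matter how many samples ever arrive.)

```rust
// Fixed-bin histogram: memory is proportional to the number of bins,
// not the number of samples, so it works for unbounded streams of data.
struct Histogram {
    edges: Vec<f64>,  // bin right edges, sorted ascending
    counts: Vec<u64>, // counts[i] covers values up to edges[i]; last bin is overflow
}

impl Histogram {
    fn new(edges: Vec<f64>) -> Self {
        let n = edges.len();
        Self { edges, counts: vec![0; n + 1] }
    }
    fn record(&mut self, x: f64) {
        // Find the first edge >= x; anything larger falls into the overflow bin.
        let bin = self.edges.iter().position(|&e| x <= e).unwrap_or(self.edges.len());
        self.counts[bin] += 1;
    }
}

fn main() {
    // e.g. hypothetical I/O latency buckets, in microseconds
    let mut h = Histogram::new(vec![100.0, 1_000.0, 10_000.0, 100_000.0]);
    for latency in [42.0, 850.0, 12_000.0, 3.0, 999_999.0] {
        h.record(latency);
    }
    println!("counts per bin: {:?}", h.counts);
}
```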
And so obviously, ClickHouse was a natural fit in part because of the way they thought about the problem.
Yeah, so they have first-class support for arrays. They've obviously built a bunch of tools around histograms themselves. So computing a histogram of a column of data is something that you can just do. So they have a bit of a confusing way to do it in that, like most things with ClickHouse, they have aggressively prioritized performance.
And so what that means is, for almost every operation, there's an exact version and an inexact version. The default, unless you ask for it, is the inexact version. For something like a histogram, that's also true, where it basically tries to compute the bins for you, and it's going to do its best.
It won't be off by too much, but for most things, it'll work, but if you really want the answer, you have to compute the exact values.
I didn't realize that. Do they give you bounds on their hand-waving?
They basically give you bounds on how bad the estimates for the bins are going to be in that case. So, for example, they have an inexact median or percentile, an inexact quantile. It does not hallucinate data -- it's not making up points -- but, basically, you might get all of your data grouped into one bin or another.
I just had the most embarrassing realization. I'm like, I don't know why you'd need that. You probably only need it for things like analytics from the web, like clicks. Oh God, I'm horrible.
There we are.
A house, a house of clicks.
That's right.
Yeah, but they do have, I would say, a lot of array-based tools, tons of functions. They've got this sweet idea of, basically... so, in normal SQL databases, you obviously have aggregations like the average, right? Those are used all over the place. ClickHouse has this first-class support for arrays, and they said, well, how do we support that sort of thing?
Well, they're just like, we're going to make this idea of aggregation combinators. So you can tack on things like average-array -- so the word "array" comes at the end of it -- or min-array, and it'll apply the aggregation that you've asked for to the array as if it were a bunch of items. So it's extremely, extremely flexible with what you can ask it to do, how you can process it.
You can do things like map over arrays. You have all of these higher-order functions for doing filtering on arrays. I mean, it's extremely valuable. And having all of that is just so, so, so, so useful for building a system like this on top of it.
And we have not scratched the surface of that kind of stuff that we can go do.
And I think that part of the... Yeah, no, we're basically just doing select and some averaging and some grouping.
And correct me if I'm wrong, but part of the appeal of a DSL here is the ability to add some functionality that would actually help us express some of what we can get out of ClickHouse. Yeah, that's right. As a consumer of that.
Yeah, that's right. So as an example, we have this idea of an alignment table operation, where you take time points that are close but not exactly evenly spaced -- roughly every second, but there's a few milliseconds of slop on either side for every sample.
So we have the notion of an alignment operation where you can say, OK, I actually want to register them, sort of snap them to a temporal grid to be exactly one second apart. And the way you do that is by specifying how to group things that are within one second, for example, within that alignment period.
So today we do that by averaging, which, you know, for all of its problems, averaging does have a lot of uses. But we could, for example, do that by -- instead of taking the average within an interval, you could take the min within an interval, or really any other aggregation you can imagine doing there.
In theory, when we build that inside the database, that should basically be switching the aggregation function that ClickHouse uses from average to min. And it will be very easy to express these much more complicated operations with really a few small changes on top of the framework of this kind of piped query language.
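(Here's a sketch of that alignment step, under the same caveats -- illustrative Rust, not the real implementation: snap each sample to a one-second grid, then reduce whatever landed in the same period with whichever aggregation was asked for. Swapping mean for min is just swapping the function.)

```rust
use std::collections::BTreeMap;

// Samples arrive roughly once a second, with a few milliseconds of slop.
// Align them to a fixed period and reduce each period with one aggregation.
fn align(samples: &[(u64, f64)], period_ns: u64, reduce: fn(&[f64]) -> f64) -> Vec<(u64, f64)> {
    let mut buckets: BTreeMap<u64, Vec<f64>> = BTreeMap::new();
    for &(t, v) in samples {
        // Snap the timestamp to the start of its alignment period.
        buckets.entry(t - t % period_ns).or_default().push(v);
    }
    buckets.into_iter().map(|(t, vs)| (t, reduce(&vs))).collect()
}

fn mean(vs: &[f64]) -> f64 {
    vs.iter().sum::<f64>() / vs.len() as f64
}

fn min(vs: &[f64]) -> f64 {
    vs.iter().cloned().fold(f64::INFINITY, f64::min)
}

fn main() {
    // Timestamps in nanoseconds: ~1.003s, ~2.005s, ~2.998s.
    let samples = [(1_003_000_000, 2.0), (2_005_000_000, 4.0), (2_998_000_000, 6.0)];
    println!("mean within 1s: {:?}", align(&samples, 1_000_000_000, mean));
    println!("min within 1s:  {:?}", align(&samples, 1_000_000_000, min));
}
```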
So a question that I think may be very clarifying for people to answer is: is it a fair assumption that the main client of raw OXQL is Oxide-provided tooling -- dashboards, alerting, et cetera? That's what that question is.
Yes.
Yeah. Today, it's me, is the short answer. I guess a couple of other developers. But yes, I think the biggest initial consumers will be two things. I think there will be customers collecting it. So we didn't talk about this, but we only store data for 30 days today. And we recognize that customers will want to store it longer than that, potentially in some rolled-up form.
But we want to give them the ability to do that. So I expect that people, customers pulling it into their own longer term storage systems will be one of the big things. And then the other will be visualizations in the console, in the web console. Today, we have a few visualizations around things like disk metrics.
Those are built upon that first querying system that I mentioned, where you basically select the raw data. And so this has a number of kind of weird problems -- not problems, it has drawbacks, right? So as an example, that data is cumulative. We keep track of a start time and then the counter only goes up: for every write, we bump it by one and it never goes down, right?
So when you're selecting the raw data, you get that cumulative data. And so in the console, in the web console today, if you open it up, it just shows a graph that is monotonically non-decreasing, right? But most people don't really care about that. They want the derivative of that. They want to know how many writes did I incur in this period of time?
I want to see sort of how the thing is behaving. What are the dynamics? And you can get that mentally by looking at the slopes, but it's hard, right? You don't want to do that in general. And so being able to do that is basically the reason that we implemented
Something like, you know, OXQL's automatic adjacent differences, these deltas, when you select a cumulative time series, it automatically computes that delta for you on the assumption that that's what you want most of the time anyway. Obviously, we can build a system that doesn't do that or, you know, a table operation that does not do that.
But that is definitely the most common thing is to be able to look at those differences over time.
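(A sketch of that automatic differencing -- again illustrative, not the OXQL implementation: given a cumulative counter, emit adjacent differences so a dashboard shows per-interval activity instead of a monotonically non-decreasing line.)

```rust
// Cumulative samples: (timestamp, total writes so far). The counter only goes up.
fn deltas(cumulative: &[(u64, u64)]) -> Vec<(u64, u64)> {
    cumulative
        .windows(2)
        .map(|w| {
            let (_, prev) = w[0];
            let (t, cur) = w[1];
            // How many writes happened between the two samples.
            (t, cur - prev)
        })
        .collect()
}

fn main() {
    let total_writes = [(0, 100), (10, 160), (20, 160), (30, 400)];
    // -> [(10, 60), (20, 0), (30, 240)]: the per-interval activity you actually
    //    want to look at, rather than the ever-growing total.
    println!("{:?}", deltas(&total_writes));
}
```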
And you said the console, also I'd like to point out the CLI also has the ability to... Let's not sleep on the CLI's ability to visualize data.
Yeah, there is -- under the Oxide experimental subcommand, which is where all of the time series stuff lives, because I do still consider it pretty experimental at this point -- there is a dashboard subcommand where you can run a scalar query and it will plot all of the time series that come back for you in your terminal, which is kind of fun.
And you can run the query itself -- or more queries, I should say -- directly, just to get the raw data back as an HTTP JSON object, but the CLI will plot scalar time series for you, which I think is really useful. We haven't done the histogram stuff, and I think that's going to be very fun -- to see heat maps around things like I/O latencies for virtual disks, I think, is a good example.
I think it will be very cool to be able to see those in the web console and or in the CLI. A heat map in the CLI will be pretty fun.
Those graphs are really fun. What is the package you're using to draw that stuff?
Yeah, it's a library called Ratatui, which is a Rust -- really, it's a terminal manipulation engine, right? You have this idea of a screen and then you can do things like draw widgets to it in a bunch of different ways. And these could be, you know, your normal TUI things like columns and tables and stuff like that.
Trees, for example: if you wanted to implement something like the Unix tree command that shows a tree of files, you could do something like that with Ratatui. But it also has the notion of basically little glyphs that you can use to draw things. And it's got first-class charting support, where you can have x and y axes and alternate y axes. I mean, it's very, very useful.
Yeah, so I think it's a really cool library. And we've used it in a number of other places. Wicket, which is the rack setup captive shell that you run when you first install and set up a rack, is a graphical interface written in Ratatui as well. It's very, very powerful.
And Humility dashboard. I'm a little hurt that Humility dashboard is not coming up here. Honestly, I was waiting for someone else to mention it, but apparently no one was going to. You know, it's not going to come up on its own.
Plug your own technology, exactly.
Listen, I got to plug my own. Around here, you got to... You know what? I'm going to have the AI, my AI overlords, generate a podcast full of praise for Humility dashboard. I think I need a little pick-me-up from my bots. But Humility dashboard also -- this allows us to talk to a service processor and graph all of our environmentals, and it's been great. I love Ratatui.
So much fun.
And those, those environmentals are really valuable. I mean, they were super useful at the beginning when we were bringing up the first boards because of your ability to, you know, through the service processor directly look at those environmentals without waiting for the host, right? So as you're trying to get the host to boot, those are pretty important.
Yes. Yeah, it is. It has been. It's been great. And I also love writing software that the EEs use, because, you know, they have lived such a tortured existence with respect to software. It's very nice when they can be delighted by software. I feel that the EEs live a hard life.
And so when software can delight them -- when they're not using some vendor-specific Windows goober that they need to program a part or something -- you can actually give them something delightful. It's really great. They don't get nice things. It's really true. They don't get nice things. That's what I'm trying to say.
They don't get nice things, and as a result, their standards are very low, and you can do very little work and give them something nice, and they are just
filled with praise. It's great. Yeah. So, yeah, Ratatui has been fun. In the end... folks are asking if the code for that is available. That is all open source -- all that stuff is open source, I think. I mean, it's all open source, but all of OXQL and the dashboard are -- well, the dashboard is in the Oxide CLI that Adam linked a minute ago.
And, yes, someone just dropped the link to that. Yes, the command dashboard -- or command timeseries dashboard -- is the Ratatui code that draws everything. And then OXQL itself is in Omicron. There's a library called oximeter-db, which is basically the ClickHouse interface, and all of the OXQL implementation is there. Yeah, somebody else just linked that. Thanks, Sean.
Can you speak a little bit to the implementation of the DSL, by the way? Assuming we've gotten people over the hump of like, that we've got the right to implement a DSL here. I'm not sure that we've got everyone on board with that. But you know what? Just bear with us. And can you get into kind of the mechanics of building that?
Totally. So I've never written so much recursion in my life, my professional life. The first step is a parser, which takes a string and turns it into an AST. And that is written with the help of a library called PEG.
which is based on the idea of these parsing expression grammars, which are a formalism for writing basically strings that you want to match against and turn into a specific kind of AST abstract syntax tree node. So we write mostly regular expressions to match pieces of the query, although
Part of the reason that I used PEG initially was it also supports doing things like running normal Rust code against the string, to do things like parse out a float, for example, which is very useful because the regular expression for floats, let alone something like IPv6 addresses, is terrible. So rather than write that in a regex, you can match against it in some other ways.
I definitely considered nom. Nom is fantastic. I really like it. I think this will need some TLC in the long run, but for right now it serves our purposes quite well to parse everything using PEG. So basically we parse the string with PEG. There are some limits -- somebody asked about limits to things.
They're pretty crude at this point, basically the overall length of the query, which is not really related to the number of table operations, but in practice it seems to be pretty good.
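(For a sense of what writing a grammar with the peg crate feels like, here's a toy that splits a pipeline into its table operations. This is nothing like the real OXQL grammar in oximeter-db -- it's just the flavor of the crate -- and the example query string is only loosely modeled on the CLI timeseries docs.)

```rust
// Toy grammar only, not the real OXQL one. Assumes peg = "0.8" in Cargo.toml.
peg::parser! {
    grammar toy_oxql() for str {
        rule ws() = [' ' | '\t']*

        // A table operation name: "get", "filter", "align", ...
        rule op_name() -> String
            = s:$(['a'..='z' | '_']+) { s.to_string() }

        // Everything up to the next pipe is that operation's arguments.
        rule args() -> String
            = s:$([^ '|']*) { s.trim().to_string() }

        rule table_op() -> (String, String)
            = ws() name:op_name() ws() rest:args() { (name, rest) }

        pub rule pipeline() -> Vec<(String, String)>
            = ops:(table_op() ** "|") { ops }
    }
}

fn main() {
    let q = "get sled_data_link:bytes_sent | filter timestamp > @now() - 5m | align mean_within(10s)";
    for (name, args) in toy_oxql::pipeline(q).expect("parse failed") {
        println!("table op `{name}` with args `{args}`");
    }
}
```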
We parse this string into an AST, and then there's really a couple of planning steps that we do once we have that. I think I didn't really talk about this -- we kind of brushed past it -- but in RFD 161, which is one of the background RFDs for the OXQL RFD itself, I talk a lot about the data model, and I think it's kind of useful to talk a little bit about that now.
One of the main reasons we don't just use SQL or some other existing out-of-the-box language is we don't have a table that corresponds to the time series data. What I mean by that is there are tables in ClickHouse that store the data, but we normalize it when we select it or when we first insert it.
So the way we do that is there's a program called the oximeter collector, which is aggregating all the data. It's pulling data from all of the places where it's generated called the producers. And it takes each sample and picks apart the fields, those typed key value pairs, and the measurement itself that's in the sample. And those go in different places.
All of the fields go in their own table, broken out by type. And then all of these, so we have a field table for UUIDs, a field table for IP addresses, etc., And then all of the measurements go in their own table as well. And we need to be able to re-associate these all together.
So we create, basically, a foreign-key relationship between all of those by generating a hash from the time series contents -- from the fields, really, the sort of identity of the series. It's just a u64 that lets us associate everything once we have inserted it in this normalized form. So we put all the fields in one table, all the measurements in some other tables.
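(The timeseries key can be thought of roughly like this -- a sketch, since the real key derivation in oximeter-db differs: hash the timeseries name and its field name/value pairs down to a u64, and use that value to re-associate rows across the field tables and the measurement tables at query time.)

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch only: derive a u64 "timeseries key" from the identity of the series,
// i.e. its name plus its field name/value pairs. (A real system would pin a
// stable, versioned hash; DefaultHasher is just for illustration.)
fn timeseries_key(name: &str, fields: &[(&str, &str)]) -> u64 {
    let mut sorted = fields.to_vec();
    sorted.sort(); // field order must not change the key
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    for (field_name, field_value) in sorted {
        field_name.hash(&mut hasher);
        field_value.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    // Hypothetical field values, truncated for readability.
    let key = timeseries_key(
        "virtual_disk:bytes_written",
        &[("disk_id", "9f06..."), ("project_id", "1c2b...")],
    );
    println!("timeseries key = {key}");
}
```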
And here is, again, where we lean on the strength of ClickHouse. One of its key features is this idea of different table engines. So its workhorse is something called the merge tree table engine, which is basically this idea that you can insert data in these large chunks called blocks, and that is extremely fast.
And that's because ClickHouse basically does nothing except mem copy the data directly to disk. They don't do anything with it. They don't look at, I mean, they run checks, they run other things like that, but they don't really do much. In particular, there's no such thing as a unique primary key in ClickHouse.
So what that means is they do not have to check, for example, that your row is unique, that it violates a primary key constraint. They do not care. They say, that's your problem. And part of this is great, because you just insert data and then, in the background, it merges that with all the existing data to construct a new compacted, compressed, sorted array of everything, right?
So ClickHouse's model -- the fundamental model for, like, a traditional relational database is basically a B-tree, right? You have this tree relationship of primary keys. And once you do that, the value for that B-tree, the sort of logical B-tree, is the tuple, right? The row. ClickHouse is not that.
ClickHouse is a sorted array where you can have any number of duplicates in it that you want. But there is no such thing as a unique row. And this is like a big paradigm shift, a very different way of thinking about things. So for us, we need to have a way of re-associating everything on disk and after it's put on disk. And the way we do that is with this time series key.
It's just an identifier with which we can match everything back up. So once we've inserted everything, ClickHouse -- oh, I sort of buried the lede there. ClickHouse has another table engine, which is called the deduplicating table engine. And the idea is that on merge, when you're doing that merge between different parts of data, it can sort them and then,
basically like the sort-to-uniq command pipe in Unix, remove neighboring duplicates -- it's doing exactly the same thing. So every time it does this merge, it sorts the data and then removes neighboring duplicates. So we rely on this for the fields. So we do not have one set of fields for every sample; we have one field tuple for every time series.
So you may have a million points, you will only have one set of fields for that time series. And I think this is a lot less data. Part of the reason, for example, that that SQL prototype that I built falls over is because you have to denormalize that data to make this giant table where you've duplicated that row for every single measurement, every timestamp.
And ClickHouse is extremely good at compressing the data, but it doesn't matter that much when you're talking about millions of strings or something terrible like UUIDs, which are by definition random, right? I mean, those don't compress that well. So, you know, you have to pay that cost and it is a bridge too far. Considering, I think this is also important, it's not our storage.
It's the customer storage. And so we can't sort of just use as much of it as we want, right? We do need to be parsimonious.
That's a great point, Ben. That's a great point because keep in mind the context here, which is customers bought this thing to host their data, like their virtual disks, their virtual instances. So we're like, actually, we thought maybe some of ours would.
Let's face it, Adam, you and I would, left to our own devices, we would make this thing be an absolute navel-gazer. We'd be like, we are dedicating all of the resources of this rack to gazing at itself. Like Narcissus in the pool, it is enraptured by its own reflection, using all of its storage to store thermal data about itself.
I don't know, we'd let users get half of what they paid for. I don't know, something like that seems fair. And then we'd take the rest.
Yeah, I mean, look, let's just agree that we're glad that Ben is here to point out that they paid for it, so they should be able to use it for the things that they care about. Yeah, and I mean, this is where ClickHouse is just outrageously efficient with the way it stores things. It's mind-bending.
Yeah, it is, again, another kind of amazing set of features that they have, around the compression algorithms, the compression codecs. You can do things like nest compression codecs into each other. So as an example, the default, by the way, is really pretty good. We still use the defaults, and they're very, very good just out of the box. So all it's doing is ZSTD, which is just a normal kind of
gzip-like compression algorithm on generic data, right? It doesn't take into account any features of the data itself; it basically chunks it into blocks and then does ZSTD compression on that. And that's it. But for us, one of the open issues we have is around investigating better compression codecs. So they have things
like the idea of deltas, so you can take the difference -- as somebody just mentioned, with UUID v7 you could, for example, store the diffs of two UUIDs, because they're time-ordered, and then store half as many bytes. We don't do that for UUIDs because we use v4s, but you could do that for things like actual timestamps, which generally are not very far apart, right?
And they even implement things like delta, double deltas. So you can do a delta of deltas. And if you're talking about a regularly spaced timestamp, it's extremely good. Those are almost always going to be very, very close to zero. And so it compresses very well. They have things like something called gorilla compression, which is something that came out of a research paper from Facebook.
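(To see why delta-of-deltas works so well on regularly spaced timestamps -- a sketch of the idea, not ClickHouse's codec: the first differences are all roughly the sampling interval, and the second differences are nearly all zero, which is what compresses so well.)

```rust
// First differences of a sequence.
fn deltas(xs: &[i64]) -> Vec<i64> {
    xs.windows(2).map(|w| w[1] - w[0]).collect()
}

fn main() {
    // Timestamps in milliseconds, sampled roughly once a second.
    let timestamps = [1_000i64, 2_001, 3_000, 4_002, 5_001];

    let d1 = deltas(&timestamps); // [1001, 999, 1002, 999]: values near the interval
    let d2 = deltas(&d1);         // [-2, 3, -3]: tiny values clustered around zero

    // Long runs of near-zero values are what a delta-of-deltas codec (or Gorilla's
    // timestamp encoding) exploits to store each point in just a few bits.
    println!("deltas: {d1:?}");
    println!("delta-of-deltas: {d2:?}");
}
```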
They've got all of these different methods for basically, you know, very, very, very tightly compressing the data. But just with the
Is this gorilla as in, like, Dian Fossey, or guerrilla as in, like, insurgency?
I don't know where the name actually comes from. I can't remember why they came up with it.
We're talking about the Dian Fossey variant, gorilla, not the warfare variant, guerrilla.
I mean, I think it's based on the animal, if that's what you're asking, but it doesn't have a U in it.
I mean, it's kind of an O, not a U. It's not guerrilla.
It is not warfare guerrilla. It is gorilla. Yes, that's correct.
You do not pronounce those two things separately. I do not. Do you? Okay, thank God.
For the AI's benefit, so that it can... For the AI's benefit. But like I said, the out-of-the-box compression that we do use is very, very good. So as an example, the last time I checked, like a week or two ago, we have around 15 to 20 billion rows of data, unique points of data, in our database, and it's about 100 gigabytes of data on disk. So that's about eight bytes per row, if my math is right,
which, when you think about the fact that we're storing all of these fields, all of these UUIDs, a bunch of strings, we're storing histograms — and these things are not one u64 wide, they're the equivalent of many u64s wide — it is a very big database. I mean, it's not small, but I'm just pointing out that with the compression, it's quite good.
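As a quick back-of-the-envelope check of those figures (order of magnitude only, since the row count quoted is itself a ballpark):

```latex
\frac{100\ \text{GB}}{15\text{--}20 \times 10^{9}\ \text{rows}} \approx 5\text{--}7\ \text{bytes per row},
\qquad \text{versus} \ge 16\ \text{bytes for a single uncompressed (timestamp, u64) pair alone.}
```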
even just without doing any work, just what it gives you out of the box is very, very good. But I do think there's a lot of room for improvement there. We should be able to get things much, much smaller or store more data for the same cost. Basically, we could make that configurable for the customer.
If they are willing to give us 100 gigabytes of their disk, or 200, or 500, then we can store more data for them.
But that's something that they should be in control of, I would think, ultimately.
Yeah, absolutely. Another question that came by was about how we are thinking about notifications when data becomes abnormal, for some definition of abnormal.
Yeah, we have done a bunch of writing on this in RFDs around alerts. 125 talked a little bit about it. 116 talks more about it. We don't know is the short answer. They're not implemented today.
We think that the most expedient first path would be doing something like: you write an OxQL query that you care about — taking a page out of Prometheus's book — and then you tell us how to send a webhook when that triggers. I think that would be basically the first stop. People...
Again, for all their problems, webhooks are, I think, a lowest common denominator that generally is pretty good, and you can basically post whatever you want in that body. And we don't have to worry about things like talking to email servers, worrying about which protocol you're going to use for that, or storing credentials for those email servers, for example.
I think it's quite useful. But I do think that would be the basic first step. The first example would be: give us an OxQL query, tell us which piece of it you want to alert on — say, something above a particular value, or something non-zero, or any number of points in this query — and then we'll post a webhook wherever you tell us at the end of it.
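None of this alerting exists today, but purely as a hypothetical sketch of that "saved query plus webhook" shape — the query text, field names, payload, and URL below are all made up for illustration, and this is not an Oxide API. It assumes reqwest (with the blocking and json features) and serde_json are available:

```rust
// Hypothetical sketch only: a saved OxQL query plus a trigger condition, and
// the webhook POST the control plane might send when the condition fires.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Made-up alert definition: a saved query and a threshold to alert on.
    let alert = json!({
        "query": "get some_target:some_metric | ...",   // illustrative only
        "trigger": { "kind": "above", "threshold": 1e9 },
        "webhook_url": "https://example.com/hooks/oxide-alerts",
    });

    // If the evaluated query tripped the condition, post a payload like this
    // to the customer-supplied URL.
    let client = reqwest::blocking::Client::new();
    let resp = client
        .post(alert["webhook_url"].as_str().unwrap())
        .json(&json!({
            "alert": alert,
            "fired_at": "2024-10-01T00:00:00Z",
            "observed_value": 2.3e9,
        }))
        .send()?;
    println!("webhook delivery status: {}", resp.status());
    Ok(())
}
```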
And for some of these things — I mean, in terms of, like, the amount of OxQL that's user-facing — are we kind of thinking we'll create new endpoints that are basically distilled into OxQL queries under the hood, but are actually their own endpoints?
What do you think about that? I think, yes, most likely. I do think that there's a... you know, this was one of the main design trade-offs that went into picking something like OxQL. I'll get back to the CPU usage that somebody asked about in a second. So the original system basically relied on a...
on an untenable scaling of per-resource or per-metric-type query endpoints, which means that as you add new metrics, you have to wait for them to become available through a new API endpoint. And how do you do things like versioning when you have something like that? I think it becomes pretty tricky.
And so we decided for now to go the other end of the spectrum, which is you have one endpoint and you write a query. And I think it obviously has its own issues.
But I think... I'm very grateful for this approach. Speaking strictly selfishly, because when Eliza hooked all of those lower-level environmental metrics up, it just meant that it automatically... popped out at the top.
You get them for free. You don't have to wait for an update. I mean, you have to wait for an update for the data to become available in ClickHouse, but there's nothing else, right? Nothing really changes outside of the producer. Actually, that's it.
The only thing you need to update is the producer itself because Oximeter will collect from it and all the data is sort of organized the same way with these field tables. This was another big reason we normalized the data the way we did. The alternative is doing something like creating a table every time you see a new time series with a new schema.
And there's a lot of problems with doing something like that. And it's unclear exactly how you do that, especially when you get to something like a replicated setup. So we opted to do a different thing, which is normalize the data. So we have a static database organization, a static number of tables. And you can add new rows into them, new columns into them as you collect data.
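A hypothetical sketch of that normalized layout in Rust (illustrative only — these struct and field names are made up, not oximeter's actual schema): a fixed set of tables, where a new time series only ever adds rows.

```rust
// Made-up illustration of the normalized layout described above: field rows
// stored once per timeseries, measurement rows stored once per sample, joined
// back together by a shared key.
#[derive(Debug)]
struct StringFieldRow {
    timeseries_name: String, // e.g. "hypothetical_target:metric"
    timeseries_key: u64,     // hash of the field values
    field_name: String,
    field_value: String,
}

#[derive(Debug)]
struct MeasurementRow {
    timeseries_name: String,
    timeseries_key: u64,
    timestamp_ns: i64,
    datum: u64,
}

fn main() {
    // One set of field rows per timeseries...
    let fields = vec![StringFieldRow {
        timeseries_name: "hypothetical_target:metric".into(),
        timeseries_key: 0xdead_beef,
        field_name: "serial".into(),
        field_value: "BRM000001".into(),
    }];
    // ...and one measurement row per sample, linked via timeseries_key.
    let measurements: Vec<MeasurementRow> = (0..3)
        .map(|i| MeasurementRow {
            timeseries_name: "hypothetical_target:metric".into(),
            timeseries_key: 0xdead_beef,
            timestamp_ns: i * 10_000_000_000,
            datum: 100 + i as u64,
        })
        .collect();
    println!("{} field rows, {} measurement rows", fields.len(), measurements.len());
}
```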
But yes, something like Eliza's environmental metrics just becomes available. But I do think that as we find particular queries that get run all the time, or that are expensive to run, or that we find very useful for some other reason — or the customer just wants us to cache a query and run it the same way we do for their alerting — they want to just be able to fetch it,
you know, hit an endpoint that says "get this query," where they specify the name of the query, and we go run it for them and then return it to them, within a special endpoint that we maintain for them. I think some things like that are certainly going to be very valuable, and you can build all of those on top of OxQL. Going the other way is extremely hard.
So this is actually a really important point, I think: OxQL kind of gives us the foundation to figure out what you would want to distill into a perhaps even more limited abstraction. And it would be a lot harder to do that with these other kinds of query languages that felt like a much poorer fit for the underlying data model.
Yes, I think that's right. We can basically build what we want. And on the product iteration front, we can keep track of the queries that are run, look at them with ClickHouse's query logging, for example, or our own logs, and figure out what queries are run and which ones are valuable, which ones are expensive, and how can we make those better as we iterate on top of it.
Somebody had asked about migrations, and then somebody earlier had asked about CPU usage. So taking these two in order: the CPU usage we can limit using the same kinds of resource controls that we use to limit utilization for any other service in the control plane, which is basically giving the zone only so much CPU or so much memory. ClickHouse is hungry. I mean, it will eat up.
When you read their documentation, the first thing they talk about is basically like, I wouldn't run this on anything with less than, I don't know, something. I can't remember what it is. It's like 128 gigs of memory, which is like big, right? That's a lot. And it basically just immediately takes over and prepares itself to use everything. It doesn't use everything right away.
It's pretty efficient. But once it starts running a query, it will dispatch it to as many threads as it can. And that will use a lot of CPU usage. But we would limit that by basically putting resource controls on the zone itself. We have not done that today because basically we don't know what to put on it.
But this is part of the product iteration is that as we run those queries, we can figure out what is a valuable limit. ClickHouse also, I think, has a lot of controls, for example, about what it decides to do when it can't use all the CPUs it wants to, whether it fails the query or it starts running it slower or returning fewer rows.
You've got lots of controls over things like that when it decides to spill to a temporary file versus keep things in memory. It does eat all of the RAM that you give it, though. It's extremely, extremely hungry. And then for the question about migrations, it's very easy. So ClickHouse, for the most part, is like a SQL database that you're familiar with. You've got alter table statements.
You can add columns if they don't exist already. We do support updates to the table schemas ourselves, the database schemas, as we decide we need them. But I think it's important to note that we have far fewer updates to this table setup than we do for something like Cockroach, which stores the customer data, the control plane data.
And the reason for that is, again, that we're not creating a table per time series. we are creating a relatively static number of tables and kind of using that to store all of the time series logically, but they're all mixed in there, right? They're sorted in various ways, but they're all mixed together in there.
And so we end up — I think we have something like 11 or so versions of our database schema today, whereas we're on something like 100 or so for our Cockroach database schema. Part of that is, you know, we do a lot more work on the Cockroach schemas themselves as we add new features. But I think we don't need to do a lot of updates. But you can.
It's very straightforward. Yeah, I mean, and then what about RAM utilization?
Yes, it will. As much as you have. When I was saying earlier that ClickHouse likes to eat, that's what... It does.
I mean, it will eat whatever you give it. So, I mean, part of the value of the... So ClickHouse is really built around a few different interplaying ideas, which I think are kind of cool when you get in there and dig into the technical details: extremely good compression, extremely good vectorization at the instruction level, and the idea of this merge tree engine that allows you —
by paying for it with no unique primary keys — to operate on the database as if it's a sorted array. And these three things, along with a bunch of other, you know, incredible technical details, mean that they can chew through the data in a very distributed way.
So when you run, when you, like today we were just doing this, I was looking at the threads that ClickHouse is running, and it's basically everything's just sitting in a thread pool. But as you run a query, you see them switch from just sitting idle in the thread pool to running something under the HTTP handler, which is one of the interfaces that we use for talking to ClickHouse.
And it can parallelize the data because it's broken out into this giant sorted array. And so it can use its indexes, for example, to tell you that, OK, I only need to look at these eight blocks of data. And it stores them in these blocks. And it basically parallelizes on that level. And it can run these massive queries by just chewing through at basically the speeds of memory bandwidth.
It can chew through the query by parallelizing it over all available cores. But it does mean you are... Their main goal, I think, as an engineering organization is to keep the cache full, I would say. That's basically what their jobs are, is to keep the cache as full as possible so that they just never have to wait.
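A conceptual sketch of that sorted-blocks-plus-sparse-index idea in Rust (an illustration of the concept only, not ClickHouse's actual granule/mark machinery): keep per-block min/max keys, skip blocks whose range cannot match, and scan the survivors in parallel.

```rust
// Conceptual sketch: "data skipping" over sorted blocks, then a parallel scan
// of only the blocks that can contain matching keys.
use std::thread;

struct Block {
    min_key: u64,
    max_key: u64,
    rows: Vec<(u64, f64)>, // (key, value), sorted by key
}

fn scan(blocks: Vec<Block>, lo: u64, hi: u64) -> usize {
    // Consult only the per-block min/max, never the rows, to skip blocks.
    let candidates: Vec<Block> = blocks
        .into_iter()
        .filter(|b| b.max_key >= lo && b.min_key <= hi)
        .collect();

    // Scan the surviving blocks on separate threads, one per block.
    let handles: Vec<_> = candidates
        .into_iter()
        .map(|b| {
            thread::spawn(move || b.rows.iter().filter(|(k, _)| (lo..=hi).contains(k)).count())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let blocks = vec![
        Block { min_key: 0, max_key: 9, rows: (0..10).map(|k| (k, 0.0)).collect() },
        Block { min_key: 10, max_key: 19, rows: (10..20).map(|k| (k, 0.0)).collect() },
        Block { min_key: 20, max_key: 29, rows: (20..30).map(|k| (k, 0.0)).collect() },
    ];
    // Only the middle block overlaps [12, 17]; the other two are skipped entirely.
    assert_eq!(scan(blocks, 12, 17), 6);
}
```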
Never stall. And then, like, if every operation is a table scan, well, you make table scans really, really, really fast.
It's extremely fast. And they do have the idea of — it's not secondary indexes; I was just reading about this today — they have this notion of data-skipping indexes, which is different from secondary indexes. But again, it kind of comes back to the idea that there is no unique primary key.
So they work quite differently, and they can be pretty counterintuitive. Has ClickHouse ever constrained Crucible in terms of throughput? That's a good question. I am not 100% sure. I would expect that the limitations are elsewhere; they're not in ClickHouse. My guess is that we're waiting on the network.
Well, that may also be reflective of a little confusion. ClickHouse is not hosted in Crucible. So we're doing replication through ClickHouse's own mechanisms, whereas Crucible is what we use to store the data associated with instance data, like the customer's virtual volumes. That's what Crucible is for.
And so we're using those same U.2 devices, both for Crucible volumes and for ClickHouse, but they're pretty much separate concerns.
Yeah, they're using the same physical devices, but other than that, they're separate.
And then in terms of, I mean, Ben, you've elaborated a bunch of directions that we want to go take this thing. I think that we want to, I mean, at the moment, I'm just like... it's great to actually have a bunch of data in this that we can go mess around with. Yes. And actually go learn the kinds of things that we want to go do. We know that ClickHouse gives us the right foundation.
We think OXQL gives us the right foundation. And then what are the things that we want to go add either additions to OXQL we want to make, and then especially applications we want to build on top of this to allow one to make better sense of kind of rack-level and ultimately multi-rack-level data.
Yeah, I'm sorry for backing up.
Somebody may have just dropped this in actually a bit ago, but this is the paper that I was alluding to in the chat. I dropped the link. It's pretty recent, and they basically talk about all of these pieces that I was mentioning at the beginning. It's basically a what makes ClickHouse so fast paper, but they've done a very good job, I think, of...
of describing the different pieces, why they've picked the trade-offs they have, and that performance is king. They have a few lodestars that they use whenever they have a trade-off question, and they usually come down on the side of performance. And it has served them very well, and it serves us very well for this particular use case. I mean, obviously, it would not be a good idea to store
customer data where we care about consistency in a database like this. You can't get unique primary keys, and that's really important for a lot of things that we do, just not our telemetry data.
I mean, in many ways, our decisions around ClickHouse and Cockroach have almost opposite constraints. Correct. And we have really made two very different decisions there for very different reasons. We would not want to... Certainly, it's hard to see one database ruling them all. These are two extremely different ways of thinking about data, looking at data, reasoning about data.
Yeah, I think that's right. And, you know, we just had Dave Pacheco on a few weeks ago, right, about Cockroach and talking a lot about the underlying implementation and the design choices that they've made. And yeah, I mean, I completely agree. There is no real way, I would say, to use something like that for this particular model.
And I think picking two databases that have all of the strengths we need and their own weaknesses, but all of the strengths that we need, I think, is very useful. It's definitely worth the complexity, I would say, of managing two databases. I cannot imagine actually storing the data that we have in ClickHouse. I can't imagine storing that in Cockroach.
No, no. Can't imagine storing it. Can't imagine querying it. Just, they're wrong.
Similarly, I don't think we want to put, like, instance information in ClickHouse. So, yeah, I think we can agree that these are very different problems. Oh, and one thing I wanted to bring up — when we were talking about using peg and so on, and these other various Rust crates...
And you're going to bring up ANTLR. Yeah, absolutely.
I'm going to bring up ANTLR. That's exactly where I'm going. I'm going to ANTLR. I mean, you're an ANTLR lover, just to be clear.
An ANTLR lover when I was doing stuff in Java. That's been a minute, but thanks for exposing me. Thanks for outing me as a former Java expat or whatever. I knew you loved ANTLR more than Java. Oh, 100%. Yeah, no, I think ANTLR is terrific. Never loved Java, but did love ANTLR. But Ben, didn't we use peg? Or maybe it was pest, in the USDT stuff?
That's right. We used a different Rust crate, also based on the parsing expression grammar formalism, called pest, to parse DTrace — like a .d file that you would use — when we built the USDT crate. And my experiences with that actually led me to choose something different. I think it's really useful. Yeah, I loved it so much. Right.
Bryan is brilliant, comma. Uh-oh, here we go.
I think that's a different design center. In my experience, it was awkward to work with the AST that it generated. And one of the features that peg offered was the ability to parse directly into an AST that you want to work with.
So basically the idea is that pest has a separate file that describes your grammar, and then you run a build.rs step, or an equivalent pre-compilation step, that turns that into some Rust code that will chew through tokens and spit out
a type — a Rust type — that you can operate on. But it has a generic rule type, which is basically, like, the string that matched, and it gives you the information about the rule that it matched and all of that sort of stuff. And I do think it was very useful for the DTrace thing, because it's very easy to use for these small grammars and it's pretty fast. I really just wanted to match a few kinds of strings in that case. In this case, where I wanted to do things like parse into a full AST — you know, a full tree of, like, an enum type in Rust —
peg offered a number of really good advantages. Basically, because you write the grammar directly in Rust, you can do things like write the Rust code that processes the string matching your rule right next to the rule itself. And in peg — or, sorry, in pest — those two things are separate.
You have the file, the grammar file is written somewhere else, and then you've got to, you know, process the rule yourself separately.
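As a small illustration of that difference (a made-up toy grammar using the peg crate, assuming peg is in Cargo.toml — this is not OxQL's actual grammar), the Rust action code sits in braces right next to the rule it belongs to:

```rust
// Toy grammar written with the peg crate: each rule's action block (in braces)
// is ordinary Rust that builds the AST node, written inline with the rule.
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Number(i64),
    Gt(Box<Expr>, Box<Expr>),
}

peg::parser! {
    grammar filter() for str {
        rule ws() = [' ']*

        rule ident() -> Expr
            = s:$(['a'..='z' | '_']+) { Expr::Ident(s.to_string()) }

        rule number() -> Expr
            = n:$(['0'..='9']+) { Expr::Number(n.parse().unwrap()) }

        // The action builds the AST node directly, no generic "rule" type.
        pub rule comparison() -> Expr
            = l:ident() ws() ">" ws() r:number() { Expr::Gt(Box::new(l), Box::new(r)) }
    }
}

fn main() {
    let ast = filter::comparison("bytes_sent > 1024").unwrap();
    assert_eq!(
        ast,
        Expr::Gt(
            Box::new(Expr::Ident("bytes_sent".into())),
            Box::new(Expr::Number(1024))
        )
    );
}
```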
So, Bryan, I'm glad you brought up ANTLR, because I feel like ANTLR was very domain-specific to Java. And to bring us back to the beginning, it seems like these other systems are much more native to Rust. I mean, for example, there is ANTLR code generation for Rust, and I haven't kept tabs on the state of it.
But it makes sense to have a domain-specific language for this kind of activity specific to the language that you want to use to augment that generation.
Yeah, interesting. Yeah. Interesting. Yeah. For the same reasons. For the same reasons. Yeah.
For like, you don't want to have the least common denominator. Turns out domain-specific languages can be valid. I'm sure there are lots of them that shouldn't have been written or whatever. But as long as you're looking at the available options and considering the aspects of the domain.
I think that DSLs are... When someone is developing a DSL, it is almost always coming out of exhausting the alternatives. I think.
That might be right.
Because the bar is high enough where you're not just going to frivolously dive into it. I think that's fair.
Yeah. In this particular case with OXQL, the alternative is writing...
a bunch of very mechanical, very verbose, error-prone SQL against ClickHouse. And we can auto-generate it for you. I mean, it doesn't need to be manual. And making every person pay the cost of basically reconstructing that denormalized table is crazy. I mean, it drove me nuts when I was basically trying to select the raw data. To your point, Bryan, it was painful.
I was trying to select the raw data and I just wanted to get something out of it. And, I mean, basically immediately when I started writing the data model RFD, RFD 161, I think I even included in there some points about, hey, there's this snippet that appears all the time; maybe we should get a way to generate this.
And that's basically what OXQL was for, is the idea that I can write some high-level thing and it'll do the drudgery for me, which, you know, nobody wants to do, let alone we want to force our customers to do. That would be terrible.
Totally. And, you know, I've always found that these little languages — we've never developed them superfluously, I think. In fact, I think to the contrary: sometimes we think we've got something, we want to use something general-purpose.
We try to make something general-purpose work, and you realize this is actually creating more drag than it's solving, for all the reasons you mentioned at the top. Actually, a former colleague of ours, Mike Shapiro, wrote an ACM Queue paper years ago on purpose-built languages, featuring MDB and talking about adb and the language in adb.
Okay, I'm gathering from the chat that apparently there are some DSLs out there that feel more elective. I realize that now I have turned into — people are like, you've gone from "okay, fine, OxQL, we reluctantly acknowledge its right to exist," but now you're just, like, a DSL apologist? Now it's, like, any DSL. You're a DSL maximalist.
There's a couple of comments which I think are... are very valid. I still have a fear that basically OXQL is not worth it, if I'm being honest.
And I think so far it seems to be very useful, but I agree: I was extremely resistant, and it took me a long time to build it, because of exactly what the third comment above mentioned — that there are many DSLs that have just been thrown in the bin because they seemed cool, but really... like, why not SQL? Right? Or why not something else? Anything else?
Pick your alternative, right? And I basically have had this fear from the beginning, and, you know, it has been there for a while. So I do think that it's a very valid concern. And so far, it seems like we've been justified, but I do think it's a reasonable concern.
And then somebody else mentioned it's true that we're a little bit conflating why not SQL with why not SQL for the ClickHouse. And I think that's true, but that gets back to the earlier question or the earlier bit we were talking about, which is if we were to support SQL, it would already be a very tiny subset of it. Yeah. And it's not clear to me.
We would need to do almost all of the work of building our own language anyway, because I need to do something to compile that into the SQL against the tables that we have — and that's fine — but then I have to do something to interpret that SQL, figure out which subset of it we're going to support, deal with all of the, frankly, obtuse syntax that SQL comes with.
Basically, like I said, throw away 95% of the language and only support this little tiny subset. And I'm still doing most of the work, and it's not really clear to me how easy it would be at the time. It was not clear to me how easy it would be to build new things on top of it, to support new operators. And so I think... I tried that.
It does work, but it seems better to me to use something that's more tailored to our data model that we're better able to make incremental changes to. And I think, you know, it's a good question, but I think ultimately there are sort of two separate things. You're right. Why not SQL? And then why not SQL on the ClickHouse data? We are using the latter.
We are ultimately running SQL queries against ClickHouse. But the model that we expose, the language that we expose at the front end is something that's more tailored to our use cases.
Yeah, which I think actually gives us terrific power. I mean, I think that that's a very important layer of abstraction that we've injected. I also think, and I know you mentioned this at the top, but in terms of why not SQL, you also don't want to give people the impression like, oh, this is great, it's SQL, I know that.
It's like, no, no, no, sorry, did you miss the 16 asterisks that are after SQL? It's like, this is actually not just SQL, sorry. Okay.
It's a really good point. It's not a table. I could pretend it's a table, but what it would really be is one row where the last column is a giant array of the time points and the data, and it's not very useful at that point.
You don't get any of the benefit of a table format when you do that, or you have to replicate those fields to denormalize the data, and then you pay this massive cost for doing that. So it's not really obvious to me that that's the right model for the data ultimately. Some folks did mention Data Fusion, which I think is really cool.
So Data Fusion, for those of you who are not familiar, is a project for kind of giving you the pieces to build a database engine. It's got things like a SQL parser — actually, sqlparser-rs is part of their project, so it's probably the most common one. We actually use it internally for parsing SQL queries and writing SQL queries programmatically.
The SQL parser is from the Data Fusion project, but they have this idea of reusable database components: things like query planners, logical plans, physical plans. I would say it's heavily SQL-focused, right? So it's very much, you know, hey, you want to build a new database engine that has, potentially, SQL or a custom query language on the front end.
It does ultimately really hem you into a SQL-like, table-like model of the system. And again, I just don't think we have that data format. It's not obvious to me that we get a lot from that.
The other big thing I should say is that I've definitely read the code, and it's very good for things like the query planner is pretty cool and has a lot of ideas for how to build a query planner, which I'm doing now, an optimizer now to make better use of our SQL queries that we are running against the database.
But it also really is, I think, outside of that, focused on the other Apache data formats like Arrow, which is very useful, again, but you have to have an Arrow file already accessible, which we don't, right? I mean, we can — you can ask ClickHouse to give you Arrow-formatted data, and it'll do that for you — but it doesn't seem to buy us a whole lot.
If you already have Arrow files, it's definitely something to look at. But we don't have that. It is very tightly coupled, as someone said. It is very tightly coupled to Arrow. Basically, when you generate a physical plan, you're already using the schema types that they have for manipulating the Arrow schemas themselves. It's basically a wrapper around
a bunch of Arrow crates, which, again, is very good. But there's actually a paper that I just saw about why Arrow might not actually give us as much as we need. I think it's not really built for modern hardware, is the argument that this paper made. I'll try to find that. I think it actually came from Andy Pavlo. But the idea is that the format
has a lot of indirection, and so you can't do what ClickHouse spends all its time doing, which is keeping the caches full. And basically, the memory hierarchy is full all the way up, and they don't have things like, well, I got to wait because this cache needs to get dumped so I can go fetch a totally different section of memory, which something like Arrow often can lead you down that path.
Yeah. Totally. Well, and I mean, I think that I know that we've gone our own way on lots of different things. Again, I think it's kind of, it's still surprising to me that the query language is like the bridge too far for the internet. The mob has shown up with the kind of the pitchforks and torches here. But the, I mean, we also did our own P4 compiler. We're going to talk about that. We did.
I mean, This isn't even our own... We've done other compilers. We've done multiple operating systems.
Didn't the networking folks get up in arms? I'm asking: did they get up in arms when you said you were doing their switch? Because it seems like the people who focus on databases or query languages are mad about OxQL, which is fair if you haven't sort of looked into the background. But did people feel the same way about any of the other choices we've made that seemed crazy? No BIOS?
Oh, yeah, sure.
Yeah, I mean, yeah.
Obviously the database people are like, I don't care. Yeah. Do your own switch. It's fine. It's like, no, no, I'm sorry, you don't understand how that's madness. Like, that actually is. Yeah. But the query language? That's just great. Right. Right. So I would say that, just in general, when we make one of these decisions, it's because we have exhausted the alternatives.
Like, we have not done any of our own silicon, people. Just, you know, just clip that one.
Hold on to that for a few years. We'll see.
There are plenty of folks at Oxide who are — exactly — everyone's like, we haven't done our own silicon yet. Right. And there's plenty of, you know, we haven't written our own database yet. Yet. We have not written our own database yet.
Don't you think that feels more... But we have also already announced that we're supporting our own database, with respect to Cockroach. That's right. Yeah, look. But we are not doing our own silicon. And that's basically it. We haven't done our own instruction set architecture. I mean, I don't know, there are things we haven't done. But on every one of these decisions,
it is almost certainly the case that we went in assuming we were going to use something else — we assumed we were going to use Tock before we came to Hubris. And I think for all of these things — we certainly were using Intel's tooling with respect to P4 before we came to the conclusion that we needed to do our own P4 compiler. I mean, for all of these things,
We went in wanting to use something else and then realizing this doesn't fit exactly. And I think, Ben, as you mentioned, we have great apprehension when we go our own way. I know it doesn't feel like it, honestly, that you people have any. Are you sure you have any apprehension? It doesn't really feel like it.
For people that have apprehension about going your own way, you sure go your own way a lot. It's like, well, yes, I know. I know. I get it. It's a bad look. But we do go our own way. We do have apprehension about it. We do really carefully deliberate on this stuff. And this was, I think, to me, this is a very clear example where going our own way is the right decision.
Ben, I do not have your apprehensions about OXQL.
I'm glad one of us is that way. I mean, I think, you know, going with the examples you gave before, we did, I think, go in assuming PromQL would be what we used, right? That we would use something like Prometheus, because it's obviously been quite successful.
And I think through past experience, and through trying to build the tool we needed on top of it, it became kind of clear — through all of that, and through all the writing that we did around the background — that it was just better to do our own system, as you said.
I am. I'm also glad that someone has mentioned the AT&T Hobbit in the chat. I am less than, like, 50 feet away from an AT&T Hobbit manual here in the Oxide office — to point back to our return-to-the-office conversation, I like to return to the office just to be close to my AT&T Hobbit manual. Well, Ben, this has been great. Thank you very much.
In terms of comparing this to your thesis defense, do you feel the questions were, how did these two compare? You're just like, hey, next time I would do something less stressful, like get a PhD.
Oh, yeah. This was cordial. Very cordial. That's actually a relief. PhD has this like weird thing where you just sort of go out of the room for a couple hours or an hour and they talk about you and then you come back and it's like, so when I came back in.
I'm glad you brought this up, because we now would like you to leave. And now we're going to invite everyone up on stage, and we're going to discuss the novelty of OxQL and whether you should have just used SQL.
When I came back, when I came back from my thesis defense, I walked out of the room and I came back and one of the people on my committee just started sort of diving into a question. He said, hey, so, you know, I was thinking about this thing and yada, yada, yada. And he went about like two minutes before he was like, oh, you passed, by the way.
Oh, OK. Oh, yeah. Thanks. Thanks for burying the lede there. I mean, I was curious. Right. Yeah. Oh, my God. I've got to say: if you have news — big news, bad news, good news — do not bury the lede. This is just a life lesson. Just get that news out there, like, early.
You know, like when I got a call from the high school, from the assistant principal for discipline. And clearly this person has dealt with a lot of people, because — I'm like, oh my God, it's the disciplinarian from the high school; do I need a lawyer, basically, is my first thought — the first thing she said is, it's good news.
Like, okay, it's good. That was nice of them. Oh, very nice of them. Very nice of them. And also, like, I mean, the whole thing was nice. It was very nice that they called with good news. Which is good. It's not always good news. So, you know, if it had been bad news, like, let's lead with that. Anyway, so there you go. Well, Ben, I'd like to lead with the good news.
I think OXQL is awesome. And I'm really glad you...
Thank you.
And I'm a DSL maximalist, so there, everybody.
I mean, it's great to have these forums because I think... You know, I can write about it, but... I mean, reading back on it, it's still sort of easy to look at it and be like, well, that makes sense to me. But, you know, of course it does. I wrote it. I've been like stewing in it for years, right?
So it's nice to have a forum in which you can field the questions that people actually have rather than sort of try to infer what they would be and answer them ahead of time, right? So I think it's a really useful format. Anyway, thanks for having me. Thanks for all the questions. It was really valuable.
Yeah. And we test ran the new AHL bot. Adam has been completely replaced with an AI. That seemed plausible. I don't know. I bought it this whole time. Yeah, exactly.
That technology has been within our hands for years.
That's right. Somebody just mentions ANTLR, and it'll immediately start generating. Exactly. And check out the link we dropped in the chat to the auto-generated podcast we were talking about at the top. That was very creepy. It's kind of fun to check out.
It's bizarre for reasons I can't quite explain. Yeah.
It's creepy. Well, Ben, thanks again for OxQL. And I'm going to take off so I can go to bed. All right. Stay curious, everybody. Yeah. Thank you. Thanks, everyone. Talk to you next time. Bye.