Flavors of Ship It on The Changelog — if you're not subscribed to Ship It yet, do so at shipit.show or by searching for "Ship it" wherever you listen to podcasts. Every week Justin Garrison and Autumn Nash explore everything that happens after `git push` — and today's flavors include running infrastructure in space, managing millions of machines at Meta, and what it takes to control your 3D printer with OctoPrint.
What's up? Welcome back. This is The Changelog. We feature the hackers, the leaders, and those who are shipping all that awesome software out there. And speaking of shipping, Jared and I are taking the week off and bringing you various flavors of Ship It to enjoy. Yes, we have a podcast called Ship It.
If you're not a subscriber, you can do so at shipit.show or by searching for Ship It wherever you listen to your podcasts. Every week, Justin Garrison and Autumn Nash explore everything that happens after Git Push. And today's flavors of Ship It include running infrastructure in space, managing millions of machines at Meta, and what it takes to control your 3D printer with OctoPrint.
What's up, friends? I'm here with a new friend I made over at Speakeasy. Founding engineer, George Hadar. Speakeasy is the complete platform for great API developer experience. They help you produce SDKs, Terraform providers, docs, and more. George, take me on a journey through this process.
Help me understand exactly what it takes to generate an SDK for an API at the quality level required for good user experience, good dev experience.
The reality is the larger your API becomes, the more you'll want to support users that want to use your API. And to do that, your instinct will be to ship a library, a package, and what we've been calling an SDK. There's a lot of effort involved in taking an API that lives in the world and creating a piece of software that can talk to that API.
Building SDKs by hand is a significant investment, and a lot of large companies might pour a lot of money into that effort to create something that approaches a good developer experience. And then another, growing group of companies will rely on tooling like code generators.
And so they're very interested in, like, once you make the decision to use a code generator, you're kind of forfeiting some of your own opinions and what you think a good developer experience is because you're going to delegate that to a code generator to give you an SDK that you think users will enjoy using.
Okay. Go to speakeasy.com. Build APIs your users love. Robust SDKs. Enterprise-grade APIs. Crafted in minutes. Go to speakeasy.com. Once again, speakeasy.com.
All right, thank you so much, Andrew Gunther, for being on the show today. And today we're talking all about shipping in space. And so welcome to the show. And my first question is, when you have some code that's running in space on a rocket ship, and if it's a class that maybe is undeclared, is that an unidentified flying object?
Ooh. Sorry. Oh, that's a boo. That's a boo for me, dog.
We don't even have all the context here. What do you do in space? I have so many questions. I know.
I literally was up last night and I thought of that. Like I woke up and I was like, that's the joke. And I'm like, oh, I'm such a dad. Anyway.
He didn't wake up with a line of code. He woke up with a dad joke about space. Like I love it.
I get that though. We all go through that phase.
He's like, I feel you. It's okay.
So anyway, bad jokes aside, Andrew, tell us about yourself and what you were doing at Orbital Sidekick.
Yeah, for sure. So I'm Andrew Gunther. I work for a company called Orbital Sidekick. So Orbital Sidekick operates a constellation of hyperspectral imaging satellites. And basically what that means is they have these cameras that can see way outside of the visible spectrum of light so they can effectively perform spectroscopy from space.
So gases that would normally be invisible to the naked eye are things that their cameras can see. And so their primary market right now is customers like oil and gas, who are like, hey, let us know if our pipelines are leaking. So OSK basically processes their own imagery, determines where leak sites are and forwards those on to customers.
They have customers in government who buy raw imagery, and they're looking to expand out into other industries. As you can imagine, with these kinds of cameras there's all kinds of cool stuff you can do. You can monitor plant health. You can help with mining prospecting. So very, very cool technology, still in the early stages. Three satellites in orbit right now, two more launching in March-ish.
I don't have exact dates yet, but a little bit more about me. So I am principal software engineer at Orbital Sidekick. Prior to that, I worked at AWS for seven years. So basically I left AWS, joined OSK as lucky employee number 13, and got to build a lot of their ground segment systems from the ground up. As time went on, I got to be a little bit more involved and help out on payload side as well.
So kind of the V1 of all of OSK systems, I got to sort of touch and then moved into this role of wherever the fires are, I moved around to put those out.
And hopefully not physical fires. I mean, these are spaceships and rockets.
No, thankfully no, no physical fires.
Okay. Like we just became best friends. Like I'm so excited right now. Like I'm trying to like parse the like amount of questions in my brain because that's how excited I am. Like, okay, like. So you're building the software that processes the images, but also is it like, are you building the software that is on the satellites?
So it's interesting. OSK is a company of about 30 people.
Do you need 31?
Because like... Applications are open.
Okay, cool.
It's interesting. So being that small, we have to work with a lot of vendors to sort of pull things together. But the payload design and a lot of the core software for image processing, we write ourselves.
What language do you write image processing?
So the image processing on ground is all in Python. And the firmware for the SAT is C++ as well with some Python mixed in. So one of the big value props for OSK is that we try and perform some of the imagery analysis on board the satellite before it even comes to the ground.
What?
Yeah, because you have this incredibly wide spectrum imagery, the data is huge. I mean, we're talking these satellites can bring down one and a half terabytes of imagery per day, per satellite. And so part of the idea is the more processing we can do on board to understand what imagery might be a priority versus not a priority really helps us get that information to our customers faster.
So there's also this aspect of the analysis we write on ground should be analysis that we can hopefully perform on board as well.
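For a rough idea of what that kind of onboard triage could look like, here is a minimal sketch in Python, assuming a hyperspectral scene already loaded as a NumPy cube. The band indices and the band-ratio heuristic are invented for illustration; OSK's actual detection pipeline isn't described in this episode.

```python
import numpy as np

# Hypothetical band indices for a gas-sensitive ratio; a real detector
# would use far more sophisticated spectral matching than this.
BAND_A, BAND_B = 112, 96

def scene_priority(cube: np.ndarray) -> float:
    """Score a hyperspectral cube (bands, rows, cols) for downlink priority.
    Higher scores mean the scene looks more interesting."""
    a = cube[BAND_A].astype(np.float64)
    b = cube[BAND_B].astype(np.float64)
    ratio = (a - b) / np.clip(a + b, 1e-6, None)    # normalized band difference
    return float(np.percentile(np.abs(ratio), 99))  # focus on the strongest anomalies

def triage(scenes: dict[str, np.ndarray]) -> list[str]:
    """Return scene IDs ordered by priority, highest first."""
    scored = {sid: scene_priority(c) for sid, c in scenes.items()}
    return sorted(scored, key=scored.get, reverse=True)
```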
That is so cool.
We ship NVIDIA hardware up to space. We are running NVIDIA dev board in low Earth orbit.
I'm just thinking of Nvidia drivers now. And I'm like, oh, like that's the worst, like trying to like on a Linux embedded system. Well, how are you?
A Linux embedded system that you never get to touch again, right? It goes up and you're, you're locked in.
If something goes wrong, like how do you fix it when it's like in orbit?
Yeah. I mean, this really gets into redundant systems, right? So a lot of the components on board, there's at least two, our own components. So our own dev board.
I forget if there's one or two of those, but there's kind of like a main control computer that exists separate from ours that kind of handles a lot of the boring stuff, like pointing the satellite and doing the actual hard work of transmitting data back down to the ground. And then our dev board basically handles all of that image processing, sending commands to the camera.
So effectively, we have capabilities to fail over from one component to the other. Or if we're rolling out an upgrade, you know, we roll it out to XCOM 2 and then we swap the primary over to it. And so it's almost like an A-B test in space, right? It's kind of like a canary. So you upload it to one of the XCOMs, you swap over to that, make sure everything still works. Great.
Roll it up to the second XCOM. Everything still works. Great.
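As a sketch of that upload-swap-soak sequence: the board names, soak time, and helper functions below are made up, not OSK's actual tooling; it just shows the shape of a one-board-at-a-time canary rollout.

```python
import time

# Hypothetical stand-ins for the real uplink and telemetry tooling.
def upload_build(board: str, bundle: str) -> None:
    print(f"uplinking {bundle} to {board}")

def make_primary(board: str) -> None:
    print(f"{board} is now primary")

def health_ok(board: str) -> bool:
    return True  # replace with a real telemetry check

def canary_rollout(bundle: str, boards=("xcom-2", "xcom-1"), soak_s=3600) -> None:
    """Roll a new payload build out one compute board at a time:
    upload to the standby board, swap primary to it, soak, then repeat."""
    for i, board in enumerate(boards):
        upload_build(board, bundle)
        make_primary(board)
        time.sleep(soak_s)                 # let it run real workloads for a while
        if not health_ok(board):
            fallback = boards[(i + 1) % len(boards)]
            make_primary(fallback)         # fail back to the other board
            raise RuntimeError(f"rollout aborted: {board} unhealthy")
```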
I feel like you have to write really good code because and you have to really think about your hardware because you never get to touch it again, you know, and you could miss a picture.
A hundred percent. And this is one of the crazy things. I mean, even in a startup like an aerospace startup, the dev cycle on hardware is super long. So, you know, a lot of the hardware was designed and locked in and figured out before a lot of people got hired. Like the hardware was decided before I even got hired.
Which is crazy. You said you were number 13, right?
Yeah.
That's like really early.
Yeah. So there's a lot of, you know, by the time these things go up, like, you know, you've got three new generations of NVIDIA dev boards that have come out. You're running, like, Ubuntu 18.04 in space for the next, you know, half decade.
Yeah. Long-term support has a different meaning when it's flying around the world. Yeah.
It's like L-L-L-L-T-S. Long, long, long-term support. Yeah.
Which, like, well, it's funny because people, like, kind of make jokes about, like, NASA in different places and how they use, like, in the government and how they use outdated technologies. But when you really think about it in context, there's a reason why they're still using that very reliable technology because, hey, you can't go change it every year.
Yeah, because you got to test the hell out of it before it goes up. And it's so interesting to kind of see this boom of aerospace startups. Like, before I came to OSK, I didn't work in aerospace. Like I said, I was at AWS. And I also had the draw of, like, I want to work on space stuff. Like, that sounds awesome.
And seeing this smashing of startup culture and aerospace, like you have this culture that wants to move incredibly fast and this culture that's traditionally very slow, trying to like figure out like where, where does this all meet in the middle? How do we speed this process up, become more agile? And that was to me like one of the most interesting things to observe.
First of all, that's really cool when I think about like 30 people, three satellites, right? Like 10 people per satellite in space, you're going to have a couple more. And like, that's like the opposite scaling of what I think of for like running systems where it's like one sysadmin can do a hundred machines. It's like, but like a spaceship, like it's literally a satellite.
It takes so much time and process. What does that actually look like for you of... we're going to make another satellite, it's going to go out next year. What is that lead time for what you're writing today, what decisions you're making around libraries and code? And then how do you get feedback for that?
How do you make sure that like that thing that you think is going to be accurate next year gives you any kind of feedback loop?
Oh man, there's so many great questions to unpack here. So I'll try and go at it one at a time. So one of the saving graces to some degree is that as we launch more satellites, they're all based on the same hardware designs, like very minor, minor revisions between them, right? Like you have a satellite that works, like don't mess with it. Continue to launch more of the same.
So but then also on the flip side, right, when the first one goes up and we realize, like, ah, we really should have done some things differently as we learn, the iteration cycle is even slower. So there's a lot of things that we have to kind of deal with on ground, and we're making notes for what the next-gen hardware is going to look like and what the additional concerns are.
And when you talk about, you know, what kind of packages are we going to use? That's a huge concern of ours, right? Again, it's running 18.04 in space. We're trying to do machine learning and data analysis. A lot of those libraries move very fast. They're very quick to drop support for older operating systems. So, you know, we have to make the call as a small team.
Are we going to compile these ourselves? Like, are we going to build our own versions of these dependencies to maintain them? So we're very cognizant, especially on the onboard data processing side of what libraries we pull in. I mean, more so than anywhere I've ever been, because not only is maintenance a concern, size is a huge concern. Pushing software updates to space is hard, right?
It takes a while. You're going to test the hell out of it and you want to make sure that it works. And so I like to pick on Node.js because you have like the NPM package system, right? Where just everything sprawls out to infinity. You install.
They're ridiculous. 65 warnings. Just like what?
Yeah, you install one thing and suddenly now you've got like 100 gigabytes of dependencies. And even in Python, right, we have to be really careful of that. Like, what does our dependency sprawl look like? And we've made conscious decisions to say, you know, that sprawls out a little too much. Like, we're not going to use it.
And something we really try to hold to our own frustration sometimes is parity between space processing and ground processing. So there's a library where it's like, all right, well, we don't want to ship it to space. Are we going to use it on the ground? I don't know. Maybe now we've kind of separated these paths and it makes it harder for us to verify results between the two.
So those are the kinds of things that we have to think about. It's interesting that, to some degree, even space decisions can slow down ground decisions.
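A small sketch of the kind of dependency-footprint gate a team might run before approving a library for onboard use; the size budget is an arbitrary number for illustration, not an OSK limit.

```python
import subprocess, sys, tempfile
from pathlib import Path

SIZE_BUDGET_MB = 50  # arbitrary illustration, not a real onboard limit

def installed_size_mb(package: str) -> float:
    """Install a package (and its full dependency tree) into a throwaway
    directory and report how many megabytes it drags in."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "--quiet",
             "--target", tmp, package],
            check=True,
        )
        total = sum(p.stat().st_size for p in Path(tmp).rglob("*") if p.is_file())
    return total / (1024 * 1024)

if __name__ == "__main__":
    pkg = sys.argv[1]
    size = installed_size_mb(pkg)
    verdict = "ok" if size <= SIZE_BUDGET_MB else "too big for onboard use"
    print(f"{pkg}: {size:.1f} MB installed -> {verdict}")
```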
It's really interesting because you're kind of developing in like the paradox of like all developer like angst, I guess, or problems like, okay, so you want to build something with low dependencies, something that's not going to be vulnerable, something that's going to last for a long time. But then how do you pick that? You have no control over software life cycles of other people.
And you have to know like three years in advance.
That's what I'm saying. And like, how do you account for vulnerabilities? Like you're going to have to patch things eventually. So like, how do you patch in space? There's going to be a CVE for something.
Yeah, I don't want to fall into the trap of it's in space, so it's safe, right?
The attack surface is a little different than what a typical server might have.
Yeah, it's a very different attack surface, but I will kind of pull that card a little bit, at least as far as the Python side of things goes. That system is very isolated. Like it's not, we're not running a web server up in space, right? But I will, you know, to the point of security and CVEs happening in space, Space Force is actually making a huge push. They were at DEF CON last year.
I got to go and watch; they held the Hack-A-Sat hackathon, which was very cool. The Space Force actually launched a satellite for a hackathon.
I still can't believe Space Force is like real. Like every time someone says it, it makes me happy.
It's pretty amazing. Yeah.
Like, that sounds just so cool. Like, I work at Space Force. Like, no big deal. Like, right?
Central Space Command, you know?
Like, it's awesome.
Yeah, so it's important to not fall into the trap of, like, we're in space, so we're safe, right? And especially in that startup culture of, like, wanting to move really fast and compete with these bigger guys, that's something that we're very cognizant of and trying to find those right balances, right?
How do you make decisions and what kind of tenets do you have to, I guess, develop? Because you both want to develop quickly, because everybody wants to innovate and develop quickly, and that's how you get an edge on your market. But also, how do you make that last for so long? And then how do you do it with... I was writing an automation script and we were trying to get rid of dependencies.
So it's like, okay, I won't use Pandas. I'll use plain Python, the things that come with Python, right? So, like, trying to develop on that level on just a small automation script made it so much more complicated. So I can only imagine image processing.
It's a big push-pull, because you definitely want to try and keep your space systems as simple as possible. And we're very much breaking that mold by saying, like, we're going to do imagery analysis on board, on board of a satellite. And so it's definitely something that we're cognizant of. And we have this nicety that we can test a lot of things out in the ground segment.
We can use those libraries on ground initially before we make the call of, you know, this is something that we want to run in space. So let's retrofit. We can use all those nice libraries, have 100 gigabytes of dependencies to prove out those analyses on ground.
And then when we want to say, okay, this is high value, we want to run it on board, we can take that step to say, all right, let's strip this back. Let's make this bare bones. How do we leverage what's already on board to now ship this thing up to space?
Working backwards.
Yeah, yep.
Fundamentally, a customer comes to you and says, I need you to look at something on the ground. You're not running customer code in space, right? They're giving you a job to say, like, coordinates, please send me this data, right?
Yep, exactly.
You're going to go... point the iPhone 25 at this ground spot and, like, get this image back. It's going to transmit down. You see, like, a terabyte a day out of this. Like, you're just taking pictures constantly. And then you process that a little bit more and then send them either raw data or whatever it is that they're looking for, right? Like, that's the general pipeline here.
Yep. You got it.
What was the benefit there of not making the satellite a dumb client and putting intelligence in the satellite and on the ground? Cause it feels like you could put that either place and you, you chose the hardest decision to put it in both places. Like there has to be value that you're getting out of doing that process before.
I mean, if you're just sending a terabyte, like, I don't know, was that just a big antenna with a satellite dish up there? Just like beaming down death rays of pictures. Yeah.
So that antenna that actually performs that one terabyte a day downlink, we get a pass on that antenna every 90 minutes. So if you have a really critical high priority workload and your objective is to deliver some insight to a customer as quickly as possible, you might not want to wait that 90 minutes.
Even after that 90 minutes is up, it has to get to the ground, and it has to be processed on the ground before it finally gets delivered. So the idea being, if we, A, can prioritize (because we're also not able to get down all the data we may have on board every pass), it's kind of twofold, right? If you can say on board, hey, I have very high confidence that there is a methane leak at this position,
That is a much smaller piece of data. And there are other antennas that we can use to transmit that data more instantaneously. And then secondly, maybe you have a little bit less confidence, but something is suspect. You can say, all right, on our next pass, this data skips the line. We're going to downlink that first.
And we're going to make sure that gets analyzed as part of this pass so we can get that information out as quickly as possible to customers. But it is definitely the hardest of all the options. You're right.
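A toy sketch of that two-tier flow: tiny high-confidence alerts go out immediately over the low-bandwidth link, and merely-suspect scenes jump the downlink queue for the next pass. The thresholds and function names are invented for illustration.

```python
import heapq
from dataclasses import dataclass, field

HIGH_CONFIDENCE = 0.9   # invented thresholds, for illustration only
SUSPECT = 0.5

@dataclass(order=True)
class QueuedScene:
    neg_priority: float                    # heapq is a min-heap, so store -priority
    scene_id: str = field(compare=False)

def handle_detection(scene_id: str, confidence: float,
                     downlink_queue: list, send_alert) -> None:
    """Route an onboard detection based on its confidence."""
    if confidence >= HIGH_CONFIDENCE:
        # Small alert message, fits on the low-bandwidth command/telemetry radio.
        send_alert({"scene": scene_id, "confidence": confidence})
    elif confidence >= SUSPECT:
        # Skip the line: downlink this scene first on the next 90-minute pass.
        heapq.heappush(downlink_queue, QueuedScene(-confidence, scene_id))
    else:
        heapq.heappush(downlink_queue, QueuedScene(0.0, scene_id))
```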
So kind of like how they're using machine learning to look through ultrasounds, but you're using it to like basically prioritize data from satellites to bring that down first.
Yep.
It's really cool.
How do you debug that? Like, is that only like you have a dev box on your desk and you're saying like, oh, I think this is what's happening, right? Like at some point when you debug something, you just have to kind of poke at it. But I can't imagine like that latency, you have a 90 minute window. I don't know how long that window lasts, but you're like, oh, I got a shell for 89 seconds.
Like I got to jump on the box and like poke at something.
Yeah, you can SSH into space for a hot five minutes and take a look around. There has to be some planning ahead of time. If you want to run some set of debug scripts, you're going to want to know ahead of time and just run that in an automated way rather than just maybe having a terminal open, which we've done.
We've done, especially after the Sats first went up and we were trying to better understand the characteristics of the first one and just get a sense of what was happening live. There was a lot of like, all right, time to SSH.
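"Plan the debugging ahead of time" might look something like this: a fixed list of read-only commands run over SSH with a hard time budget instead of an interactive shell. The host name, window length, and command list are placeholders.

```python
import subprocess, time

PASS_BUDGET_S = 300          # assume roughly a five-minute usable window
SAT_HOST = "ghost-1.example" # placeholder, not a real hostname

DEBUG_COMMANDS = [           # read-only diagnostics, decided before the pass
    "uptime",
    "df -h /data",
    "journalctl -p err -n 50 --no-pager",
    "nvidia-smi",
]

def run_pass_diagnostics() -> None:
    deadline = time.monotonic() + PASS_BUDGET_S
    for cmd in DEBUG_COMMANDS:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            print("out of pass time, stopping")
            break
        try:
            out = subprocess.run(
                ["ssh", SAT_HOST, cmd],
                capture_output=True, text=True, timeout=remaining,
            )
            print(f"$ {cmd}\n{out.stdout}")
        except subprocess.TimeoutExpired:
            print(f"$ {cmd}\n(timed out, window likely closed)")
            break
```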
I can only imagine the, like, constant tmux session. That's like, oh, it's coming back around. Let me connect to it again. Hold on. That's just amazing.
We don't have space for tmux, man. It's a fresh shell every time.
Oh, and then your SSH hangs and you're like, dang.
Yeah, yeah.
That is exactly what I would hope to be.
Yeah, it's the dream.
There's so many challenges in this. What would you say is like one of the things that stood out to you as like something you didn't expect? Because I mean, you're going into this knowing this is going to be a hard thing to do. There's a lot of variables and things at play. What is something that surprised you about shipping software into Orbit that you're like, wow, I didn't see that one coming?
Yeah, I think, you know, and maybe I was uniquely naive in that... I think everybody has that vision built up of how NASA does things, right? And you imagine clean rooms and, like, this perfection, and everything is immaculately tested. And I'd say it's not that there aren't problems, but you have that vision of that much slower pace. And I think what was surprising to me is
the speed at which we can move and the amount of chaos that introduces and that it's okay. There was a lot of thought put in upfront around those failure modes and understanding and basically protecting ourselves, our future selves. So that when things do get chaotic and things do break, we have the levers that we can pull. So it's not clean rooms.
I mean, there are clean rooms, but you take a drill, do a test, and it's like, oh man, we need to route this connection somewhere else. And somebody just takes a drill to a frame and they're like, all right, let's send it to space. That just kind of shatters your view, right, of the way NASA does things.
And I think that kind of goes to what I was saying earlier about this meeting in the middle of this startup culture wanting to move fast and that entrenched aerospace culture of moving very, very slow. Right.
If we can launch a satellite for, say, like five million dollars, why are we going to run a five million dollar on ground test for that satellite or a ten million dollar on ground test for that satellite? We can launch three for that price. And if one of them works, we're great. So I think that's kind of where where that push and pull really comes into play.
And what was really surprising to me was just how much leniency there was towards moving fast. Like, I didn't expect to be able to move as fast as we've been able to move.
I wonder what the culture was like in the 60s, right, when it's like, we are landing on the moon, because they had to move faster. My grandfather actually worked on the Apollo missions, and his pictures were absolutely amazing. And I never got to hear stories from him of what the culture was like.
But I can only imagine that like at some point you're just like, no, this has to happen this decade. Right. Yep.
Someone, you know, someone said we're going to the moon, and so many people and so much funding and money was in place to do that. And now it's the opposite, where it's like, no one told you to do this. No one told you this is the thing we have to do. And so the initiatives are very different. It's like, hey, we see where we can add value to people that
maybe had to drive their truck for two days to go see this pipeline or something like that. Like, Hey, I got you in an hour and a half and you're going to get your images and they're gonna be processed and we'll see all that stuff that maybe you couldn't see before. And that's just pretty amazing to be able to add that value that quickly.
Yeah, it's nuts. I mean, this is something that even 10 years ago wasn't possible, right? Like, launches have become way cheaper. You know, $5 million is a lot, but in the grand scheme of, like, Silicon Valley VC money, that's not a lot. And it's become super accessible for startups to launch payloads into space. It's high CapEx still, for sure, but it's possible when it really just wasn't before.
And I think to your point, we're seeing that transformation in a lot of industries. For oil and gas, the state of the art was like once a quarter, they would pay some kid trying to get their pilot's license to just like fly and look out the window of a Cessna. And like, do you see any leaks? Nope. Nope. right? That's what we're going up against. That's what we're replacing.
It just feels like such a huge quantum leap forward for that industry. And we're seeing like, we see that with customers, right? They're super excited. I mean, A, because it's space and it's cool, but also just, it is such a faster feedback loop than anything that they worked with before.
That's wild that you can do something in space that much quicker. Also, I think it's really going to add... I don't even know if I'd say add value, but it really sets you apart if you can move fast and cheaper, because of the market that we're in right now, with less VC funding and higher interest rates.
That's awesome that you've been able to add so much value, but also like iterate faster and at a, I guess, smaller cost, you know, even if $5 million isn't anything to like.
It's still high CapEx, but lower than it used to be CapEx. Yeah.
Yeah. I mean, like back in the day, the only people that launched anything into space was NASA, you know, so the fact that it is even an industry that multiple people like or multiple companies can do, you know, is kind of just wild in itself.
So my dad also worked in aerospace. And when I told him that I was coming to this company, he was like, you mean you're just a bunch of guys and you put some satellites in space? Like, yeah. Yeah, they just let us. For real. You just apply for your FCC license and, like, let NOAA know. And they're like, yeah, go for it.
Which is like wild. There's like no space license or something. Like, you know.
It's interesting because there kind of is. It's governed by the FCC because they control the radio waves. I got into a whole conversation with somebody on Hacker News a while ago about this because I just find this fascinating how the US finds really unique ways to have regulatory vectors over stuff like space. And the FCC is like the main body for that because they govern the airwaves.
So it's basically if you want to transmit within the U.S., you need an FCC license. And if you're launching a satellite, you probably are going to want to transmit in the U.S. So you need a license from the FCC to launch a satellite.
It's, like, wild, because I think through crypto and NFTs, we've seen what happens when we don't have regulations. But then sometimes you're just like, where do these things come from? Like, the FCC is not what I would have guessed for, like, your space license. You know what I mean? Yeah. So that's like, who would have even thought?
But also at the same time, like, you know, when you're a kid and you think of space, you think of having to do so much more to be able to launch something into space. And that's just wild that it's just like, check with the dudes who do airwaves, and then you can put whatever you want up there.
Yeah, and even better, the FCC issued their first fine for space junk a couple months ago.
Oh, that's cool.
The Federal Communications Commission, the champions of litter in space. Yeah.
But it's also interesting, though, because so many things get launched, right? And, like, even if it doesn't go wrong, there's just so much that doesn't go with your, like, rocket or, you know, whatever. Like, they're made to have parts that break off.
So, like, I don't, like, did people even think about, like, what we're going to do with all that at some point and that we're going to collect all that?
It was just, like, let it all burn up. A lot of these satellites, like, our satellites don't have propulsion. So, after X number of years, the orbit decays and they burn up and that's game. Yeah.
Do they really just burn up completely?
Mostly so.
What's the lifespan? We're talking, like, servers are like seven years, right? Like, I can buy a server, put it in a rack, and I hold it for seven years. You launch a spaceship and... not a spaceship, I don't know the technical term. It's obviously a satellite, because there's no propulsion or anything. What is the lifespan of those first three satellites that you launched?
So there's a difference between orbit decay and mission life because the components on board, in theory, will go out long before the actual orbit will decay. So I believe the satellites are slated to be a five-year mission from the onboard component perspective. But this is like that's still kind of like NASA grade ratings, right? Ideally, you get way longer than five years.
And then I think the orbital decay is like 15 years, closer to 10 to 15 years. That's how long it'll take before they come down.
But do they completely dissolve? Because you know how the rovers, one will live way longer than it's supposed to, and then one gets too much dust, and then the solar panels can't keep powering it. Also, I cried. I was so in my feelings about the rover. I was like, no, but I love your pictures.
I know. It was so lonely, and I was so sad. I was really injured. I was so sad. My kids are like, what's wrong? And I was like, the rovers.
You build up feelings around these things. It's funny because each of our satellites we name after a sidekick. So the official designations are like Ghost 1, Ghost 2, Ghost 3, but we call them Robin, Goose, and Chewy.
Oh my God, what if Goose dies?
Yeah, like Goose is ill-fated.
Poor Goose.
Are you trying to make us cry, Andrew?
Like, if you had to name it Goose, like... That's definitely not the objective. This is just a plug: we have a Slack bot that announces, like, telemetry and new imagery, and it uses a picture of the appropriate sidekick and speaks as if, like, Goose is checking in, got new imagery.
You have a Slack bot with pictures, and it... Andrew, hire me.
You get paid to do this? I literally stalk all of James Webb and all the different satellites and post them like a crazy person. We're best friends now. It's happening.
Like Starlink, do you have satellite-to-satellite communication to help with that 90-minute delay? And will five satellites reduce that for you significantly?
Yeah, so we have a radio for satellite-to-satellite communication. They're not enabled yet, but feasibly, yes. The more satellites you have, the better network you have, and you can communicate in between. There's also proposals going through for a larger network. So we could, over encrypted communication, talk to satellites that aren't ours and everybody working together to get data down faster.
Would that be, like, the outernet?
I don't know what you'd call that instead of the internet. Like, you guys have to name it something cool. After Space Force, you can't just... And, like, you named a satellite Goose, so the bar is high. Like, yeah.
Inter-satellite communication networks are definitely something that's up and coming and trying to get off the ground. It's a bad joke.
I got the pun.
I was late on that, but they, are you about to dad joke us? Like,
Who's the jerks in space? Is it Starlink? Do they just litter everywhere and we can't see around them? Or is there some other... are these people from another country that's like, well, we're not working with the FCC, so we just threw stuff up? If you can't answer it, that's fine, but I'm curious now. Is there space beef between satellite vendors?
You're going to get Andrew in trouble.
I mean, Low Earth Orbit is smaller than you think. And I would say, without naming names, the jerks are the people who are just launching tons and tons, which is pretty much anybody who's looking to offer satellite-based internet, right? Satellite-based internet takes an absurd number of satellites. It's easy to pick on Starlink because they were the first, but they're not the only.
And that's going to continue to crowd Low Earth Orbit, which, again, is the most accessible orbit for people like us. You can't project an orbit out with accuracy multiple years in advance. These things, they're going to collide at some point. There will be collisions, and there have been close calls.
What's crazy is we actually got a call for one of our sats and they were like, hey, you're going to pass really close to a Starlink satellite. Just heads up.
How close is really close? Like, I think in space, close is, like, you're hundreds of miles away. But no, really close is probably close enough that somebody called, right?
Can you change like the course of direction if you're going to get too close or are you just kind of like it's just out of luck and you're going to hit each other?
Us, no. Starlink has some rudimentary propulsion so they can do some stuff. I mean, even the space station had to move to dodge. I think there was like a Chinese satellite where they were like, hey, there's a satellite that we have and you need to move the space station so that our satellite doesn't hit it.
Just move a whole space station. No big deal.
Yeah, if you could just like shift altitude control a little bit and like, yeah, just real fast.
But what's the heads up for that? Is that like you have 10 orbits and then you're done? Or is this like, hey, like 90 minutes?
How long does it take to move a space station? Like, this is wild.
I don't know the answer on the space station, but for ours, it was just this like tense hour and a half. Right. Because we get that telemetry down and then it's like, all right, this is the orbit. And so we're sitting there waiting and hope Starlink moves. Yeah. And then like 90 minutes later, we get that ping and we're like, oh, thank God.
I can just imagine you're like wiggling the camera, like trying to focus back and forth to like get out of the way. Like maybe we can move something.
That goes back to, like, it is lower CapEx, like $5 million, but still, that would really suck if somebody just runs into your $5 million satellite.
Yeah, just game over, right?
Yeah, the work that you – like I mean I can't imagine how much work it takes to get them into space and then like the cost and then someone just runs into it really quick. My bad, like –
Especially for us when it's like one of three, right? That's a 33% reduction in our total capacity, which is like super meaningful to the business. Each of these satellites matters for us. I do have, in terms of the bullies in space, I do have one other very funny anecdote because I have beef with the Vatican.
Hold on.
Wow. That is a powerful person to beef with.
Like, I'm here for it.
The Vatican has a space program. Fun fact.
What?
The Vatican? Okay.
The Vatican has a space program. You can read all about it. It sounds like "spy satellites," funny enough, but it's S-P-E-I. It's Italian. Cut him some slack, but the humor of it is not lost on me. So they actually launched on the same rocket as one of ours.
And so one of the processes you have to go through when you launch a satellite is you basically call up NORAD and you're like, hey, this unidentified object you're tracking in space, that's ours. They know like who's who, what's what. Well, they don't know who's who. You've got to tell them.
Do they give you Santa's number when you call though?
Yes.
But every time they're like, the call options have changed. Yeah, yeah, yeah.
The call options have changed. Press one for Santa, two for satellites. And so when we launched, the Vatican called NORAD and claimed our satellite incorrectly.
No, you got scalped by the Vatican.
We got scalped by the Pope, man. We got scalped by the Pope.
It's the greatest meme of all time internally of just like... Dude, when you're like a great grandpa, you should be like, there's one time I worked in space and then the Pope tried to steal my satellite.
Like, do you have baller like work stories?
Man, coming from the outside, like the conversation I had with our main space systems guy of like, how do we like, how do we get our satellite back from the Vatican? He's like, it's just a naming thing. Like, it's not a big deal. I was like, no, tell me it's a big deal. I want to believe this is a huge deal.
You just wanted to start a fight with the Pope, didn't you? You were like, send me to Italy. We will have this out.
100%, yeah. We got beef.
So is your satellite the Vatican's forever? Does NORAD always think your satellite is theirs now?
No, so we managed to correct this clerical error, and we've properly identified.
Clerical, that was good. That was a pun. Yeah.
I'm far enough into daddom that they just roll out, and I don't even think about it anymore.
I was going to say, are you a dad, Andrew? I am. Dude, can we talk? I went to go talk at my kid's school, and they're like, oh, cool, you're an engineer. But Andrew wins every time. My kids are like, oh, you build Java, and the only thing I can say is that Java builds Minecraft. That's the only cool thing. My kids don't care if I build Java.
But when you get to say, I work in space, you win coolest career day dad ever, every time.
I do appreciate that I have like a job that my kid kind of gets. I can be like satellites. And she's like, yes, space.
Like rocket space. That's a whole childhood. Like, you know how like they get into dinosaurs. They get into like space is like a whole thing. That's like a chapter like in childhood.
But then she brings home pink eye and I'm like, come on, man.
Dude, my kids brought home hand, foot and mouth.
I'm, like... I'm currently on drops for pink eye and it's just the worst. Oh my god, why do they always get us sick? Like, they're just like, oh, we love you so much and we're so cute. Please don't include this in the outro; I just imagine the episode's gonna end with this, like, conversation on pink eye. Dude, have you seen the meme where the, like, alien is breathing in the lady's face and it says, when your kid's sick and they're like, I love you, and you're like, yeah...
Okay, wait, before we leave, what's the craziest thing you've had to fix in space?
Oh, this is a great one. So the craziest thing, so I mentioned that we have multiple radios on board, right? We have this like super high bandwidth one and it's one way. And that's where that one and a half terabytes down comes from. We have this kind of like satellite to satellite. We have a much slower one that's more for like command and control. Sort of deal. SSH.
Yeah, that's where all the SSH magic happens. And effectively, the way this is all supposed to work is, imagine, like, TCP, where your packets come down over this fast one and then we send the ACKs back up the slower connection. And we could not connect over that slower connection when we first launched.
And so we basically ran into flow control, where we would try and downlink imagery and it would give up after, like, a few megabytes, because it's like, oh, I'm not getting any ACKs. And so I got pulled into that, and we basically had to... we pushed this really small patch up to the spacecraft to basically ignore ACKs, like pretend ACKs do not exist, and just blast this data down.
Cause, I mean, yeah, we're a startup. We've launched our satellites and investors and customers are waiting for those first pictures. And we're trying, as quickly as possible, to get these things down.
So we ended up pushing up this patch to basically ignore the ACKs, and we ditched the file transfer client entirely on the ground. And we just started running packet captures. Like, we just ran tcpdump on this thing and built this catalog of, like, terabytes
of TCP dumps. And then we wrote a script that would basically analyze these and try to piece together files from the TCP dumps across multiple passes. So, like, the same file would get transmitted, like, 10 times, because you can imagine your packet loss from space is quite high. Oh my God. It was the most infuriating thing to watch, because it's also this long tail.
Like, we didn't have the control to tell the satellite, oh, we only need these five remaining packets. It would just blast down the whole thing. So you would get, like, 50% on one pass. Then on the next pass, 75, then 90, then 95, then 99, then 99.9. And because these bundles are encrypted, you need the whole thing. Like, you can't be like, ah, screw that last packet.
For encryption to work, you need the whole thing. Yeah. And so we basically wrote this... we call it DJ PCAP.
Yeah. Take the PCAP file, read it in and parse it. Spinning those PCAPs.
Yeah. Spinning those PCAPs. So DJ PCAP was just trying as hard as it could to assemble files from these TCP dumps. And that's how we got our first imagery. This issue has since been resolved, but the first imagery from our satellites was basically rebuilt through this crazy kind of bespoke process.
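A rough sketch of the reassembly idea, assuming scapy and a plain TCP-style stream where the sequence number gives the byte offset into a single file. The real tool had to cope with multiple files, corrupted packets, de-duping across passes, and sequence wraparound, none of which is handled here.

```python
from scapy.all import rdpcap, TCP  # pip install scapy

def collect_chunks(pcap_paths, dst_port=5000):
    """Merge payload bytes from TCP dumps captured across multiple passes.

    Keyed by sequence number, so retransmissions of the same chunk on
    later passes simply overwrite identical data."""
    chunks = {}
    for path in pcap_paths:
        for pkt in rdpcap(path):
            if TCP in pkt and pkt[TCP].dport == dst_port:
                payload = bytes(pkt[TCP].payload)
                if payload:
                    chunks[pkt[TCP].seq] = payload
    return chunks

def try_assemble(chunks, total_size):
    """Return the file bytes if every byte is accounted for, else None."""
    base = min(chunks)                       # first sequence number seen
    buf = bytearray(total_size)
    covered = bytearray(total_size)
    for seq, data in chunks.items():
        off = seq - base
        buf[off:off + len(data)] = data
        covered[off:off + len(data)] = b"\x01" * len(data)
    # Encrypted bundles need every byte; a 99.9% complete file is still useless.
    return bytes(buf) if all(covered) else None
```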
And again, I think that kind of like goes towards the whole theme of the space segment moves much slower than we can move on the ground. So we're always trying to think of ways like, how can we deal with this on the ground? How can we fix this on the ground? And I think that's probably the most harrowing story out of all of them.
What a way to get started.
Yeah. Yeah. That was a stressful couple of weeks.
Oh man. It was for weeks. And then, I just feel like with images, it's so easy to mess up imagery, you know what I mean? Like, you need high resolution to really be able to do things. So, like...
Well, and then the even better wrench that got thrown into this is that some of the packets would be corrupted. So we would try and reassemble and then we would de-dupe multiple of the same piece of imagery to make sure it was the same on multiple passes. So it was almost like you kind of needed two copies of an image to make sure... that it was all good.
The other thing I will say is a lot of this work we did at one of our vendor partners down in Sunnyvale. And I took back to our office. They had Baja Blast there. And it was the first time I had ever seen Mountain Dew Baja Blast in the wild. And when I tell you, we completed that first piece of imagery. And my coworker and I were just like, Baja Blast time.
And that is a core memory for me now is Baja Blast is success. Love it.
Andrew, thank you so much for coming on the show. This conversation has been a rocket of a ride. I had to get one in. It's all right. And I learned so much. Justin, you're killing me. Where can people find you to ask more questions? Or, I mean, I know orbitalsidekick.com is the website for the company, but I know you're available, or at least somewhat social, online.
Where should people reach out and find you?
Yeah. So I'm Code Brood on Twitter. I also hang out in the Ship It Slack. I'll give you guys that. Yeah. Great place to go. So check that out. I'd say those are probably the two best places to get ahold of me.
And by people, he means me. So we can be besties.
Oh, I'm also on Mastodon now. I'm on Mastodon also as Code Brood. Awesome.
It was nice meeting you. That was so cool.
It was great to meet you as well.
Okay, friends, here are the top 10 launches from Supabase's launch week number 12. Read all the details about this launch at supabase.com slash launch week. Okay, here we go. Number 10, Snaplet is now open source. The company Snaplet is shutting down, but their source code is open.
They're releasing three tools under the MIT license for copying data, seeding databases, and taking database snapshots. Number nine, you can use pg_replicate to copy data, full table copies and CDC, from Postgres to any other data system. Today it supports BigQuery, DuckDB, and MotherDuck, with more sinks to be added in the future.
Number eight, vec2pg, a new CLI utility for migrating data from vector databases to Supabase, or any Postgres instance with pgvector. You can use it today with Pinecone and Qdrant. More will be added in the future. Number seven, the official Supabase extension for VS Code and GitHub Copilot is here. And it's here to make your development with Supabase and VS Code even more delightful.
Number six, official Python support is here. As Supabase has grown, the AI and ML community have just blown up Supabase. And many of these folks are Pythonistas. So Python support expands. Number five, they released log drains, so you can export logs generated by your Supabase products to external destinations like Datadog or custom endpoints.
Number four, authorization for real-time broadcast and presence is now public beta. You can now convert a real-time channel into an authorized channel using RLS policies in two steps. Number three, bring your own Auth0, Cognito, or Firebase. This is actually a few different announcements.
Support for third-party auth providers, phone-based multi-factor authentication, that's SMS and WhatsApp, and new auth hooks for SMS and email. Number two, build Postgres wrappers with Wasm. They released support for Wasm (WebAssembly) foreign data wrappers. With this feature, anyone can create an FDW and share it with the Supabase community.
You can build Postgres interfaces to anything on the internet. And number one, Postgres.new. Yes, Postgres.new is an in-browser Postgres with an AI interface. With Postgres.new, you can instantly spin up an unlimited number of Postgres databases that run directly in your browser and soon deploy them to S3. Okay, one more thing. There is now an entire book written about Supabase.
David Lorenz spent a year working on this book, and it's awesome. Level up your Supabase skills, support David, and purchase the book. Links are in the show notes. That's it. Supabase Launch Week number 12 was massive. So much to cover. I hope you enjoyed it. Go to supabase.com slash launch week. That's S-U-P-A-B-A-S-E dot com slash launch week.
So today on the show, we have Anita Zhang from Meta. And Anita, you are an engineer-d; manager-d is your title. Is that correct? Yep. I think that's fabulous as a Linux user and a long-time restarter of services. Tell us about what you're responsible for at Meta.
Well, I support a team that basically... well, my manager calls it Meta's Linux distribution team. I like to call it operating systems, sounds better. But we primarily contribute to systemd, to BPF-related projects, building out some of the common components at the OS layer that other infrastructure services build on top of.
So you're the kernel of Meta's infrastructure?
We have like an actual kernel team to do the kernel, but one layer up, I guess.
One layer above that. So describe the infrastructure, describe the Sforces. I've been following what Facebook and Meta have been doing for a long time as a Red Hat user at other places and seeing the upstream contributions. But I know many people to this podcast may not know what that infrastructure looks like and what you actually do.
Yeah. I mean, we've been around a while. The company owns millions of hosts at this point, a mix of compute, storage, and now the AI fleet. Teams primarily work out of a shared pool. So we have a pool of machines called TW Shared where all of the container jobs run. There are a few services that run in their own set of host prefixes.
But for the most part, the largest pool is TW Shared. A lot of our infrastructure to support this scale is homegrown.
I don't know anything off the shelf that's going to do a million hosts.
Yeah, me neither. That's amazing. So Meta has their own flavor of Linux, I guess?
No, we actually use CentOS for production, all of our production hosts, and even inside the containers we're using CentOS. Desktops are primarily some flavor of Fedora, Windows, or macOS.
And what does that look like for what you're doing on the fleet level? You're provisioning the OS, or have some tooling to provision the OS. And from talks that you've given that I've watched, you had a great talk at SCALE, by the way. If anyone wants to see that talk, it's on the SCALE website. But you're doing upgrades.
If I want to upgrade a million hosts, I was like, hey, I need to roll out a new version of the operating system. that's going to take a little while. There's a lot of process and there's a lot of risk there, right? Because like you could be causing other things to fail. So how do you do that in a safe way and at that size?
You know, we've gotten a lot better at it over the years. When I started, we were doing, like, CentOS 6 to 7. And I think that probably took like a year or two to actually reach over, like, 99% of the fleet. And there's always that trailing 1% that, for some reason, can't shut down their services or they don't want to drain or lose traffic or things like that.
But now we're able to complete, I'd say like 99% of the fleet in a year or less. We started doing a lot of validation sooner. So now we actually hook in Fedora ELN into our testing pipeline and we start deploying parts of Fedora ELN and running like our internal container tests against them. And so that has caught a few like system wide distribution changes that we'll be ready for.
Like, once CentOS, I guess now CentOS Stream, 10 is going to be released later this year.
Describe Fedora ELN. Like why is that different than what you're running?
So Fedora ELN is, man, I don't know what exactly it stands for. It's Fedora something next. So it's going to be like the next release of Fedora that will eventually feed into things like CentOS Stream.
Basically like the Rawhide equivalent of like, hey, this is a rolling kind of new thing. Yeah. But eventually that gets cut down. How does that relate? Or I'm actually really curious, like CentOS Stream, right? When they moved to this rolling release style of distribution, how did that affect how you're doing those releases and doing upgrades for those hosts?
Because you have to at some point say like, this is the thing we're rolling out, but the OS keeps going.
Yeah, I'd say the change to stream didn't really affect us much because we were already kind of doing rolling OS updates inside the fleet. So when new point releases get released, we have a system that syncs it to our internal repos and then updates the repositories.
And then we have Chef running to actually pick up the new packages and things and just updates depending on what's in those repositories. So the change to Stream didn't really change that model at all. We're still doing that, picking up new packages on like a two-week cadence.
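A minimal sketch of that sync-then-let-config-management-pick-it-up loop, using dnf reposync and createrepo_c; the repo IDs and mirror path are placeholders, and the real system at fleet scale is obviously far more involved.

```python
import subprocess
from pathlib import Path

REPOS = ["baseos", "appstream"]           # placeholder repo IDs
MIRROR_ROOT = Path("/srv/mirror/centos")  # placeholder internal mirror path

def sync_repos() -> None:
    """Pull the latest upstream packages into the internal mirror, then
    regenerate metadata so hosts pick them up on their normal update cadence
    (e.g. via Chef runs pointed at these repos)."""
    for repo in REPOS:
        dest = MIRROR_ROOT / repo
        dest.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["dnf", "reposync", "--repoid", repo,
             "--download-path", str(dest), "--newest-only"],
            check=True,
        )
        subprocess.run(["createrepo_c", str(dest)], check=True)

if __name__ == "__main__":
    sync_repos()
```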
Do you guys use a lot of automation that you build in-house?
Yeah, we kind of have to.
The repo syncing, I had a project at Animation where we had RHEL that we would sync all the repos internally. It all sits on NFS. And then we mount everything to NFS to pull in repos. And I forget, it was like a Jenkins tree of syncing jobs that would all run to register a system and pull down. It was like 300 or something repos that we would sync every night.
And I'm like, OK, let's fetch all the files now. Oh, yeah. And then squirrel those away somewhere on a drive and then host them so that everyone else can can sync to it and then have it roll out to the testing fleet.
It's a lot of data and it's a lot of stuff that just have to, as packages get removed from upstream and you're using them in places, I'm assuming you have some isolation there because as far as I know, most of your workloads are containerized on the Twine, on TW shared as the base infrastructure, right?
Yep. So containers, they don't get the live updates that the bare metal hosts get. So users just define their jobs in a spec, and for the lifetime of the job, the packages and things that go into it don't change. I mean, there are certificates that also are used to identify the job; those get renewed. But we have a big push to get every job updated at least every 90 days.
Most jobs update more frequently than that.
Is that an update for the base container layer or whatever they're building on top of?
Yeah, they'll actually have to shut down their job and restart it on a fresh container and they'll pick up any new changes to the images or any changes to the packages that have happened in that time.
Can you describe TW Shared for the audience as well? Because that's one of the things that I think is really fascinating that you have your own container scheduler. And as far as I know, all those containers are running directly with system D, right? Like you're not having like a shim of like an agent. I mean, you have agents, but go ahead and describe it.
So I used to work on the containers team, the part that's actually on the host. The whole like Twine team consists of like the scheduler and they're like resource allocation teams. to figure out which hosts we can actually use, how to allocate them between the teams that need them.
And then on the actual container side, we have something called the agent that actually talks directly to the scheduler and translates the user specification into the actual code that needs to get run on the host. And that agent sets up a bunch of namespaces and starts systemd and basically just gets the job started.
And that's systemd inside the container?
Yeah. So the bulk of the work that is done in the agent, at least for the systemd setup, is it translates the spec into systemd units that get run in the container. So if there are jobs, if there are commands that need to run before the main job, those get translated to different units. And then the main job is in its own unit as well.
And then there's a bunch of different configuration to make sure the kill behavior for the container is the way we expect and things like that. There is a sidecar for the logs specifically. So logs are pretty important, as you'd imagine, to users being able to debug their jobs. There is a separate service that runs alongside the container to actually make sure that no logs get lost.
And so those logs get preserved in the host somewhere.
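A toy illustration of "translate the job spec into systemd units", not Meta's agent; the spec format is made up, but the unit directives are standard systemd.

```python
# Hypothetical job spec -- the real Twine spec is much richer than this.
spec = {
    "name": "photo-resizer",
    "pre_commands": ["/usr/bin/prepare-cache"],
    "main_command": "/usr/bin/photo-resizer --workers 8",
    "stop_timeout_s": 30,
}

def render_units(spec: dict) -> dict[str, str]:
    """Render the spec into systemd unit files to drop into the container."""
    units, deps = {}, []
    # Commands that must run before the main job become oneshot units.
    for i, cmd in enumerate(spec.get("pre_commands", [])):
        name = f"{spec['name']}-pre{i}.service"
        deps.append(name)
        units[name] = (
            "[Unit]\n"
            f"Description=pre-start step {i} for {spec['name']}\n\n"
            "[Service]\n"
            "Type=oneshot\n"
            f"ExecStart={cmd}\n"
        )
    ordering = "".join(f"After={u}\nRequires={u}\n" for u in deps)
    # The main job gets its own unit, with kill behavior pinned down.
    units[f"{spec['name']}.service"] = (
        "[Unit]\n"
        f"Description=main task for {spec['name']}\n"
        f"{ordering}\n"
        "[Service]\n"
        f"ExecStart={spec['main_command']}\n"
        f"TimeoutStopSec={spec['stop_timeout_s']}\n"
        "KillMode=mixed\n"
    )
    return units

if __name__ == "__main__":
    for name, text in render_units(spec).items():
        print(f"# {name}\n{text}")
```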
Twine sounds really cool, too. I was reading the white paper about that yesterday.
How does that work with the sidecar? I would assume, I've never really actually done this side, like systemd inside the container running on systemd. So if I log into a host, not the container, I see just services all the way down. They just look like standard systemd units. They're just isolated from each other. Is that right?
Yeah. So the container job, it will be one systemd unit, and you'll see a bunch of processes in it. And you'll also see a couple of agents that we run, but mostly just the usual systemd PID1 inside the container and like their own instance of JournalD, LoginD and all that stuff.
And that was the question I actually had. It was like, I assumed that JournalD would handle the unit logging, but you say there's a sidecar that I'm assuming is like getting that logs out to JournalD on the host or at least some way so that you don't lose those logs inside the container.
Yeah.
That's cool. At that point, it's just native systemd, really. We're just using every feature of systemd to isolate and run those jobs. And then you have an overarching scheduler, resource allocator, all that stuff.
Yeah, pretty much.
One of the things that I found super interesting in the white paper was host profiles, where for different workloads you basically virtually allocate clusters, I guess, for lack of a better word; entitlements is what you call them. Like, hey, this job gets this set of hosts, and then you can dynamically switch those hosts to needing different kernel parameters, file systems, huge pages.
And you have a resource allocator that does that, as far as I understood. How does that affect what you're doing? You have a set of host profiles. You say, hey, you can pick from a menu. And then we know how to switch between them. How does that typically work?
So that part's a little newer than my time on containers. But you create a host profile, you work with the host management team to do that, and then you can, I believe, specify it in your job spec. And then when you need to either restart your job or move the job around, they actually have to drain the host.
Most host profiles require a host restart, because things like huge pages need a restart to apply. And then the job gets started back up on a host with the host profile you're asking for.
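For illustration only, here is a sketch of what a job spec carrying an entitlement and a host profile might look like. The field and profile names are made up; the real Twine spec is internal.

```python
# Hypothetical job spec: which virtual pool (entitlement) to draw hosts from,
# and which host profile (kernel/filesystem tuning) those hosts must have.
job_spec = {
    "name": "ranking/feature_builder",
    "entitlement": "ranking-batch",
    "host_profile": "hugepages-2mb",
    "task_count": 512,
    "resources": {"cpu": 8, "memory_gb": 32},
}

def needs_host_restart(profile: str) -> bool:
    # Profiles that change boot-time settings (huge pages, for example) require
    # the scheduler to drain and reboot the host before placing the job.
    return profile in {"hugepages-2mb", "hugepages-1gb"}

print(needs_host_restart(job_spec["host_profile"]))  # True
```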
How does that affect you as the OS team? Like, is there anything that you're doing specifically for that?
Not specifically, but they do. The host agent actually builds a lot of its components on top of systemd as well. So they've been doing things like moving more configuration out of Chef into the host agent, where it's more predictable. So things like systemd-networkd configs, or the sysctl configs that also go through systemd as well.
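As a rough illustration of the kind of configuration a host agent might lay down natively instead of via Chef, here's a sketch that writes a systemd-networkd .network file and a sysctl drop-in that systemd-sysctl applies at boot. Paths, interface names, and values are examples only, not Meta's actual configuration.

```python
# Illustrative only: write the two kinds of config mentioned above.
from pathlib import Path

NETWORK_CONF = """\
[Match]
Name=eth0

[Network]
DHCP=ipv6
"""

SYSCTL_CONF = """\
# Picked up by systemd-sysctl from /etc/sysctl.d at boot.
net.core.somaxconn = 4096
vm.swappiness = 10
"""

def write_host_config(root: Path) -> None:
    (root / "etc/systemd/network").mkdir(parents=True, exist_ok=True)
    (root / "etc/sysctl.d").mkdir(parents=True, exist_ok=True)
    (root / "etc/systemd/network/10-eth0.network").write_text(NETWORK_CONF)
    (root / "etc/sysctl.d/90-host-agent.conf").write_text(SYSCTL_CONF)

if __name__ == "__main__":
    write_host_config(Path("./fake-root"))  # use a scratch dir when experimenting
```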
Is that a Linux penguin on your sweatshirt? Because that's the coolest sweatshirt I've ever seen.
Oh, yeah. The tux hoodies. This is the one that Justin was talking about. That is so cool.
Yeah, they had them at SCALE and I was very jealous because they're cool. And this is an audio podcast, so no one knows what we're talking about. But basically, it's a bunch of little small Tuxes inside the hood of the hoodie.
That's so cool. If anyone from SCALE is listening, they probably have a hoodie.
I'm sad that I missed your talk at SCALE. It was on my schedule, and then I forget what we were doing, but I somehow ended up somewhere else, and I was super sad to miss your talk. Do you get to contribute a lot to open source? Because Meta seems really big on contributing, or releasing things for free, I guess.
Yeah, I'd say at least the way the kernel team and our team operate is that we're mostly upstream first. So everything that we write, we write it with the idea that we're going to be upstreaming it. And that's how we manage to keep our team size small, so that we don't have to maintain a bunch of backports, things like that.
That's something you have to wait for though, right? Like, we're going to write this internally, we're going to hope this gets upstreamed, and then we have to either wait for the release to consume it, or we're just going to keep running it. But then if upstream needs changes, you have to kind of merge back to it.
Yeah. So the kernel we actually build and maintain internally, so we can pull from the release whenever we want. And we can kind of do the same thing with CentOS too, because we all contribute to the CentOS Hyperscale SIG. Any bleeding edge packages that we want to release immediately go into the Hyperscale SIG.
It's really cool that you guys contribute to upstream first and then, but also kind of maintain your own stuff. So that way you can kind of pick and choose if you want to put something like, you know, if it's like a bug fix that you need earlier, you can already apply that.
I mean, I'd say Meta is super into, you know, release frequently. And so if we always stick to upstream, then we'll always get the newest stuff and we're less likely to run into some obscure bug from two years ago that is really hard to debug.
How does release frequently and a million hosts go together? Because you mentioned that it takes about a year to basically roll out an update to every host. But if you're pushing out updates to the OS every month, then you have 12 different stages of things that are going through release. And that makes it really hard to debug and predict, oh, what version are you on?
Did we fix that bug somewhere else? How do you manage that?
Yeah, so it's mainly the major upgrades that take up to a year. So when we're about to go from CentOS Stream 9 to 10, that will probably take a lot longer than our rolling OS upgrades. The thing about CentOS is that we do maintain kind of ABI boundaries. So we expect that the changes that Red Hat and CentOS are making
to packages are mostly bug fixes that won't break compatibility in the programs. And that's remained true. We haven't run into a lot of major issues with rolling OS upgrades. Most issues come from when we personally are trying to pull in the latest version of systemd or something and we're rolling that out. Those we have to do with more intention.
You mentioned an AI fleet. From what I've heard Zuckerberg talk about, Meta has more GPUs than anyone else in the world, basically. How do you manage that? Not only how the drivers are installed, because Linux and NVIDIA aren't always known to be the best of friends, but then how do you isolate those things and roll out those changes?
Yeah, I'm probably not the best person to ask about it, but we do have a pretty sizable team now of production engineers dedicated to supporting the AI fleet and making sure that it's stable and that our training jobs don't crash and things like that.
Under twshared, do they just show up as a host profile? Or is it that I get an entitlement that says I need GPUs for this type of workload?
It's more like the latter. So even though everything's in twshared, we know what kind of machine type they are. So you can specify what purpose you're using the machine for and things like that.
What's the difference between a production engineer and a system engineer?
Well, I'm a software engineer technically, I guess.
The title.
So a software engineer, then there's a production engineer, a system engineer. I guess then what's like... There are a lot of titles. I know.
I'd say production engineer and software engineer are the most similar, especially in infrastructure. When I was in the containers team, the production engineers and software engineers pretty much all just did the same stuff. Like we were all just focused on scaling and making the system more reliable.
I'd say in like a product team, production engineers focus more on operationalizing and making the service production ready while the software engineer is kind of like creating new features and things like that.
Okay, that's interesting. One thing I found fascinating about some of the talks you've given and information is the fact that Meta is still notably an on-prem company. You have your own data centers, you have your own regions, you have machines, and it doesn't seem like you try to hide that from people. You don't try to abstract it away.
At least I haven't ever seen a reference to it as an internal cloud. No, it's a pool of machines, and people run stuff on the machines. And the
software and the applications running on top of it are very much, this is just a systemd unit, you're just running it containerized. What other types of services do you have internally that people need? I mean, I saw references externally to things like sharding, like, hey, we just need fast disk somewhere and we need some storage and databases, but what are the pieces that you find are common infrastructure for people to use?
Yeah, I mean, I'd probably dispute the idea that people have to understand the internals of how the hosts and things are laid out. So the majority of services, and we're talking millions of hosts in twshared, they are running containers.
And I'd say a lot of their knowledge about the infrastructure probably stops at writing the job spec and then going into the UI and looking at the logs. So if you're just writing a service, a lot of that's abstracted away from you. You don't even have to handle load balancing and stuff. There's a whole separate team that deals with that as well.
That's awesome.
Yeah, but if you're on the infrastructure side, sometimes you need to maintain those widely distributed binaries on the bare metal hosts. So like us running systemd, or the team at Siamat that does the load balancing, they also run a widely distributed binary across the fleet on bare metal.
There's also another service that does specifically fetching packages or shipping out configuration files and things like that. But yeah, most of the services people write, they're running in containers. Databases, they have kind of their own separate thing going on as well.
Most of them are moving more into twshared as well, but they have more specific requirements related to draining the host and making sure there's no data loss.
Right, how those shards, making sure enough of the data replicas are available.
Yeah, but they're one of those teams that just want their own set of bare metal hosts as well, to do their own thing with. They don't care about running things in a container if they don't have to.
Typical DBAs. What would you say are some of the challenges you're facing right now on the OS team or just in general in the infrastructure?
The AI fleet's always a challenge, I guess, making sure jobs stay running for that long. I think we're... Every site event is like kind of an opportunity to see where we can make our infrastructure more stable, adding more validation in places and things like that. Just removing some of the clowniness that people who have been here a long time have kind of gotten used to.
And you mentioned moving more things out of traditional configuration management like Chef and into more of a host-native binary that can manage things, I don't want to say more flexibly, but I guess more predictably. I think you mentioned that.
Yeah, making things more deterministic, removing cases where teams don't actually need their own hosts and shifting them into twshared so that they're on more common infrastructure, adding more safeguards in place so that we can't roll things out live, and stuff like that.
You also mentioned in the, again, referencing the paper, because I just recently read it. All of your hosts are the same size, right? It's all one CPU socket. And I think it was like 64 gigs of RAM or something like that.
Yeah, that's probably not true anymore. But yeah, the majority of our compute fleet looks like that. Yeah.
Okay. So the majority of twshared is like, we have one size, and everyone fits into this one size and we'll see how we can make that work. Right? Because you can control the workloads, or at least help them optimize in certain ways, because not all AI jobs or big data jobs are going to fit inside of that envelope.
Yeah.
Especially with databases and AI.
Yeah. And we're trying to shift to a model now where we have bigger compute hosts so that we can run more jobs side by side, stacking, because realistically one service isn't going to be able to scale to all the resources on the host forever. So yeah, we're getting into stacking now.
So yeah, it's more of a bin packing approach, saying, hey, maybe we do have some large hosts, especially for the jobs that don't fit in 64 gigs of RAM, or where local NVMe isn't fast enough for whatever reason, or it's going to cause the job to run longer.
Do you think AI is going to change the way that Meta does infrastructure because you're adapting to the change in how much bigger the hosts you need and how much more GPUs and all that kind of stuff?
Oh, I mean, even in the past year we've made a few notable infrastructure shifts to support the AI fleet. And it's not even just the different resources on the host, but all of the different components. A lot of them have additional network cards, and there's managing how the accelerators work and how to make sure they're healthy and things like that.
Yeah, I suppose once you have any sort of specialized compute or interface, whether that's network, some fabric adapters, you always have snowflakes in some way. It was like, hey, this is different than the general compute stuff.
Oh, yeah, for sure.
How has that affected your global optimization around things? I know the paper is old now. It's like 2020, I think, when it was published, which is probably looking at 2018, 2019 data. But in general, it was something like 18% overall total cost optimization from moving to single-size hosts, because your power draw was less overall globally.
And something like, I think it was the web tier that got 11%, I should have had it up in front of me, 11% more performance by switching to host profiles and allowing them to customize the host. Have you had things like that over the past four years, with these optimizations and specialized compute that have allowed you to gain even more global optimization?
Because a million hosts, like a 10% gain in efficiency or lower power requirements is huge. That's like megawatts of savings.
You know, we are also working on our own ASICs to do inference and training. That's probably the place where we're going to see not just the monetary gains from developing in-house, but also gains on the power and resource side as well.
That's fascinating.
That's starting to come out this year in production.
Have you been enabling that through FPGAs that you allow people to program inside the fleet? Or how do you get to, hey, we have an ASIC now and it does some specialized computing task for us?
Yeah, that's a better question for the silicon team. I only see the part where, you know, we actually get the completed chip, but I'm sure they're doing their development on FPGAs.
And at some point they have like, here's a chip, go install it for us. And you need, here's a driver for it, right? Like they need to give that to you as a host team.
Oh, yeah. We have a team that I actually work pretty closely with that writes FPGAs. We shifted to a user space driver; it just uses VFIO rather than a kernel driver. I think the accelerator is just over PCIe.
Meta sounds awesome. It sounds like you get to actually really dive deep on what you're learning and like you're part of infrastructure or development because it seems like you have teams for everything.
Yeah, I'd say you can really go as deep as you want to here.
Yeah, I really want to see an org chart now. I was like, there's so many of these teams that just keep popping up of like, oh yeah, no, we have a team that does that.
I know. I'm like, that's cool that it almost gives you enough abstraction that you can really focus on your specialty because you get to really be deep in that area because you're not having to worry about all the extra components, I guess.
Yeah. I mean, that's my favorite part. I mean, some people are just really into like developing C++ or the language. But then I'm on the infrastructure side. I just really like working directly with hosts.
And you've been there for a little while now, right?
Almost eight and a half years at this point.
I feel like people go to Meta and stay there forever, because you probably get really good at whatever you're doing. Plus, I feel like it would be cool to talk to those other teams, because when you have questions, they must be really good. Like, if they're so specialized in that area, then they must know so much about it when you go to collaborate with other teams.
Yeah, it's super nice to just be able to ping anybody over work chat, like literally anyone. If you have a question, everyone's super nice about helping you out, as long as you're nice too.
What'd you do before Meta or is this like, like have you worked at Meta your whole career?
Yeah, I started here right out of school. I did one internship before I started here full time.
What are you looking forward to working on in the next year? Are there big projects or big initiatives that you would like to tackle or even things in the open source or like things that you want to give back and make sure other people know about?
I mean, I'm always interested in doing more stuff with systemd. I think there are still a bunch of components internally that could be utilizing systemd in more ways, making sure that we're all on the common base. That's kind of the main, general goal that I'm always going to be focused on, I guess.
There are also some bigger ones. I mean, with journald, I've been trying to get us to replace rsyslog completely and move entirely to systemd-journald. That's an ongoing effort.
That was one of my best claims to fame at Disney+: I disabled rsyslog. I was like, no, we don't have it, it's just journald. I was like, we're just doing journald now. And it saved us so much I/O throughput on the disks and everything. And there were a lot of problems with it, too. Maybe we weren't ready to do that. But I was like, no, we can't ship Disney+ until rsyslog's off.
Yeah, I want to be there.
It was great. It was a great feeling one day where I was like, I don't need this anymore. I don't need rsyslog.
I mean, moving completely to systemd-networkd was pretty cool. But now that that's done, I can just be happy with it. There's probably some more stuff we're going to be doing with systemd-oomd, the out-of-memory killer. I think we're about ready to get Senpai upstreamed into systemd. Senpai is a memory auto-resizer that we wrote.
And I don't think that's been open sourced in any way. I mean, we have an internal plugin to do that with the old FB oomd. I think it's time to get that into systemd-oomd as well.
Is that for resizing the container, like the cgroup, and saying how much memory they have available? Or is that something different?
It's a way to kind of poke a process and make sure that it's only using the amount of memory it actually needs. Because a lot of services and things will allocate more memory than they need.
Interesting. It's a little like, get back in line. You don't get that memory.
A little bit.
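For a sense of what "poking" a workload's memory could look like mechanically, here is a very rough sketch of the general idea using cgroup v2's memory.high limit and PSI memory pressure: keep squeezing the limit while pressure stays low, back off when pressure rises. This is not Senpai or Meta's implementation; the cgroup path, thresholds, and step size are made up.

```python
# Illustrative memory "resizer" loop for one cgroup (cgroup v2 assumed).
import time
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup/my-service")   # hypothetical cgroup
PRESSURE_TARGET = 0.1                        # acceptable avg10 "some" pressure
STEP = 16 * 1024 * 1024                      # adjust memory.high in 16 MiB steps

def read_some_avg10() -> float:
    # memory.pressure line: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
    line = (CGROUP / "memory.pressure").read_text().splitlines()[0]
    fields = dict(kv.split("=") for kv in line.split()[1:])
    return float(fields["avg10"])

def current_high() -> int:
    raw = (CGROUP / "memory.high").read_text().strip()
    # If no limit is set yet, start from current usage.
    return int((CGROUP / "memory.current").read_text()) if raw == "max" else int(raw)

while True:
    pressure = read_some_avg10()
    high = current_high()
    # Low pressure: the workload isn't missing memory, squeeze a bit more.
    # High pressure: we squeezed too far, give some memory back.
    new_high = high - STEP if pressure < PRESSURE_TARGET else high + STEP
    (CGROUP / "memory.high").write_text(str(max(new_high, STEP)))
    time.sleep(5)
```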
Have you been doing anything with immutable file systems or read-only, or A/B switching hosts? Fedora has Silverblue. I use a distro called Bluefin, which is built on top of that, which does A/B switching for upgrades and reboots every time. It sounds like you're doing rolling updates, so you would still be writing packages to disk instead of flipping between partitions.
I mean, we're trying to shift to more of an immutable model. Internally, we have something called MetalOS. And right now, we're rolling out a variation of MetalOS called Maclassica. The goal is kind of an immutable file system, but it's making strides to get there. We still have to rely on Chef to do a lot of configuration, but a lot of it has shifted to a more static configuration.
that is more deterministic and gets updated at a cadence where we can more clearly see what the changes are.
And I was asking that because it leads into you saying you want more systemd stuff, and I'm curious if you're trying to use things like systemd system extensions, sysext or whatever it's called, that layer different things on top of the base image, which is typically for an immutable file system but still allows changes to happen.
Yeah, I haven't looked too deeply into what that team's been up to. But I do know that they did make use of some of the bleeding edge systemd features to build these images and things like that. We're not using systemd-sysext just yet. I mean, I wouldn't count it out.
Yeah. It's one of those things that looks really interesting, especially if you try to move more into immutable file system layers. Like, hey, I still need to configure this. And how do I do that in a composable, immutable way?
Mm-hmm.
Well, Anita, this has been great. I'm just nerding out, connecting all of the things that I've done in the past and am still doing.
I think it's great that Meta is not only doing this at a core level, just, hey, we have systemd and things running in that, but also giving back upstream with the systemd builds and all the stuff that you've been publishing: not only white papers, which Autumn and I were reading, and talks, but also just the open source work. So I think that's fascinating.
because that's a whole other topic.
Oh, yeah. You have to come back. I think Meta gets a really bad rap for a lot of things, but I don't think you guys get enough credit for the amount of open source you guys do and the white papers. I mean, the white papers you guys have written on databases and the database contributions alone is amazing. And there's been so many things given away for free so people can gain knowledge.
I don't think Meta gets enough credit for that.
I mean, I think from the engineering standpoint, we just kind of get the warm fuzzies when people actually use and like the stuff we write.
That's like the best part of being an engineer.
I find it fascinating because Meta is one of the few places that doesn't sell the things that they talk deeply technically about, where it's like a lot of Amazon and Google and Microsoft are like, hey, we built this amazing thing. Now go buy it from us. And at Meta, it's like, no, we're solving our own problem and we're just giving it back to you.
That's what I'm saying. I think that people talk about what Meta does wrong, but rarely do people talk about the fact that they'll be like, hey, I just figured out this really cool way to do this at a crazy scale, and here it is, so you can read about it and learn about it for free. And I'm like, that's awesome.
So I think I've learned a lot from like the different database papers and like different white papers that you guys have released. And just, it's crazy that you guys released an entire AI model like for free. Like it's insane.
Yeah. Yeah. I've been running Llama. I haven't done Llama 3 yet though, but it's on my list of things to play with. Awesome.
I feel like white papers are like a great way to learn and really get like in depth for something. So you can go and like do that project or try something out because you get to see like why that solution was made for that problem, you know, and kind of like figure out like where, how they use the projects that you guys release. So I think it's cool the way you do that.
Oh yeah. I really appreciate the academic side of things.
Anita, thank you so much. And we'll reach out, I'm sure, in the future with more things, maybe to talk about eBPF and ASICs and more work that you're doing on the OS layer, because that's just a fun thing, and seeing how it grows.
All right. Looking forward to it. Thank you.
Have a great day.
Hey friends, I'm here with Todd Kaufman, CEO of Test Double. You may know Test Double from friend of the show, Justin Searles. So Todd, on the homepage for Test Double, you say, great software is made by great teams.
We build both. That's a bold statement. Yes, we often are brought in to help clients by adding capacity to their teams or maybe solving a technical problem that they didn't have the experience to solve. But we feel like we want to set up our clients for future success and the computers just do what we tell them. So, well, at least for now.
We try to work with our client teams to make sure that they're in a great state, that they have clarity and expectations, healthy development practices, lean processes that allow them to really deliver value into production really quickly. So we started a lot of our engagements by just adding capacity or technical know-how.
We end a lot of our engagements by really setting up client teams for success.
Yeah, I like that. So when you say to someone, you should hire Test Double for this reason, what is that promise?
I'll throw out a couple of different promises. I would say, one, we will leave your team in a better state than we found them. And that may be improving the code base. It may be improving some of the test suite. More often than not, it's sharing our experience and our perspectives with your team members so that they're accelerating along their own kind of career growth path.
Maybe they're learning new tech by virtue of working with us. Maybe they are figuring out ways to build software with a higher level of quality or scale, or maybe they're even focusing on the more human side of the equation and figuring out how to better communicate with coworkers or stakeholders or whomever. So that's guarantee number one.
The other one I would say is that we're going to deliver without being a weight on your organization. So by that, I mean we're able to come in really quickly, acclimate. Learn your systems, learn your processes, learn the right people and deliver features within our first days there. So our challenge to our team is to always be shipping a pull request in the first week of work.
So we acclimate very quickly and we're very driven to get things done. That means we don't require a lot of supervision or management overhead or technical support the way some companies envision working with a consulting firm. So we really challenge ourselves and guarantee to our clients that we're going to be very easy to work with.
Very cool, Todd. I love it. So listeners, this is why Edward Kim, co-founder and head of technology at Gusto says, quote, give test double your hardest problems to solve, end quote. Find out more about Test Double's software investment problem solvers at testdouble.com. That's testdouble.com, T-E-S-T-D-O-U-B-L-E.com. And I'm also here with Dennis Pilarinos, founder and CEO of Unblocked.
Check him out at getunblocked.com. It's for all the hows, whys, and WTFs. Unblocked helps developers to find the answers they need to get their jobs done. So Dennis, you know we speak to developers. Who is Unblocked best for? Who needs to use it?
I think if you are a team that works with a lot of coworkers, if you have like 40, 50, 60, 100, 200, 500 coworkers, engineers, and you're working on a code base that's old and large, I think Unblocked is going to be a tool that you're going to love.
Typically, the way that works is you can try it with one of your side projects, but the best outcomes are when you get comfortable with the security requirements that we have. You connect your source code, you connect a form of documentation, be that Slack or Notion or Confluence. And when you get those two systems together, it will blow your mind.
Actually, every single person that I've seen on board with the product does the same thing. They always ask a question that they're an expert in. They want to get a sense for how good is this thing? So I'm going to ask a question that I know the answer to. And people are generally blown away by the caliber of the response.
And that starts to build a relationship of trust where they're like, no, this thing actually can give me the answer that I'm looking for. And instead of interrupting a coworker or spending 30 minutes in a meeting, I can just ask a question, get the response in a few seconds and reclaim that time.
Okay, the next step to get unblocked for you and your team is to go to getunblocked.com. Yourself, your team can now find the answer they need to get their jobs done and not have to bother anyone else on the team, take a meeting, or waste any time whatsoever. Again, getunblocked.com. That's G-E-T-U-N-B-L-O-C-K-E-D.com. And get unblocked.
Thank you so much, Gina Häußge, for joining us on the show today. Can you tell us about yourself and how you got started with creating OctoPrint?
Yeah, so you already said my name, but I'm also known as foosel around the world, especially around the net. So if anyone has come across that name, then yeah, that's me. Hi.
And yeah, well, Octoprint, that happened basically when I got myself a 3D printer back in late 2012 and found myself in a position that it was sitting here next to me in my home office, producing noise, producing fumes and annoying the hell out of me because I just wanted to not sit next to it while it was doing stuff, but it took hours to finish whatever it was doing.
And so I figured there must be some way to just put it in another room, but still monitor it from afar. through Wi-Fi and such. And I figured there's probably something out there that does this. It turns out, nope, there wasn't something like this. And I happened to be a software engineer. So that became a bit of my vacation project over Christmas, pretty much.
And I threw it on GitHub after that in January and thought I was done. Back then, it was just a really, really basic thing.
monitoring temperature, already having this feedback loop where you also had some webcam implementation and all of that to be able to see what your 3D printer was doing while it was running through your jobs and some basic file management and such, but definitely way smaller project than it is now over 10 years later.
I threw it on GitHub and within a week or so, the emails started coming in and the feature requests started coming in. And then it took over my life. And now I've been doing it full time for almost 10 years and crowdfunded for, wait, we do have 2024 now. So that must be eight years, I think. Yeah. Eight years full time crowdfunded work.
That's awesome.
An open source project.
That's one of those success stories of open source and crowdfunding, right? Because that's not a common thing for it's like, oh, one person started a project and now you can actually make your living off of this hobby or originally hobby sort of thing. And that's really awesome just to hear that it's the community around it has come together to be able to support such a cool project.
2012.
What printer was even available in 2012? That's like the Cupcake CNC machine era.
In my case, it was an Ultimaker. That was like, yeah, a big wooden box printer. No heated bed.
Yeah, no one even knew what to do there.
Very slow and very weird. And the filament was still thicker. It printed with the 3 millimeter stuff, which actually was 2.85 millimeters, but still almost twice the diameter of what we mostly use these days, 1.75. It's like melting crayons. Yeah. It was weird when I got my first roll of 1.75 millimeter filament in my hands.
It felt so weird and not good, like it would break just by looking at it, because I was just used to all of this 2.85. And then I think last year or so, I threw out all of the old 2.85 that I still had, and I looked at it and it looked so heavy and strong. What was I even able to print with that? No way. So, yeah, things really changed.
So in 10 years of Octoprint, how many printers do you support? It seems like it grows every time I check it out.
Yeah, so the thing is that most printers out there actually run on open source firmware and have more or less agreed on a communication protocol. I say more or less because a lot of the printer vendors actually adjust the firmware often without really knowing what they are doing with the result that they break the firmware in the process and then things get really tricky for the users.
Because then usually they do not know how to fix it. And in the end, that is always when I'm very happy that I also built a plugin system into OctoPrint, because that allows working around these things. So if people have a printer like that and also happen to know how to code, or can find someone who does, they can see the issue and work around it.
Or maybe if it's a large enough community, then maybe I can also do that. It's just a little plugin that pretty much translates from the broken firmware into something that is more standards-conforming. And that way, pretty much everything that is sold out there is supported by OctoPrint.
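For a flavor of what such a workaround plugin can look like, here is a minimal sketch that rewrites lines received from the printer using OctoPrint's gcode received hook. The specific firmware quirk being fixed here is made up purely for illustration; a real plugin would target whatever the broken firmware actually sends.

```python
# Minimal, illustrative OctoPrint plugin: paper over a hypothetical firmware
# quirk by rewriting lines the printer sends back before OctoPrint parses them.
import octoprint.plugin

class QuirkFixPlugin(octoprint.plugin.OctoPrintPlugin):
    def rewrite_received(self, comm_instance, line, *args, **kwargs):
        # Hypothetical quirk: firmware answers "okay" instead of "ok", which
        # would confuse the communication layer. Translate it back.
        if line.startswith("okay"):
            return "ok" + line[4:]
        return line

__plugin_name__ = "Quirk Fix"
__plugin_pythoncompat__ = ">=3.7,<4"

def __plugin_load__():
    global __plugin_implementation__, __plugin_hooks__
    __plugin_implementation__ = QuirkFixPlugin()
    __plugin_hooks__ = {
        "octoprint.comm.protocol.gcode.received":
            __plugin_implementation__.rewrite_received,
    }
```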
But these days it gets a bit more tricky, because a whole bunch of printers are now coming out that have a full-blown host system. So OctoPrint is a so-called print host, and a lot of printers now come with something similar, fully blown, on board. They not only have a Wi-Fi interface, they often have an integrated full graphical display and such. And it is really tricky now to access these systems
and use them with something that the vendor did not plan on, which is a bit sad.
That's how my son's printer is... Well, he has a Toybox, so it's meant for little kids to use with their iPads. So in a way, it kind of monitors, but it also limits what you can do with it, because it comes with its own software and everything.
Yeah, I switched from an Ender Pro to a Bambu, and the Bambu is pretty much self-contained. Closed source. Yeah, closed source. And it was like... I had such a hard time, because I had so many printers in the past, and I always wanted them to be open source and I wanted them to work certain ways. And I always spent more time fiddling with them than using them and printing.
And so I saw recommendations for the Bambu and I was like, I'm going to try it. I'm going to go with this one. I know it's closed source. They have a whole ecosystem of stuff. And I think the problem is going to be when things break and I can't fix a problem, or I can't troubleshoot and find a community around, hey, how does this work?
It's all just going to be like, oh, well, here's a janky fix we have that shows you how to do something.
There's good news for you, though. Someone wrote a plugin that allows Bambu printers to work with OctoPrint.
Really?
Oh, that's awesome. I really want a Bambu, so that's why I'm just like... I'm not sure if it works with all of the models and such, but it's the plugin developer, basically, on OctoPrint. He's the one with the many plugins.
I keep watching everyone's videos on Twitter and TikTok, and I want a Bambu so bad, but I'm like... I don't want to get locked into the software.
I'm not touching that with a 10-foot pole. I saw one in person with a buddy. And mechanically, I was very, very impressed. But then also this news hit recently. Oh, not recently. That's almost been a year now or so, I think.
where they had this funny security issue where some printers suddenly fetched the wrong stuff from the cloud and started printing models from strangers in the middle of the night.
And that is just something... I did not hear about that.
Yeah. And when stuff like this happens, then this is a big, big no for me. And also, all of what 3D printing is these days, what 3D printing has come to over the last 10 years, that was done on the shoulders of open source.
And now all of these companies, it's not just Bambu, it's a bunch of others as well, are just rolling in and trying to lock everything down and trying to lock everything in and creating their own little gardens. And it's just not the way that I want to see all of this happening. I'm a bit afraid that we will lose all of the open access that we have now if stuff continues like that.
I think open source as a whole, like databases, everything has gotten really weird with where do we go from here with like having companies in open source.
License changes.
Yeah, it's been very interesting.
Now back to Octoprint for a bit. I saw you had a release last week. What does that release process look like? Because you have this huge system that supports all of these printers and you have these plugins and all of these features. How do you actually go about releasing and testing that to say like this is a new release of OctoPrint?
So it should be obvious that it's pretty much impossible to test every possible combination of printer, firmware, plugin, operating system, and starting state of the software. So what I do before I actually roll out a full release is a long, long phase of release candidates. And OctoPrint has a release branch system built in. So if you feel fine with testing stuff that is not
necessarily fully stable yet, then you can just switch over to another release branch, and then you will get release candidates whenever I push those out. And they actually get the same procedure that I do for every single release, and I will go quickly over that later as well.
The idea behind that is that if I have something like 1,000 or 2,000 people out there testing a release candidate and putting it through several years of print duration over the course of the release candidate phase, then I can be pretty sure that a lot of these combinations that I would never be able to test have been tested.
Yeah, it usually takes something like three to four release candidates until no more bugs come in. And at that point, then I declare this stable. And of course, after I've pushed out a stable release, so the current stable version is 1.10. But we are now already at 1.10.1. So there are bug fix releases that I also push out.
Those do not go through a full release candidate phase again, but they only get bug fixes and maybe small minor improvements of existing functionality. They do not get new features. They do not get... big changes. They obviously also get security fixes, stuff like that. But I try to really limit what goes in there.
And if it feels too risky, then it goes into the next stable release that will actually get the full release candidate phase again. And what I do for every single release is... So OctoPrint can basically run anywhere where you can run Python. But most people run it on a Raspberry Pi. So that is also what I concentrate on for testing.
And there is this dedicated image that someone else is maintaining, Guy Sheffer, for OctoPrint, which is called OctoPi. And a lot of people confuse the image with the software and the software with the image, which also causes a lot of complications in support. But
Anyhow, so OctoPi is the most common environment that OctoPrint will be installed on out there. So what I have here is, I built myself a little test rig that has three Raspberry Pi 3s, which is the current base option that I suggest, because that is basically the lowest supported version.
If you want something with more power, then of course you can get something else. But the 3 is like the base version that I look at. So I have three Raspberry Pi 3s there, and all of these have a little SD card adapter in there that can be switched through USB, either to act as a mass storage device to a host on the one end, or as an SD card on the other.
So that is slotted into the SD card slot of each of the Raspberry Pis, and all of these then go into a USB hub to a fourth Raspberry Pi, a Raspberry Pi 4, actually, which I call the Flash Host. And that thing also has control over the little powered USB hub through which I power the three Raspberry Pis. And now I can individually power them on and off.
And I can also individually unmount and mount their SD cards and flash them without having to physically release the SD card and push it into a flashing stick and then flash. That is what I did until 2020. And it was driving me nuts because...
Well, that's what I've been doing. No, this sounds fascinating. I didn't even know you could have an SD card on one end. It's connected to the USB on the other side, and you can switch it back and forth.
One of these things cost me $100, but they exist. Hey, sometimes that $100 is worth it. Yeah, it saves how much time? Yeah, I mean, I have three.
That was really worth the money that I spent on it, because what I do on every release is basically: I flash a whole bunch of starting versions on the Raspberry Pis, like OctoPi version X with OctoPrint version Y. And then I check if I can upgrade to the release-to-be from that version through all of the regular update mechanisms.
And for that, of course, I need to not only flash the SD card, but also provision it with the Wi-Fi credentials and then SSH into that thing and do all of that. And all of this is automated now, thanks to this little test rig that I built. So I just tell it: flash device A to this version of OctoPi, make sure OctoPrint is at that version, and also switch it to this release branch.
And then please also fire up the browser when it's done with that. And so before every release, I have this huge checklist in my tooling and go through all of that. And of course, the usual stuff like create new tags, create a change log, make sure the translation is up to date. The German one, this is the only one that I maintain.
Everything else needs to be supplied by people who actually speak the language they are targeting fluently. I also add supporter names and all of that. And then there's also always a whole test matrix that I write down in JSON that gets rendered into a little table. And that then tells me exactly what command line I have to enter into my scripting so that all of this will be done.
Then I wait, then a browser window pops up, then I click update, then I check if everything works. And once I've gone through all of these, usually something between seven and ten test scenarios, which used to take a whole day and now takes less than an hour if I'm lucky.
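The actual rig and tooling are Gina's own; purely as an illustration of how one entry of such a test matrix might be driven, here is a sketch. The sdmux, provision-wifi, powerhub, and pin-octoprint commands, plus the host names and file layout, are placeholders, not the real rig's tooling.

```python
# Illustrative driver for one test-matrix scenario: flash a starting image,
# provision Wi-Fi, pin OctoPrint to a version/branch, then open a browser so
# the update path can be checked by hand.
import json
import subprocess

def run(cmd: str) -> None:
    print(f"+ {cmd}")
    subprocess.run(cmd, shell=True, check=True)

def flash_and_prepare(rig: dict, scenario: dict) -> None:
    # Expose the Pi's SD card to the flash host, write the starting image,
    # drop in Wi-Fi credentials, then hand the card back and power the Pi on.
    run(f"sdmux --port {rig['mux_port']} --mode host")
    run(f"dd if=images/octopi-{scenario['octopi']}.img of={rig['sd_dev']} bs=4M conv=fsync")
    run(f"provision-wifi {rig['sd_dev']} --ssid testlab --psk-file .wifi-secret")
    run(f"sdmux --port {rig['mux_port']} --mode target")
    run(f"powerhub on {rig['mux_port']}")
    # Pin OctoPrint to the starting version and release branch, then open the
    # browser for the manual update check.
    run(f"ssh pi@{rig['hostname']} 'pin-octoprint {scenario['octoprint']} --branch {scenario['branch']}'")
    run(f"xdg-open http://{rig['hostname']}/")

if __name__ == "__main__":
    matrix = json.loads(open("test-matrix.json").read())
    for entry in matrix:   # e.g. {"rig": {...}, "scenario": {"octopi": "...", ...}}
        flash_and_prepare(entry["rig"], entry["scenario"])
```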
Wow. That's cool. Your automation is very impressive.
It saved me so much time. Every single release, I'm sitting here and have this huge smile because that saved me so much time. Yeah. And I also have a blog post about this test rig.
Does it have pictures?
It has pictures.
I need to find that so we can add it.
I can drop you the link and you can put it in the show notes and something.
Yeah.
And what happens then is at some point I'm through all of this and then I'm happy and stuff. And then I... do the regular release thing. So I just click on release on the GitHub release. I have already filled in the change log on all of that. And what now happens is a whole workflow runs through GitHub actions, which, first of all, runs the whole test suite against everything.
The unit tests are run, the end-to-end tests are run, and only if all of this is green does stuff actually even get released on PyPI and such. It also triggers the test rig again, because what it will do now is automatically build an updated image with the new OctoPrint version, so a new OctoPi version with the new OctoPrint version. All of that happens in GitHub Actions.
And then when this image is built, then the flash host in my network here at home on my desk will be triggered to download this image, fire it against the Pi, flash it, run the end-to-end tests against it. And if that is green, I get a little email in my inbox that says, hey, this image tested green. Do you want to release it?
And if I then click yes, then it will be released to the wild, basically.
This is like the software engineer dream. You found something that you're interested in. You built it over Christmas break, and then you solved this awesome problem, and then you automated it and solved all these problems to make it efficient. It's so cool. I'm so impressed.
How many core maintainers are on OctoPrint? Is it just you? It's just me. What software were you writing before OctoPrint?
Enterprise Java stuff.
There you go. So you went Java to Python, basically.
Yeah, Python was self-taught; I started when I was at university. So yeah, my career was a bit weird. I started out actually working at university because I wanted to do a PhD. So in Germany, it's like you have some work, either you are teaching or you are doing something in administration, and at the same time you're working towards your PhD. And I ended up on the administration side.
So I was administering the whole department's servers, all of them really old Unix, not Linux, Unix machines. The mail server was older than me. I was not really finding much time for my PhD, but I was automating a lot of stuff back then, even for the administrative tasks, with Python. And then at some point I decided, yeah, okay, so the PhD thing isn't happening.
I'm not getting really enough time to work on that. And to be honest, I was more drawn to doing something with my hands, and not just writing stuff and having students do the stuff with their hands. So I ended up as a software engineer in the industry and ended up writing a bunch of software in Java, IPTV related actually, for a big telecommunications company.
And that went on for half a decade. And then I got myself a 3D printer, and the rest is history. That's so cool.
And you said you've been crowdfunded for eight years now. Yeah.
So eight years ago, you had to make this decision to leave your job and go do... That decision was forced on me because the thing was 10 years ago already, I left this Java job because I was hired by a Spanish company who also was a vendor of 3D printers back then. They found me, they found Octoprint, they liked what I was doing, and they hired me full time to work on that back in 2014.
But then in 2016, they ran out of money and have since also gone under completely as far as I know. So they had to let me go. And now I found myself in the position that I had been doing Octoprint for almost two years at this point full time. Like it had grown a lot, the amount of work that it needed, maintenance work, community and all of that had grown.
But yeah, I was no longer getting paid for it. So it was a decision that I had to make: either try to do it as a side project again, which was an absolute no at this point already, because when I was still doing it as a side project, the first two or so years, that was already bad for my health.
Or drop it altogether, which was something that I really did not want to do, and go back to a regular, normal nine-to-five kind of job. Or do something that I never thought I would ever do and take the step into the darkness, where I did not know at all what was going to happen, and try to do this crowdfunded and basically self-employed.
Yeah, I figured if I would not at least try that, I would probably kick myself for the rest of my life and asking myself what could have been. So I jumped into the cold water and did it. And so far it's been working.
I do find it interesting that the commercialized spin wasn't even an option for you there. You could have tried to raise money and say, this is going to be a product. I'm going to make a new business out of it. And you have this open core model, paid plugins, whatever you want to do. So many companies do that.
And that's how they get started because it was a side project or it was something they're interested in. And for you, it was like, I either abandon it or I do it all community. And that's awesome.
Yeah, I'm really not that big of a fan of this whole open core thing. And personally, I also felt like I could not really do that, because I forked off of open source software. The part that talks to your printer was something that I basically took from a slicer, of all things, from Cura. Cura had a communication part that I could just take over.
A lot of people had contributed. So going, yeah, I'm going to close this down now and we are only going to keep an open core, it just felt wrong, and to this day feels wrong. And I believe in open source, and I find it a bit weird that it's still news for people out there that open source in general should be something that gets funded.
We shouldn't have to jump through hoops by selling stuff around it because what we do with maintaining open source is already a full-time job.
Now, I don't know if you can go into details, but where does your funding come from? Is that from like recurring businesses that say, hey, we want to pay for you to... No, that's mostly users.
I have some business sponsorships, but most of the people are really just, yeah, your average OctoPrint user who has one or two or so printers and just likes what I'm doing and throws me something between one and five bucks per month. And if you have a whole lot of people who do that, then it matters.
Do you know how many installs you have, or roughly how many?
Yeah, so I have anonymous usage tracking built into OctoPrint, all of this also self-built, completely GDPR OK-ish, and only on my own servers with my own tech stack and all that. And this is completely opt-in, however. So if people do not say yes, it's okay to track me, then I will never know about the install. But according to that, I have around 150,000 instances out there.
And based on some fun install stats from the piwheels project, who suddenly saw huge download spikes on the packages they host for the Raspberry Pi whenever I pushed out a new update, I know that the number is likely around 10 times higher.
Yeah, I was gonna say 150,000 opted in. Yeah, that is that is usually a very small percentage of people that are like, yes, I will let you get this information. That's awesome.
Which means it's probably like even more people.
Right. Well, yeah. So if you estimate 10 times more, that's 1.5 million. I could see that. That's totally not even out of realm.
The first time that I saw the first numbers come in after the first release with the anonymous usage tracking, I literally hid under my desk, because... I felt so much responsibility in that moment. And it felt so heavy, literally heavy, on my shoulders. I just had to hide. So I just sat down under my table and breathed deeply and took a minute.
I hope people hear about your success story, because it's so cool. I feel like you did the morally right thing, the thing people say you can't do and still be successful.
And you not only have been successful, but just as an engineer, people are using something that you made, you know? Tons of people, and they like it so much that they want to pay you for it. That is so cool, just to see that many people using your stuff.
Yeah, and it's also, I consider it my life's work. I mean, I don't know if I will do this forever, especially not given the whole open source printer situation that we talked about briefly, because at some point I might just get pushed out of the market by a tendency to locking everything down.
But yeah, it definitely feels like I have done something that actually has helped people, which is not something that I can say about my previous job, I have to say.
Enterprise Java helping people? I don't know. Sorry, Autumn, no shade.
A lot of stuff runs on Java, okay.
A lot of stuff does. When you mix those two words of enterprise and Java, I don't have any good memories.
It's more the enterprise bit, also. It's more the enterprise than the Java, for sure. The Java itself was okay. I mean, you can also build good software in it, and you can build performant software in it, and it's not as slow as people always say. But on the other hand, I also have to say that with Python, everything got even faster. Not in the run speed, but in the development speed.
So much less overhead and...
Well, that's just because your variable names aren't a sentence long, right?
It's just... You didn't see the first kind of Python that I wrote when I was writing Java during the day and Python at night. So a bunch of stuff is still not in snake case, but in the other one. CamelCase. CamelCase, thank you. Because, yeah, I mean, I was a Java developer.
Going back and forth, I always mess up the for loops in certain things. You can tell I've gone back and forth too many times.
Oh, I can top that. I mean, OctoPrint pretty much is a web application, and the backend is written in Python, but the frontend is JavaScript.
And switching between Python and JavaScript is almost as bad as switching between Python and Java, because I go back to Python and I start putting semicolons behind every single line, and I go from Python to JavaScript and I start trying to open my blocks with colons instead of braces.
Yeah.
And it happens daily. Just yesterday, I can't remember what exactly it was. I just remember that yesterday I was like, no, Gina, this is not Python when I was editing a JavaScript file. I do that all the time.
It's tricky.
Yeah.
So where do you want to take OctoPrint from here? What's the next thing that you would like to do? What is the next sort of big thing? It's not just, you know, more printers, that's fine. I mean, I still think that you have influenced the standard of communication by having this early project for so long that was able to talk to all these printers. You have this plugin system.
What's the next thing you want to do? What's the next cool thing that you're like, I would love if Octoprint could do this.
There is a bunch of stuff that actually needs to be done, which boils down more or less to taking care of some tech stack situations, because I'm still on a very old version of all of the stuff that runs the UI. But because of the plugin system, it's really tricky to update that or to swap that for something new, because all of the UI of all of the plugins out there would suddenly stop working.
And I've put a lot of time and thought into how to approach this and especially how to best get it working. And I'm still in the process of doing this. This is one of the bigger parts that I'm working on. Also, for the better part of a decade, actually, I've now been working on a new communication layer. And that is also a very tricky thing to pull off.
And I also have had really bad luck with it, because every time that I actually get on it and get it to a point where I'm almost ready, like I'm 80% or 90% there, something happens. So the first time, I ran into a complete and utter problem with my whole approach because of some firmware issues out there that I wasn't aware of. So I had to scrap everything and start anew.
The second time, I lost the job and had to go crowdfunding. The third time, I ended up in a breakup after over 15 years of a relationship. The third or fourth time, I don't remember, something like COVID happened. And so I'm almost too scared now to work on it anymore. That's a lot.
It's this huge project that really needs to get done to make everything more modular, to be able to make it easily adaptable to new developments out there, and to possibly also swap the whole communication stack out to target something other than serial communication, like network or so.
But the only problem is that it is a project in and of itself... English at this time of the day. And as I already said, I am the only maintainer. So I also have to take care of all the bug fixes, all the security fixes, all the other new features, all of the community management, the architecture stuff.
How do you push all the developers and different people that are making the plugins to the next version so you can eventually do an update?
I deprecate stuff, write big, big, nasty warnings into change logs, hope that someone actually reads them, and then at some point, some versions later, remove the deprecated stuff after it has been logging warnings and warnings and warnings for several months. And if stuff then breaks, plugin developers can suddenly react quite fast, I've learned.
Only after it breaks.
Yeah. Nobody listens to the warnings for like five years, ten years.
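As a tiny sketch of that pattern, here is what keeping an old plugin-facing API around while shouting about it in the logs could look like. The old_api and new_api names are hypothetical, not actual OctoPrint APIs.

```python
# Illustrative deprecation shim: keep the old entry point working for a few
# versions, but warn loudly every time a plugin touches it.
import logging
import warnings

logger = logging.getLogger("octoprint.deprecation")

def new_api(*args, **kwargs):
    ...  # the replacement implementation

def old_api(*args, **kwargs):
    warnings.warn(
        "old_api() is deprecated and will be removed in a future release; "
        "use new_api() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    logger.warning("Deprecated old_api() was called; please migrate to new_api()")
    return new_api(*args, **kwargs)
```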
I had this quite nasty situation that, yeah, Python 2 to Python 3.
That was such a horrible jump, though. Like, it was so bad. It was. It's still going on. This is like...
And I was right in the middle of it, because all of the plugins out there were Python 2 only. OctoPrint was Python 2 only. And it took a long, long time to get OctoPrint up and running, and that was also thanks to a lot of very, very nice contributors who helped there doing a lot of the legwork, and then spending half a year or so ironing out all the bugs that were introduced in the process.
Pushing out blog posts, pushing out tools that would help people to move over, marking plugins as Python 2 or Python 3 compatible automatically on the plugin repository, basically by looking at the code automatically and detecting if it would compile under Python 3 or not. And it was an absolute nightmare, but somehow we pulled it off.
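The "does it compile under Python 3?" check can be as simple as running a plugin's source files through the Python 3 compiler. This is only an illustration of that idea, not the actual plugin-repository tooling.

```python
# Compile every .py file in a plugin directory under Python 3 and report
# whether any of them fail with a SyntaxError.
from pathlib import Path

def compiles_under_python3(plugin_dir: str) -> bool:
    for source in Path(plugin_dir).rglob("*.py"):
        try:
            compile(source.read_text(encoding="utf-8"), str(source), "exec")
        except SyntaxError as err:
            print(f"{source}: not Python 3 compatible ({err})")
            return False
    return True

if __name__ == "__main__":
    label = "python3" if compiles_under_python3("./some-plugin") else "python2-only"
    print(f"plugin would be marked: {label}")
```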
That sounds exhausting.
It was exhausting. And 5% of OctoPrint's user base, according to the anonymous user tracking, is still on Python 2. Wow. And at this point, I just have given up trying to motivate them.
They'll never die.
Yeah, I mean, OctoPrint is Python 3 exclusive now since version 1.6, 5? I have no idea, actually. Something like mid-2020 or so. I can't remember exactly.
And there are still people who are left on the Python 2 only version, who I redirected to take their updates from somewhere else, just in case there was anything that I still needed to push out. But so far I have never done anything there, and now also will not, because those 5%, if a security issue or something like that shows up, they really should just finally make the jump.
Yeah, they need to.
It's like when we try to get people off of Java 8. It never dies.
Yeah, I can imagine. My knowledge is still stuck on Java 7.
You talked about some things you'd want to change in the future. Looking back on more than 10 years of building this project, what do you wish you would have done differently?
I would have done so many architecture decisions differently that are now biting me in my behind over and over again.
How do you... Because a lot of that comes from just learning, either the project scaling and needing to change over time, or you didn't know how it worked back then and you just learned a new way of doing it now. How would you go back in time and teach yourself, oh, you should do it this way instead? Is there a way?
Do you have a time machine? Apart from that, I mean... I think most of the stuff, if I just had known any better, so if I had found some more information on some things, then yeah, that would have saved me a lot of work. I mean, some of the problems I actually just managed to iron out with the current release because I basically have two web server situations going on.
I have Tornado sitting in there, single-threaded. Async. And on that, I have Flask sitting, which is sync. So that is really a bad idea. You do not want to mix that up. But in 2012, Gina didn't know any better than that. And now I know.
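To make that setup concrete for anyone following along: this is not OctoPrint's actual wiring, just the generic pattern of putting a synchronous Flask (WSGI) app behind Tornado with tornado.wsgi.WSGIContainer. The catch is that the wrapped Flask view runs synchronously on Tornado's single-threaded event loop, so a slow view blocks everything else the server is doing.

```python
# Minimal sketch of mixing a sync Flask app with async Tornado in one process.
import tornado.ioloop
import tornado.web
import tornado.wsgi
from flask import Flask

flask_app = Flask(__name__)

@flask_app.route("/api/version")
def version():
    # Synchronous WSGI view: while this runs, Tornado's event loop is blocked.
    return {"version": "0.0.0"}

class TornadoHandler(tornado.web.RequestHandler):
    async def get(self):
        # Served directly by Tornado, fully async.
        self.write("hello from tornado")

wsgi_app = tornado.wsgi.WSGIContainer(flask_app)

app = tornado.web.Application([
    (r"/tornado", TornadoHandler),
    (r".*", tornado.web.FallbackHandler, dict(fallback=wsgi_app)),
])

if __name__ == "__main__":
    app.listen(5000)
    tornado.ioloop.IOLoop.current().start()
```

Which is exactly the blocking behavior described here, and why decoupling the two later pays off.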
Flask talked a big game at that time. Like, it's not even your fault. No.
The good thing is that I found a solution for that, which means we had huge performance gains in the latest version that I just pushed out, because now I managed to make the whole connection between the two things async so that they don't block each other anymore. And so...
The whole web page loads faster now, and it's way less likely that some third-party plugin can now block the whole server as well. But these are things that if I had known them back then, if I had just better understood
the kind of stuff that I was working on. Because, I mean, I didn't know about 3D printing protocols back then, I didn't know about Flask, I didn't know about Tornado, I didn't know about all of that. I was just like, okay, this might maybe work, and if I connect this here and then there, and blah, and...
Then I added a plugin system on top and that made everything way more complicated because now you have an ecosystem. You cannot just rip out parts anymore without destroying parts of the ecosystem in the process. And so that is what is now making things way more complicated.
In your defense though, 3D printing has grown so much in the last decade and releasing software in general has grown so much. You sound extremely knowledgeable about all of these things and I don't know if anyone could learn them as well if you weren't just doing it. You know all these things because you built it and you maintained it and you had to make those hard decisions.
So it seems like you're doing a great job to me. Thank you.
Yeah, I mean, I'm still here, right? So it can't be too bad. And yeah, the things I now know about 3D printing firmware and especially about the differences between the various variations, honestly, I wish I didn't know as much sometimes. There'll be dragons.
The curse of knowledge.
Not just that, but I feel like it's always that struggle of, like, you learned it at 2 a.m. because something went wrong. Because it went sideways and you had to learn it.
Oh, that's something, by the way, I also learned. I never do releases after Wednesday anymore. Because that gives me Thursday and, even though it's usually my day off because I'm on a four-day work week, if push comes to shove, it also gives me Friday, and it doesn't ruin my whole weekend. I did a bunch of releases on Fridays and it cost me one too many weekends.
Never push to prod on Friday.
Yeah. That, that is the real wisdom of this podcast right now is like, people say like, don't push on Friday. And you're like, no, no, don't push after Wednesday. Like if you're, if you're pushing on Thursday or Friday, you're just asking for it.
That is the perfect time to get, like...
Someone else to try and then call you. And that is like, they need a day.
There's no testing like real users wanting to use your software in a way that you never imagined. Oh, yeah. I know. That's why I think you obviously do as much testing as you can, but getting real people to try it the way that you said you do, like that release setup where people can try your other branches so they can bake properly. I feel like that needs to be a shirt, Autumn.
It's like test with users.
I mean, there's nothing like, it is nothing like some real person being like, I wonder what you could do if I put this here. And you're like, why would you do that?
Or they have some crazy workflow where you're just like, what? Like, you would do what? Like, oh yeah, no, I drop to the web console every time and I type my commands manually in JavaScript. You're like...
They're like, but I want to use the UI and the CLI and then do this. So then you're just like, but why?
Why would you do that? But you know you have produced some stable software if, after a huge new release, not a point release, a minor release, only such stuff comes in.
It's only the weird use cases.
And this time, I can say that I managed to do that. I got only really, really weird, really odd stuff.
That's an achievement.
Right? I thought so as well.
Not just that, but the fact that you automated all that by yourself and you were the main maintainer, you are amazing. Like, amazing.
You need to keep in mind, I automated that because I am the only maintainer so that I had more time to do the maintenance.
Yeah, but you still had to do the automation. I know it makes your life easier, but like sometimes like you will sit there and it takes longer to automate stuff than like, I mean, you get it back obviously after a while, but like.
Well, not always, right? I mean, you can spend the whole week automating something that you do once a year. And in this case, you're like, oh no, this task went from a day to an hour, that is a good use of automation. Yeah.
Because we've all automated something and we were like, this is going to be great. And then it takes longer to automate than it does to do it manually. You're like, why? Why did I do this to myself? Like eight hours in. I'm into home automation, so I have this a lot. Yeah. I love that stuff. Me too. Like, but I'm just like, there's certain things that I'm just like, that was such a bad idea.
But like, you'll never know until you do it.
The good thing is, you often still learn something new in the process. So even if it's all for the... That's what I'm saying.
Just listening to you talk about it, I'm like, man, your knowledge is just insane. You must just know the ins and outs of so much of this because of the way that you're like, and then I had this problem and then I found this awesome way to fix it. And I'm like, how did you do this by yourself? That is amazing. Okay, but what do you print at home?
Did you make your own 3D printer, or do you have, like... No, I actually always just get something off the shelf, basically, and... So what's your favorite 3D printer?
I'm not sure if I would call it a favorite. I have a very old Prusa Mark III by now that I have modified a whole lot. And that works and works and works and works. And I actually just printed a guitar with it that I gave away as a birthday present to the father of my partner, who was really, really happy about that.
Do you have anywhere that you post the stuff that you 3D print? Because I just want to follow all the stuff that you print, because it has to be awesome.
Sometimes on Mastodon, sometimes on Printables, but mostly probably on Mastodon. So chaos.social, at foosel. And that's also where I post pretty much everything that I make. Currently, I'm more into making print and play board games for some reason that just
suddenly started. Oh, that's cool. I just made a card game again this morning. So, yeah. It's a weird thing, because I feel like, the amount of... Like, you were 3D printing when it wasn't even a big hobby, you know, and the fact that you created all this software, I'm like, you have to be making cool things.
Mostly functional stuff, I have to say. So I printed some parts for my bike, like for mounting the two locks that I have to the frame, and for mounting the radar unit that I have to tell me when a car is coming from behind, and such. Stuff like that. Then together with a buddy, we did a whole project for the Chaos Communication Congress and the Chaos Communication Camp last year,
which were basically little environment sensors that we put into little gnome figures. And I printed all of these gnome figures.
You are like the human problem solver. That is actually my superpower. How many problems has she talked about that she solved? You know what I mean? She's the epitome of engineering brain. She's like, I had this problem that I made. I'm just like, I just want to be your friend.
You are amazing. Yeah. You're just like, and then I solve this automation problem. And then I realized we needed this.
And I'm just like, you make all the things. This is actually the reason why I got a 3D printer. Because I constantly had all of these ideas about how to solve certain issues in the household, just around the home. But I never had a way to do that. And then I got a 3D printer and suddenly everything looked like a nail for my new hammer. And then later I got a...
I got a laser cutter, and then I got a new cutter. And can we just talk about, you should be Gina Foosel, the problem solver? Yeah, but like, you gotta add that as part of the official title now. I love it. Yes, that is actually one of my best skills here. That is something that also, back when I was still, uh,
still a Java engineer person, was constantly... You're always gonna have problems and always end up with, like, you know, adversities, but just the fact that your attitude is, okay, we have this problem and we're gonna fix it this way, like, that is amazing. You are gonna be successful for it. The only downside of it is that sometimes my brain won't shut up.
Because then it is, you know, like when you're lying in bed and you're trying to sleep and your brain is going, oh, by the way, you might be able to solve this that way or you could do this and such. So I'm now listening to audiobooks so that I can actually fall asleep because otherwise this stupid thing just won't shut up.
But then the audiobook gets good. I live that problem all the time.
I have a trick up my sleeve. I only listen to audiobooks I have already read.
So I know what happens. So I can't solve that problem too, because, like, I have the same brain. I feel like it doesn't do the same cool problem solving that you do. Like, I'm trying to get on your level one day, like, I'm not there, but, like... Oh, it's always like, and then this, and then you should do this, and then you need to make a list for this, and I'm like, can you shut up, I'm trying to sleep.
But then I'm like, oh, the book just got good. I just give it something to listen to and then it shuts up. And because I already know it, I get tired and I sleep. It doesn't work with podcasts. It doesn't work with books I don't already know because then I want to actually...
you know, listen and know what happens. But Gina has all the secrets, guys, all the secrets.
This has been a fantastic conversation, and thank you so much for coming and sharing all about OctoPrint and what you do. For anyone that's listening, if you're not familiar, if you have a 3D printer, go check it out, run it on a Raspberry Pi 3, donate to the project. Because this is one of those successful open source projects that has been around for a while. I was a user for a long time.
I am also a donator. So I, you know, encourage everyone else to go out there. And it's great having GitHub sponsorships integrated; all those things that you have for the project make it really easy. Just say, like, oh yeah, here's $10, here's a recurring buck or two.
All those things go a long way to help promote the work and really promote the idea behind successful open source that can be community run and community funded. It's an awesome success story.
Yes, I hope that people take the success story and it proves to them that this can be a possible model for open source.
Thank you so much, Gina.
Thank you for having me.
It was a blast.
I hope you enjoyed those flavors of Ship It. Yes, we love that show. Justin, Autumn, they do an amazing job hosting that show, and we're so proud of them. If you're not subscribed, go to shipit.show right now or search for Ship It in your favorite podcast app and subscribe.
Later this week on Friends, we are talking to Suze Hinton, cybersecurity, white hat hacking, Kali Linux, 3D printing, flying an airplane, all the fun. It was so awesome catching up with Suze. I think you'll enjoy it. Okay, a massive thank you to our sponsors for this episode, Speakeasy. Check them out, speakeasy.com. A brand new domain name, speakeasy.com. Generate enterprise-grade APIs.
And our friends over at Supabase. Launch week number 12 is a wrap, but you can check it out and learn more about all that they launched. Check them out, supabase.com slash launch week. And our friends over at Test Double, check them out, testdouble.com. Their mission is to improve the way the world builds software, and they're doing awesome. You should check them out.
And to one of our newest sponsors, getunblocked.com. Unblocked helps developers to find the answers they need to get their job done for all the hows, the whys, and WTFs. I tried it out. I loved it. It's amazing. Check them out, getunblocked.com. And last but not least, to our amazing friends and partners over at Fly, check them out, fly.io. Check out their GPUs.
They have GPUs in place now that you can run. You can run your own Ollama in the cloud on Fly. Check them out, fly.io/gpu. And those beats from Breakmaster Cylinder, bringing the beats. Love them, love them. Hey, that's it. The show's done. Thanks for tuning in. We'll see you on Friday.