
Kyle d'Oliveira (Clio) shares his survival tips for dealing with tens of thousands of commits, massive migrations, and the very limits of databases. We discuss the lessons learned from Rails megaprojects, how to use these tips in your own projects to reduce technical debt, and tools to keep your monolith majestic when the code won't stop coming.

Links:
- GitHub's Online Schema Migrations for MySQL
- gh-ost benchmark against pt-online-schema-change performance

Picks:
- Matt - Danger JS
- Luke - From jQuery to ES6 | Drifting Ruby
- Dave - Titan Security Key
- Dave - Teach, Learn, and Make with Raspberry Pi

Become a supporter of this podcast: https://www.spreaker.com/podcast/ruby-rogues--6102073/support.
Hi, everyone. Welcome to another episode of Ruby Rogues. I'm David Kimura. And today on our panel, we have Matt Smith. Hello. Luke Sutters.
Hi.
And we have a special guest, Kyle D'Oliveira. Did I say that right?
It's D'Oliveira.
D'Oliveira.
Yeah.
So Kyle, would you mind telling us a bit about who you are, who you work for, and some of the things that you're doing?
Sure. My name is Kyle. I've been working for a company named Clio, a legal practice management SaaS company based out of Vancouver, Canada. It makes practice management software aimed at lawyers. We're looking at transforming the legal space. Our mission is to transform the practice of law, for good. There's a nice little double entendre there.
And it's been really interesting seeing some of the changes in legal that we've kind of made an impact with over the last few years. I've been working on Ruby and Rails for the better part of the last decade, but when I started working on Rails, it was Rails version zero, and I've been upgrading Rails ever since, and so now finally up to Rails 6, and...
So touching all of the major versions. My major focus at Clio, which I've been at now for eight years, has been on the backend infrastructure side of things. So the main focus is scalability for the code base, but also in terms of the organization: what happens when we have 200 developers working? What happens when the data set sizes grow
to the point where we exhaust regular integers and we need to actually move to bigints? We look at approachability: how easily can we just take a new developer, dump them into the code base, and have them up and running?
Because as things go to scale, there are obviously new patterns that need to be adhered to that we don't necessarily need to focus on with small projects, but we do need to focus on for large projects. And my team has put a lot of effort into making the experience for all of the developers easy and fast.
Yeah, absolutely. One thing that kind of rings true is you always have to think about scalability when you're developing, but don't actually write for scalability when you're developing. So keep it in the back of your head saying, is this going to come back and bite me later? Or is it a really non-issue?
I remember one time I had a situation where I was storing just three kilobytes of data in a database. And I thought, okay, this is going to get used a little bit. They were images. So you can kind of see where this is going. I'm like, That's not a big deal. It's only three kilobytes. But unexpectedly, the consumers loved the feature that it was supporting.
And now that single table is over 30 gigabytes, and it has millions upon millions of records. I'm like, oh, that was unexpected. But I guess that's where I did not think of scale at the time, or plan a proper way forward. So introducing that kind of technical debt painted us into a corner, because now transitioning away from that model is going to be a pain when you're dealing with that much data.
Yeah, absolutely. It's hard to know what you don't know. And so if you don't think about the scale at that point in time, it's hard to know what problems you're even going to run into.
So you gave a talk last year about death by 1,000 commits. Could you give us a high-level overview of that talk and kind of some of the things it entails?
Yeah. So working at Clio, the code base is quite large. We have tens of thousands of commits that we go through. And it's really easy to see patterns of developers working on features. The features go live, and at some point in the next six months or a year, those features come back to bite us. The first commit is great, the 10th commit is great. You're starting to notice some things.
By the 100th, there are maybe some problems. And by the thousandth commit? You've stopped, because now you have to completely refactor and rebuild a lot of this technical debt that you introduced. So my talk was about some of the lessons that we've learned.
And although the lessons are very specific to specific problems, there's a generalized idea of approaches that you can take to dealing with technical debt in your own projects. If you're able to, for instance, automate technical debt away entirely, well, there's a whole classification of problems you no longer need to think about.
And you can feel confident that those are just automatically protected. And if you are cleaning up after yourself as you go and making it easier when there are curveballs being thrown at you, fixing technical debt and dealing with it when you hit scale doesn't have to stop you entirely. It just becomes a constant small tax that you pay.
But if you invest in the tools, you can actually start moving faster even as you scale.
Right. And so would you mind also explaining what technical debt is? What would you consider technical debt? And what are some things that you would maybe not consider technical debt? Kind of debunking some myths about technical debt.
I would say technical debt is like an accumulation of decisions that are made while coding that you eventually need to correct in the future. And as developers, I think we're always making these decisions. Can we cut a corner here to deliver a feature a little bit early? And I think, like, technical debt isn't bad.
I think when you are willing to get something in front of the users and deliver value earlier by incurring a little bit of this technical debt that you then have to clean up, I think that's totally okay. But I think technical debt often comes in the situation of developers deciding that a framework needs to be super generic, and it's a little bit speculative.
And then they come to implement something in the future and it's just really difficult to deal with, because it's so generic and hard to understand that new developers have to unpack it and wind it back just to implement something new in it. Some things that I think are not necessarily technical debt can come from
decisions that actually made sense at the time and weren't really cutting a corner. So, I mean, it may make sense to build a system that is very generic, and maybe that is the correct choice, and you build it through, and then things change. And when things change, that's when the technical debt might come back. But until things change, it actually might not be debt.
I think that's a bit of like a generic answer, but it's hard to pin down a concept like technical debt because almost everything we write is debt of some form.
Mm-hmm. Yeah, I definitely have to agree with that. So what are some of the real world examples that you guys have experienced over your years where at the time you made a decision and you or the team thought like, this was a great choice. This is the right way to do it. But then later you found that it became more troublesome or more of a headache than it was worth.
One of the things that popped up is actually something that we decided on because the Rails community pushes for, and this is what comes out of the box.
So if you think about Rails migrations, if you think about how they're often applied, if you think about some examples that you've worked on, there are often times where you use a tool like Capistrano, which deploys some code, and as part of the deploy, database migrations get run. And for small projects, that's fine.
For most small projects, the migration that runs is fast and it's not a problem. So this is an example of a decision where we were kind of like, let's just inherit what the community uses. But as we started scaling out, we started encountering problems with it. So, for instance, a table where, if you ran a migration on it, it took 30 minutes.
This means that our deployment took 30 minutes and also timed out. So we lost all of the context of it. But also during this period of time, the table locked. So any developer or any queries that started going to that table stopped being answered. So all of our servers shut down. And we couldn't kill the alter table because it was already mid-progress. And...
After it finished, we now had a table in a new state, but the code hadn't actually finished deploying. So now we're running into different problems. So this is a decision that makes a lot of sense when you're small: go really quick, because you can, and it makes sense. But when you hit a certain scale,
you can no longer run with those assumptions, and you need to change them. So a new process needs to be built. And for database migrations, we needed to build them in a way that is entirely asynchronous to the deployment process.
30 minutes is a, that's quite a migration.
Yeah, I think this table records a little bit of all of the activity that users do. And it was the first table we ran into that exhausted 32-bit integers, where we needed to flip the IDs to be bigints. We didn't think that would be a problem either. And it's leaps and bounds bigger than any of the other tables we have in our system.
I'm going to ask the obvious question now, which is, how do you make your system capable of asynchronous table migrations?
That's actually a good question, and there are actually a lot of tools that exist that we don't necessarily need to build ourselves. GitHub has a tool called gh-ost. There's another tool by Percona, in the Percona Toolkit. I believe it's called pt-online-schema-change.
The general strategy is, instead of changing a table with an alter table, you actually create a brand new table and populate that table with various mechanisms. Some of them use triggers, some of them use the binary logs. You get the new table in sync with the old one, and then do quick renames.
And so you rename the old table to be the old one, you rename the new table to take its place, and then new queries start flowing into this new table. And you can take as long as you want. It's entirely non-blocking, but it has to be in a process that exists entirely outside of the deployment stack.
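The shadow-table strategy Kyle describes can be sketched in plain Ruby, with in-memory arrays standing in for MySQL tables. Everything here (class name, method names, batch size) is illustrative, not the actual gh-ost or pt-online-schema-change API:

```ruby
# Toy model of an online schema change: backfill a shadow table,
# capture concurrent writes as deltas, then atomically swap names.
class OnlineSchemaChange
  def initialize(tables, table_name)
    @tables = tables                 # hash simulating the database's tables
    @table  = table_name
    @shadow = "#{table_name}_shadow"
    @deltas = []
  end

  # Step 1: create an empty shadow table built with the new schema.
  def create_shadow_table
    @tables[@shadow] = []
  end

  # Step 2: backfill existing rows in small batches so reads stay unblocked.
  def backfill
    @tables[@table].each_slice(2) do |batch|
      @tables[@shadow].concat(batch.map(&:dup))
    end
  end

  # Writes landing mid-backfill are captured as deltas; the real tools do
  # this with triggers (pt-online-schema-change) or the binlog (gh-ost).
  def capture_write(row)
    @deltas << row
  end

  # Step 3: drain the deltas, then swap the names so new queries
  # flow into the rebuilt table.
  def cut_over
    @tables[@shadow].concat(@deltas)
    @tables["#{@table}_old"] = @tables.delete(@table)
    @tables[@table] = @tables.delete(@shadow)
  end
end

tables = { "users" => [{ id: 1 }, { id: 2 }] }
change = OnlineSchemaChange.new(tables, "users")
change.create_shadow_table
change.backfill
change.capture_write({ id: 3 })      # a write that arrived during backfill
change.cut_over
tables["users"].map { |r| r[:id] }   # => [1, 2, 3]
```

The key property the real tools share with this sketch is that the expensive work (the backfill) never blocks the live table; only the final rename is a brief, atomic operation.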
And that could have its own issues if you have thousands of requests per second coming in. So yeah, definitely not a fun problem to solve. And it's also, I guess, good to know what kind of migration, or really what kind of SQL operations, will cause a table lock. So adding an index or adding a column and such can lock your table.
So being aware of what actually is going to lock the table is really good information to know.
Some of them seem obvious. Like, I think if you're dropping a column or adding a column, that could potentially lock. But some of them are not. Like, if you changed a varchar from a varchar(100) to a varchar(200) and you're just increasing it, does that lock? Maybe. I actually don't know off the top of my head. What if you change the character set? What if you change the collation? I don't know.
Is this on MySQL or Postgres? We used Percona Server, which is an offshoot of MySQL. It'll also be different between databases, so Percona might have made different decisions.
Shout out to the Percona guys. I've worked in a place where we had some Percona consultancy. They were really good, really delivered.
So that kind of covers the database and schema side of things. To step away from the code, you had mentioned onboarding people. With a larger code base, what does that process look like for you guys? And how do you really bring a junior or mid-level developer into the company and have them productive quickly?
Yeah. So a lot of this comes from tooling and education, right? As senior developers, or people who have different experience from different places, we've accumulated huge amounts of knowledge, and it's kind of all tribal.
And if you join a company that doesn't have a great strategy, a lot of the strategy for sharing that knowledge is just: work together, go submit pull requests, have them code reviewed, and learn from the code review. And I think that's okay. You can learn that way. But there are better ways to push information to people. And this is the concept of just-in-time education.
So an interesting example of this can be through linters. I did a talk about this as well, for the 2020 Couch Edition of RailsConf, called Communicating with Cops, that focused on using RuboCop as a mechanism to provide education. I did a bit of a deep dive into how RuboCop works and how to build your own custom cop.
But one of the things we do at Clio is, as people make mistakes and learn about bad patterns, we try to codify those patterns so that it doesn't happen again, and people get education about it right as it happens.
A good example of this, which is super trivial and doesn't often bite people until there's an unexpected case, would be the Rails convention of naming files. We've seen cases where people make a User model, but then make a typo in the spec. So rather than calling it user_spec, they call it users_spec, plural, or something along those lines.
And the spec will still run, but there might be some tooling that we expect to adhere to the Rails convention and it doesn't quite line up. So you can have a linter that basically checks the name of the files and the name of the classes and make sure that they're in line. And if not, alert people and do that as part of their editor or do that as part of them committing code.
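A toy version of that filename check might look like this. The helper names are made up for illustration; a real implementation would live in a custom RuboCop cop or a pre-commit hook:

```ruby
# Tiny stand-in for Rails' camelize: "user_account" -> "UserAccount".
def camelize(snake)
  snake.split("_").map(&:capitalize).join
end

# Given a spec path and the class name the file actually defines,
# return a warning string, or nil if everything lines up.
def check_spec_filename(path, class_name)
  base = File.basename(path, ".rb")
  return "#{path}: expected a _spec.rb file" unless base.end_with?("_spec")

  implied = camelize(base.sub(/_spec\z/, ""))
  return nil if implied == class_name

  "#{path}: file name implies #{implied}, but it defines #{class_name} " \
  "(possible typo or pluralization mismatch)"
end

check_spec_filename("spec/models/user_spec.rb", "User")
# => nil (the name lines up)

check_spec_filename("spec/models/users_spec.rb", "User")
# => warning about Users vs User
```

The point is less the check itself than where it runs: wired into the editor or a commit hook, the developer gets the lesson seconds after making the typo, not days later in code review.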
And they get warnings and they get education as they're writing code. So if they just wrote something, they save the file, they get a little warning popped up being like, hey, you may have made a typo here. And this goes even to as far as behavior.
If we know that there exists bad patterns, so for instance, making an HTTP call inside of a transaction, which we know is going to be potentially bad, we can actually automatically prevent that. And as soon as that starts happening, as soon as we're able to detect it, so it might be in a test,
might be as part of a linter, we provide that education right back to the developer so that they understand what they did wrong and the avenues of what they need to do to fix it. So now when a junior developer enters the company, they can actually just feel free to start writing code and write even code in kind of a way that maybe breaks some patterns.
And a lot of times they're going to start getting education right away. And then we can do all of the usual things as well: as pull requests come in, we can review them and provide more education that way. And if we find constant patterns where every junior developer who comes in makes the same mistake, let's codify that so that they get the feedback immediately.
Yeah, that's kind of one of my pet peeves, I guess you could say, with linting is that if a particular project has a set of practices it likes to follow, maybe it is no more than 100 characters on a line. That kind of feedback should never happen in a code review. If you have those kind of expectations, then they need to be known expectations via a linter, whether it's RuboCop or standard RB.
And it should never be an unknown expectation to the developer. So I'm definitely on board with that. And that's something that I've had to fight and struggle with: going through code reviews and having everything nitpicked. It decreases the morale of the developer if every pull request they make is just getting bombarded with styling quirks or requests to change.
So I could definitely agree with that point. And I think that every project should adopt some kind of linter if there are expectations about what they're doing. Even if you bring in RuboCop, you can disable everything by default and then just start enabling the rules your team follows on that particular project.
Yeah, absolutely. And I think there's even one step farther: a lot of linters can do auto-correcting. So if you care about having one blank line between methods, don't even have RuboCop or a linter warn about that; just auto-fix it. That's something that a developer just doesn't need to worry about.
And it also removes a lot of this argument over: should I use double quotes, should I use single quotes? If it just automatically fixes it, and a developer can write whatever they want, that's fine. But I've also run into issues of pull requests being bombarded by style comments, and it really distracts from reviewing the behavior.
Yeah, absolutely. Although you do have to be careful about the auto-correction. I remember one time in my earlier days of development, when RubyMine came out, I tried out RubyMine's code refactoring feature. I forget what they call it. But I had some really poorly written classes, and it just absolutely broke everything.
I have no idea how that happened, but things just were not working the way they were before. I had to pull that merge back out because, of course, as an early developer, I didn't have any tests on the application. So I didn't really notice that things were broken until they got deployed.
Yeah, definitely need to be careful there.
So you also previously mentioned about, so not necessarily onboarding developers, but having a lot of developers work on the project. So what point do you go from a small shop to a large shop where you have to start putting different kinds of practices in place? And what are those kinds of practices when you're dealing with a lot of developers on a single code base?
So actually, it's not clear where that point exists. I think it's probably going to be different for every organization, and probably different depending on exactly the work that you're running into. The thing is to listen to the pain points of the developers. So if you notice that there are pieces of friction that occur between developers, like...
That's the point where maybe there's actually some tools that need to be built to make this easier. So one thing that I think comes up really quickly in organizations is often the concept of a testing server. So you've got your developer's environment, you've got maybe your CI, but maybe you want a production-like environment for things. And so you have a staging server.
When there's five developers, it's really easy to just coordinate and be like, oh, staging is mine now. I'm going to test something. When it's done, I will hand it off and maybe reset it back to whatever the master branch and let people work that way. But that really falls apart when you have 100 developers.
How do you coordinate one server where everyone is trying to test something if you have 100 developers fighting for that resource? You can kind of... fudge it a little bit by maybe having a fixed number and you round robin them out. But again, at some point, that's going to break down.
So if you think about what the problem is here: every developer wants to potentially test something on their own schedule. Maybe it actually makes sense to build some tooling so that you can spin up staging servers on Amazon EC2 or Google Cloud on demand and just route people there.
And so that's something that we ended up having to do really early: building our own tooling. We call them beta environments, and we can have an arbitrary number of them. Someone can basically say, for this branch on GitHub, I want a clone of the site
on Amazon. And within about 10 minutes, you've got a domain that points to it. You've got the full stack, you have full control, you can do whatever you want, you can break it. It gives developers a lot of autonomy to test the things that they want, and it removes a lot of this "oh, let's deploy it and see what happens." You have a full environment that you have full control over. Go test it, go see it with as much data as you want, and then see what happens.
Another example along those lines is deployments. Do you have a handful of senior developers who can deploy, or do you do a big deployment every Monday? That's going to start really breaking down when you have a lot of developers. At Clio, everyone has the ability to deploy. Everyone has the ability to merge code.
So we give the power to the developers, and now a junior developer can come in, write a fix to a readme, merge the code, and deploy it without having to bother people beyond getting a code review. And now we're deploying code probably upwards of 30-ish times a day, and that number is only going to go up.
So as we're running into these issues, we are just looking at what can we do to build tooling so that it's no longer frustrating for developers. And the important part of this is developers need to voice things and managers and companies need to listen. If we're wasting five hours a week per developer on this one thing that's frustrating, build tooling around it.
Yeah, that's one of the things that I did just for my own hobby projects and continual learning. I have a self-hosted GitLab instance, and I set up a Kubernetes server, which will automatically create the infrastructure for the application that got pushed. It happens on any development or master branch push, and also on each commit
up to the repository, and it'll spin up an entire infrastructure within Kubernetes with an FQDN so that that feature can be tested. So it works on smaller applications.
I don't know how it would work on applications that consume 30 gigs of RAM of resources, but I think on smaller applications, that kind of thing can really save you from having to have dedicated test servers that's shared by several people.
When are you going to do an episode on that, Dave?
I do have a Drifting Ruby episode on Kubernetes, which is where I got the inspiration for that. I just didn't tie it into the CI/CD portion.
I've got a question for you, Kyle. It sounds like you've got a lot of data if you're running 30-minute migrations, and you've got a lot of developers, and you've got good testing and good infrastructure. What I've found is that a lot of the really
memorable problems I've had are where you get something running and it feels like it's going to be fine, but then it gets deployed against the master database, and that's the point at which there's some bad data in there, something from ages ago, from a previous version, and it absolutely sinks you. These days, whenever I possibly can, I just pull the entire production database out and test against that.
Do you do that? Or is your database just so huge that you can't throw it around like that, especially with a lot of developers?
It used to be something that we did. We used to call it the snapshot, and you could point environments at the snapshot and run test queries on it. But we actually hit a size where the time it took to set up the snapshot every day was longer than it took to actually
back it up, so it was just starting to become unfeasible for us. We're also dealing with sensitive data, and we don't necessarily want to give free access to all of our clients' data. So we instead try to invest in a little bit of tooling. We definitely still have issues where everything looks good in development, everything looks good in beta or test, and we deploy to production and something is wrong. So we think about what we can do to make that better.
And so, you know, if it's about a lack of an index on a database query or something like that, we can try to check that ahead of time, build some tooling, and alert people when something goes wrong. But also in production, we can be alerted and say, hey, this query took 30 minutes, that's unacceptable; this query took five minutes.
And we return that information as an exception to the developers that they need to fix, but without interrupting the actual request behavior. And if things go really south, just roll it back. There's no blame if someone deploys something, it goes south, and they quickly roll it back. We just try to take that as a learning opportunity.
And how can we take that learning opportunity and share it to everybody so that everyone learns from it? Does that answer your question?
Yeah, I mean, you must be dealing with a lot of data. And I've worked with, you'd call it HIPAA data in the States, kind of confidential data. And that hugely complicates testing data transfers, because you have to either heavily anonymize it or write your own tools to kind of replicate a few hundred thousand medical records.
Yeah. What we can also do is, I mentioned earlier that we have these beta environments that we can spin up. You just use a SQL dump to seed data in there. And although this isn't necessarily production data, developers have full control over what that data looks like.
And so if we wanted to see what happens if there is tens of thousands of something in a table or more, we could just build little scripts that can seed that database and then test it outside of production.
It's not perfect, because it doesn't always match the same shape as production, but it's an iterative process, and that information gets codified. You can keep adding to the seeds in that manner, so it becomes a better and better representation as we go forward.
Yeah, so kind of back to the technical debt, I have an unfortunate story of something that I inherited one time where I think metaprogramming is awesome and can do a lot of really cool things and can really get you out of a bind in certain situations. But then it can also be overly abused. And
I was searching for a function that was not working properly within Ruby, and I couldn't find it in the code base at all. So I thought, okay, well, surely that this is in the gem or something. So I started looking at all the gems that's included into this Rails application, started tearing apart the gems, opening them to search for this function. Still couldn't find it.
Turns out they were doing a class_eval on something that was pulled from the database. So they actually stored Ruby functions as data within a column in the database. And that's what was getting executed. That's where the function was defined. So to me, that's a... what's that?
What's wrong with that?
Yeah, so other than the fact that you could not possibly even test that bit of code with any kind of reason, it was a nightmare.
So just a warning to when you think that you're doing something really cool and elegant that's avoiding code duplication or whatever, I would much rather have code duplication all across my application than having that level of obfuscation where you're never going to be able to even remotely troubleshoot it.
Yeah, metaprogramming is actually one of the best strengths of Ruby. You can do so much with it, but once you have it, it's the hammer and everything is a nail, and you want to use it. And that's often a trap that new developers, when they learn about metaprogramming, really want to fall into.
I think a good lesson to come out of that story is that if you think about code, it's written once but read countless times. And so if you can take little things to optimize the code for the reader, that is much better than sacrificing readability to optimize for the writer.
So if it takes you an extra 30 minutes to write a whole bunch of cookie cutter methods, but now those methods are in place and they're static and it's easy to read and reason about and test, that is well worth that 30 minutes because you're going to lose more than that reading that piece of code in the future.
Yeah, absolutely. And it could even be taken to something like private methods, where, if you have a class with a bunch of methods, start sorting out which ones are private methods, so they're not accessible to the consumer. Because I've had situations where I've worked on a class that grew to over a thousand lines, and there were hundreds of methods in there.
And I had no idea which ones were publicly accessible that were truly supposed to be publicly accessible and which ones were really meant to be private. So not having that level of abstraction, so to speak, you lose a lot of visibility in how important is this class to the consumer.
Yeah, absolutely. Anything that you can do to make those kinds of classes easier to understand and read for a new person is great. And backing up a little bit to your example: this is an instance where metaprogramming bit you, but metaprogramming is also interesting in that it could save you, because you can also ask Ruby about Ruby.
So if anyone didn't know, this is a tactic that I use all the time for debugging pieces of code that I've never been familiar with. If you have access to a console, you can ask Ruby what methods are available with a .methods call. You can also get access to the method object itself and then ask it: what is its source? Where does it live? That can
make life easier when tracking down methods that may be dynamic or created by gems.
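Those reflection calls look like this in practice; the Invoice class is just a stand-in for whatever you are poking at in the console:

```ruby
# A small class to reflect on.
class Invoice
  def total
    42
  end
end

invoice = Invoice.new

# .methods on its own is noisy; subtracting a plain Object's methods
# leaves only the ones this class actually adds.
own_methods = invoice.methods - Object.new.methods
# => [:total]

# A Method object can report where it was defined: a [file, line] pair
# for methods written in Ruby (nil for methods implemented in C), which
# is invaluable for hunting down dynamically defined or gem-provided methods.
location = invoice.method(:total).source_location
```

The same trick works on any object in a Rails console, which is often the fastest way to discover where a mystery method defined by a gem or by metaprogramming actually lives.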
I recently learned how to use the ls command in Pry, and now I just live out of it. My traffic to the Ruby API docs has dropped off considerably. I find .methods to be quite noisy, very verbose if you're trying to pick out which method it is. And I really like Pry's ls command.
One thing you can do to make that less noisy is take Object.new and subtract its methods out, and sort it, and all that sort of stuff. And you can do it all in a one-liner, because we're in Ruby. But yeah, ls is another great option.
My documentation suffered for it, I must admit. Now my attitude is just, oh, they can just ls the class and see what's going on, man.
I think that's another example of someone making good tooling. If you knew to call .methods and subtract Object.new.methods, great, but now it's two characters, and it's nice and easy, and it's much more approachable, and you get access to things that you may not have known existed.
Can I ask you about, can we turn back the clock and ask you about Rails 0?
Oh, it's been a long, long time since I've worked on Rails 0. I can try to answer questions, but...
So it sounds like you've been on a bit of a journey with scaling things up. What did you do before Rails 0?
Oh, most of my career has been working with Rails. So before Rails 0, I was working at an enterprise Java shop that I don't remember a lot of the details of it anymore. It's kind of too far in the past. But I think I've been working with Rails now for 11 years, I think. So it's been just a long time with just Rails. I don't remember a lot of the pre-Rails world, to be honest.
That is the correct answer. There is no other system. I ask because we were talking about the N plus one queries. And my complaint is that Rails makes it too easy to do n plus one queries, because if you just kind of follow all the guides, that's what you get. If you kind of do a dot all to each, then you're going to be there for a while.
And you start noticing that when you get into a few thousand objects. So you can be sitting there prototyping something and think, this is great. And then when people start using it, you drop it in, and that's when you start hitting these gotchas. But I think people forget what the bad old days were like before you had the Rails tooling.
The amount of time it took when you had to write your own queries was really quite significant. And you mentioned enterprise Java. There's not a whole lot of object-relational mapping going on in that. So it is a double-edged sword. When you're operating at the scale you do, what are the parts of Rails that start to bite?
We've definitely been bitten by how easy it is to make N plus one queries in the past. I think pretty much any Rails shop is going to be doing it. Rails offers tooling to help with that, but the tooling still requires a lot of effort. You have to know which N plus one query you're introducing and fix it.
So that's where you can build some more tooling. There's a gem that we built, the JIT preloader, and there's also another community gem called Goldiloader, and those remove stuff like N plus one queries. Those are ways to basically eliminate those kinds of problems. Some other things that come up in Rails as we're building: discoverability of templates.
So I think one of the previous episodes of Ruby Rogues was talking about this. But as it scales up, Rails ERB makes it really easy to render partials all over the place. But it's really hard to understand, if you're looking at a page, where are those partials actually coming from? And how can you dig back into them? So that's a challenging thing with Rails as well.
There are also some things in the community, for things like paging, that can be problematic at scale. If you look at what some of the basic gems offer, it often comes down to a limit and offset, which
is fine on small datasets. But as you get to datasets that are really, really large and you page really deep into them, it actually starts falling apart and breaking down, which is something you might not know about until you actually hit that scale.
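The usual fix is keyset (cursor) pagination: instead of skipping N rows with OFFSET, you remember the last id you returned and seek past it. A plain-Ruby sketch of the idea (the data and helper here are made up):

```ruby
# Stand-in for a table of 95 rows with an indexed, ascending id.
records = (1..95).map { |i| { id: i, title: "Record #{i}" } }

# In SQL this would be roughly:
#   SELECT * FROM records WHERE id > :after_id ORDER BY id LIMIT :size
# The database can seek straight past the cursor via the index, instead
# of scanning and discarding every skipped row the way LIMIT/OFFSET does.
def next_page(records, after_id:, size: 10)
  records.sort_by { |r| r[:id] }
         .select { |r| r[:id] > after_id }
         .first(size)
end

first_page = next_page(records, after_id: 0)
cursor = first_page.last[:id]          # hand this back as the page token
second_page = next_page(records, after_id: cursor)
```

Page 10,000 costs the same as page 1, which is exactly where limit/offset breaks down.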
I think some of the Rails conventions also start becoming a little bit problematic, and you see a little bit of discussion about this. Rails at one point said, throw all the logic into the controller. And then eventually the controllers became skinny and all the models became really fat.
And I'm sure everyone has that God object that exists in their project, the user object or the account object that is 5,000 lines and really difficult to reason about. And people are offering opinions of having like service classes or various different patterns to try to combat that. But we're still trying to unpack some of the things that you started the Rails projects with.
One question on that, as far as how you've seen and the progression of the companies you've worked at, documentation, right? Like on the one hand, we've just talked about, you can use cops, you can use linters and say, go out and, try things, break things, autocorrect things, experiment, basically.
Then there's self-documentation, making sure you're writing good method names, good class names that are intuitive. And then there's inline documentation. And then there's high-level documentation of, hey, we're using this, set some conventions and everything else.
This is a big question, but what do you think is the right thing to put in each of those buckets in order to make an intuitive project that scales across more than 20 developers up to 100 developers?
Yeah, here's a little bit of my thinking on it, but I'm not going to say my thoughts here are perfect. I think everyone's mileage will vary, because documentation is a tricky thing. So when you get to gotchas: if you ever tell someone, oh, if you see this pattern, don't do it.
If you have code reviews that are like, oh, I've been bitten by this before, that should be something that falls into the linting or the just-in-time education where you try to codify that.
If you see people who have inline comments in code saying, you know, these next few lines are going to iterate over something and do these operations, that's probably an indication that their code is not written well enough to describe itself, and that the comment is not super valuable.
So that actually might be something of like, that comment shouldn't exist, and instead we should maybe extract a method that describes it better and kind of move in the direction of code describing itself. When you are implementing something that's specifically tied to code, it should probably exist at the code level.
So if you have a module that you want things to include, and developers need to implement certain methods in there, maybe the module should define those methods and raise a NotImplementedError with a very clear message: this is what this method should do, this is what it should return, here are some examples. And just link to them in your own code base.
And so now when a developer looks at that specific piece of code, it's still tied to the code base. But all of that's you know, at the code base level, there still needs to be something at like a higher level. That's like a readme in the documentation or in something else entirely.
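The module contract pattern just described might look something like this (the names here are hypothetical, not from an actual code base):

```ruby
# The module defines the required method itself and raises with a
# message that documents the contract, so the expectation lives in code.
module Billable
  # Implementers must return the amount owing, in cents, as an Integer.
  def amount_due_cents
    raise NotImplementedError,
          "#{self.class} must implement #amount_due_cents and return " \
          "the amount owing in cents (e.g. 1500 for $15.00)"
  end

  def settled?
    amount_due_cents.zero?
  end
end

class PaidInvoice
  include Billable

  def amount_due_cents
    0
  end
end

class DraftInvoice
  include Billable # forgot to implement amount_due_cents
end
```

Calling `DraftInvoice.new.settled?` raises immediately with an error message that doubles as documentation, right where the developer is looking.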
So we have stuff that exists in a readme that's kind of more about process, but process specifically related to our code base. A good example of this would be, how do you do these asynchronous migrations? This isn't really super tied to code, because you might make a migration, but then what's the process of getting that live? So we have a step-by-step guide at Clio.
If you want to do a migration, here are the steps that you need to take. And as much as we can, we just link back to code rather than re-implement the code, but we'll also just describe things in English and offer templates there. And then we go one level higher to things that exist more at like a process level for the organization. So for that, we use a tool called Confluence.
There's lots of tools that exist that kind of do similar things. But for those, that's things that exist outside of the code base. So if an incident happened, how do you do a postmortem or root cause analysis on that? And there'll be documents for that.
Or, you know, if you wanted to propose a new feature using some new architecture and get some buy-in, just to make sure that the approach is correct, you can do a design doc in Confluence and get people bought in well before you've actually written the code. But once the code is written, that document is less relevant.
Absolutely. I was kind of coming from the standpoint of, like we were talking about, bringing a new developer in and getting them used to the whole environment. And you've definitely tackled some of that in terms of, you know, here's the process, with the migration example there. What about just getting them used to the entire
structure of your application, where certain logic lives, the certain design paradigms that you've talked about? Some of those can be encapsulated in linters, but some of them are larger than linters. And so is that when you're doing the specific guide for walking them through that process?
Yeah, so there are definitely things that linters aren't going to be able to do. A linter won't be able to tell whether this thing should be a model or a service class or something; it doesn't understand the business logic of it. So for things like that, we kind of have to rely on little handbooks: here, we've codified our style guide.
And we try to make sure that we keep that up to date. There are some things that we still teach through kind of tribal knowledge and code reviews. Like if someone submits a pull request and we notice it, we'll still correct it there and we'll do a lot of pairing. So we'll get developers up to speed by working with people as opposed to just going off on their own.
But I think this is just a learning process. I don't think we are perfect at getting developers onboarded, and I don't think anyone is. And I think that's the important distinction. It's an iterative process.
If you bring in three developers and they all have the same issue, that's probably when you might need to introduce some new documentation and be like, hey, here's our new developer handbook. You might want to read it.
Absolutely. And then on top of that, you have personalities too, and, you know, certain people gravitate towards certain things. What's your method when you have what I would consider external documentation, whether that's living in a readme, not in a Ruby file or an
HTML file or something like that? Do you have any triggers so that, hey, if something happens over here and we decide on a new paradigm, make sure you go update that guide documentation? Or is it, oh, we just brought a new person in and we've got this new convention that's not documented, oh geez, we've got to go update that documentation, and it's kind of an only-when-you-discover-it type of issue?
So I think the answer is both. I definitely think we still have places where our documentation drifts and then somebody notices and we're like, oh shit, we got to fix that.
But we also leverage tools like Danger and Danger JS on GitHub, where it can look at code. It's not necessarily a linter saying, hey, this is bad, but it can make a comment like, hey, you're doing something, maybe this is related to this link over here, and direct developers, or whoever's reviewing, to go take a look at the documentation.
Maybe there are no changes required there, and we definitely need to be careful about how much noise we generate. But in the case of a migration, if a developer writes a migration and then submits it, we can basically say, hey, did you add a new file to the db/migrate folder?
If so, make sure you're following the steps in here, make sure that it aligns, and kind of point them back at the documentation, both for the writer of the pull request and the reader. And that helps make sure that things stay in sync. Not a perfect process. I think we're slowly getting better at making sure the documentation stays up to date.
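That kind of Danger rule might be sketched like this. The helper name and guide URL are made up; in a real Dangerfile the paths would come from Danger's `git.added_files` and the message would be emitted with its `warn`:

```ruby
# Placeholder URL for an internal migration guide.
MIGRATION_GUIDE = "https://example.test/docs/async-migrations".freeze

# Returns a reminder message when a change set adds migration files,
# or nil when there is nothing to flag.
def migration_reminder(added_files)
  migrations = added_files.grep(%r{\Adb/migrate/})
  return nil if migrations.empty?

  "This PR adds #{migrations.size} migration file(s). " \
    "Please follow the checklist at #{MIGRATION_GUIDE} before merging."
end
```

In a Dangerfile this would reduce to something like `msg = migration_reminder(git.added_files)` followed by `warn(msg) if msg`, so the comment only appears on pull requests that actually touch db/migrate.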
Yeah, that's always the painful part. And those are great insights.
What do you think of that DHH guy? He's a bit of a weirdo, isn't he? Don't answer that. Trust me. No, I love DHH. He did a book quite a few years ago called Rework, which was prophetic, really, in the current situation about working from home. He did a RailsConf keynote, I think it was, a couple of years ago, where he said that at Basecamp, they have never had a DBA.
So they've never employed a person whose job it was to administer the database. It's as if Rails has just magically scaled up, and the databases scaled up with it. Are you in the same situation? Have you never employed a DBA for your very large Rails databases?
Yeah, actually, we are in the same situation. I think we were going to hire a DBA this year prior to the pandemic, and then I think there were some complications. But prior to that, the company has been operating for over 11 years, and I think now no DBA.
We definitely have some DevOps that are a little bit focused on making sure that the database is running and making sure that we've got replication set up and proper statistics. But we kind of put the onus on everyone. Like you don't have one person who is the guru of SQL. You have everyone. And so everyone tries to teach everyone these things.
And we try to do our best to share that knowledge where we can to make everyone as experts as we can. So we've managed to go 11 years with no DBA, and I think we're only getting to wanting one now because we're trying to do really customized processes of this online schema migration stuff.
How do we make that completely automated? That's actually going to be a completely distinct system from the Rails system, because we're going to want to apply it to any of our projects. Or maybe there are some gotchas in upgrading MySQL. There are probably some things that they might actually have really good insight into.
But I think our general approach is, even in that situation, we're going to have one DBA and hundreds of developers. And we want to make sure that they may have knowledge that might be useful for talking through things and sharing things, but the work is going to still fall on the developers.
And we need to make sure that everyone is learning as much as they can and not just blindly hoping that the DBA is going to handle it.
Yeah, I mean, the way DHH presented it, the old mindset was that having a database specialist was a necessary evil. Instead, Rails enables developers to handle this themselves and not just blame the database man or woman when the thing goes wrong. Surely as a company gets bigger, you have more specialised roles and not less specialised roles.
Yeah, I would agree. And I think there are more specialized roles, but I think there are skills that apply to everyone. So, you know, as the company grows, you may have more specialized roles that have more specific knowledge, but I think probably with that specific knowledge comes the responsibility that they are not gatekeepers of that knowledge, right?
They may be experts and they're maybe building content, but I would say part of their job is to make sure that that content is consumable by everyone. And if they're answering the same questions over and over and over, they're not doing their job to educate people on how to self-serve and do it themselves.
And that's how we learn and grow as a community and get better is just by sharing this knowledge.
It's a really quite interesting situation. I don't know what it means for the DBAs, but I think there's definitely more database work out there. But I think because Rails just makes it so easy to work with databases at scale, that you kind of tend to hit that stage much, much later on.
Yeah, I agree. You don't necessarily have to have everyone custom-building SQL. Active Record does a pretty good job of being an ORM that lets developers just do the things they need to do. And there are notifications available to easily add tooling that you don't need a DBA for.
But as things grow, there are things that Rails doesn't yet have tooling for, and maybe that's something where, if you have a DBA who is well-versed in Rails, they can contribute back to the framework or add their own gems that can help everybody get better at working with databases.
And it doesn't necessarily invalidate their job, but their job becomes more of a knowledge producer, and they try to share that knowledge and make the community better.
Yeah, we're in the same boat. We like to push that knowledge down as far as possible. But there certainly are opportunities when you're deep into materialized views and windowing and Postgres or something like that, where you're just like, I really want to phone a DBA friend. And that's the con side, I would suppose.
Yeah. And I think that's like the role of the specialists, the people who have the specialized knowledge: they're probably more like consultants. You know, you have someone who's like, I've got a really gnarly problem, I don't know what to do. So get them to sit down and help you with it. And that's a big asset, that they can help people.
And, you know, if that's a one-off, it's a one-off, but if they do this ten times in a week, maybe there's education there, or maybe there's tooling. And I think that goes for pretty much any role where you feel like you're just throwing something over the fence. If you push that responsibility to the developers as well, you can end up with a much higher quality project.
Teach them how to use includes and avoid some of those massive queries or N plus one problems.
Or use some of the gems available and have the N plus one queries just automatically avoided for you.
Yeah, I've had some includes statements which spanned 50 lines on some projects I inherited. It's insane, the kind of data they're trying to return. But yeah, it's crazy. Good advice. Was there anything else that we want to talk about? I know we're getting to about that time.
I was just going to mention one thing about includes, because I think this is another gotcha of Rails: they don't really teach you what happens with includes. And includes actually does one of two things in the background. It either uses a preload or an eager load.
And a preload splits it off into a separate query entirely, where you do something like select star from table where ID is in this big list. But then there's eager load, which tries to smoosh it into one big query with a join. Rails always suggests using includes because it'll handle that distinction for you. But that distinction actually makes a difference at scale.
And when you're dealing with large tables, eager load is almost always significantly worse. So almost all the time you actually want to use preload. Same interface, but it's just this interesting little gotcha that you don't really realize until it starts biting you.
And you've got to remember, everything is just a tool. You can either smash your finger with that hammer, or you can build what you want to build with it.
Exactly right. Kyle, if people want to follow you and some of the stuff that you're doing online, where should they go?
I don't really have a huge online presence. I do have a GitHub account, but that's mostly work on public gems for the company. What I'm trying to do is be a little bit more present in the community. So I do have some talks available from RailsConf, and my goal is to be pushing out a little bit more written content, which is available on the blogs that Clio provides.
So I can provide a link for that in the future, as well as a link to any of the talks that I have. Unfortunately, I'm not a super user on Twitter, but I can also provide my LinkedIn where I sometimes post new information there as well.
Awesome. Well, I'm going to move us over to some picks. Luke, do you want to start us off?
Yeah. Listen to this. Listen to this. Can you hear that? I can't hear anything. That is the sound of me signing up for DriftingRuby.com, which is a quite excellent series of Railscasts, including the excellent From jQuery to ES6 episode. I am a notorious jQuery user, almost an unrepentant one. But Drifting Ruby has let me see the light, and I'm a newly reformed character.
So my pick is driftingruby.com.
I must say that's a great pick. All right. Hey, Matt, you want to chime in with some picks?
Well, my pick comes out of this. I'd say that Danger JS is something that I really want to look into. We're significantly investing in CI/CD infrastructure and deploying those branches like you were talking about, Dave. And so that looks like a really great way to tie back to documentation and check that best practices conform with the rest of the company. And that's my pick for today.
I'll let you know what I discover.
Awesome. I'll jump in with a couple of picks. One is from Google: it is the Titan Security Key. Other companies have similar products, like the YubiKey. It's a USB or NFC key that will do your authentication for you. So I actually have a couple of these arriving in the mail today in preparation for another Drifting Ruby episode that I want to do on these things.
So that should be a pretty interesting one. I don't know how popular it's going to be, because I'd never had one of these keys before, well, until later today. And the other is, I have now in front of me a little rack of Raspberry Pis with 8GB of RAM that I'm building into a tiny Kubernetes cluster for, well... just because I can, really.
So I love Raspberry Pis, and they just released their 8GB versions, which actually makes it nicer to run some heftier things on it now. Still slow, but still a lot of fun. All right, Kyle, do you want to join in with some picks?
I didn't prepare anything, so I actually don't have anything that's off the top of my mind here for things to just call out.
All right. Fair enough. Well, it was great talking to you, Kyle. And I always like talking about technical debt because I am notorious for introducing it.
I'm always happy to be building tools to fix these things so that we can make the community better.
All right. Well, that's a wrap for this episode. We appreciate you coming and talking with us. It was a lot of fun.
Yeah, it was wonderful. Thank you.
Bye.
Take care.