John Gallagher
Ruby Rogues
Practical Observability: Logging, Tracing, and Metrics for Better Debugging - RUBY 656
But as with everything else, I would say if you're really not feeling any pain, don't bother. Just don't bother. I'm not really interested in telling people what they should be doing or could be doing. I mean, goodness me, we hear enough of that in engineering, don't we? You should really learn a language every year. You should be blah. You should be blah.
I'm sick of it, absolutely sick of all these gurus telling me what to do and what I should be learning. And very few of them talk about, well, what's the benefit to me? In order for me to do anything, in order for me to change as a human being in any way or learn anything, I have to feel the pain of it. If you're not feeling the pain, don't bother.
But if you are feeling the pain, if deploys are really glitchy... For me, the kicker is if I keep asking questions I don't have the answer to. That's a concern. And if they're just minor, like, why did I wake up 10 minutes late today? Who cares? It's not important.
But if the site's gone down for the fourth time this month, and every time the site goes down, we lose at least five grand, 10 grand, maybe even more. And even worse, every single time the site does go down, we just kind of get it back up more by luck than good judgment. This kind of feeling of, oh, we kind of got away with it that time. That's OK.
I know there was this weird thing and we still haven't really figured that one out, but that's OK. We'll just put it in the backlog. It's operational risk. You've got to decide: are you comfortable with that operational risk or not? Is it big enough? And in my experience, you've kind of got to hit rock bottom with this stuff.
As I did. There were loads and loads of bugs that I could have investigated, added logging for, and fixed, but, you know, it's pushing a boulder up a hill. It's not actually worth it. It was only when it reached my threshold of pain that I was like, you know what? I have to do something about this now. This is just ridiculous. We're professional people.
We're being paid a lot of money and it's not working. The app that we've delivered is not working. What's more, we don't know why. But I do also just want to add, and this may broaden out the conversation a little bit, and we may want to keep it narrow on Rails apps, that I've realized observability principles go way beyond "how does our web app work?" They apply to any black box.
So as an example, a few years ago, I was working at a company and their SEO wasn't great. And they just kind of were like, oh, you know, we'll try and fix it. And they had several attempts to fix it. None of them really worked. And every attempt was the same. They would get some expert in. The expert would give us a list of 100 things to do. We would do 80 of the 100.
And then nothing would really improve. And then they'd be like, well, we did everything you said. And then they'd move on to another expert. Rinse and repeat. And then one day, within four weeks, 20% of the site traffic disappeared, and nobody could tell us why. Nobody understood why. Observability. Now, Google is a black box.
So, you know, you're not going to be able to instrument Google. But there are lots of tools that allow you to peer into the inner workings of Google: SEMrush, Screaming Frog, all these kinds of tools. They are, in my opinion, to some degree actually in the observability space. Everybody thinks of them as marketing tools, SEO, search engine optimization tools, whatever, whatever, whatever.
They allow you to make reasoned guesses about why your pages aren't performing in search the way you want. And then you can actually take action on that, because now you have some data. Oh, this keyword dropped from position four to position 100. Why is that?
Okay, let's try hypothesis A. Put that live and see if Google responds to it. Oh, and now it's up to, you know, position 80, whatever it is. So the idea of observability goes way, way beyond Datadog and New Relic and obviously all of those people in the observability space. I see it as a much, much wider and much more applicable topic. Yeah, I hear you there.
Great question. I would say it's thresholds, like anything else. Say you're making a small app with very little traffic. I have a client I'm consulting for at the moment, and I've made them an app, and it has maybe flipping 20 visits a day or something, 20 hits a day. So I installed Rollbar, the free version of Rollbar.
Well, I don't actually think anybody should care about observability, and I don't care about observability as a thing, because it's just a means to an end. So what's the actual goal? It doesn't matter how you get there, but the goal is being able to, number one, understand your Rails app in production, and number two, ask unusual questions.
If anything goes wrong, I get a notification. It's fine. The further up the stack you move, the more the defaults change. For a Rails app that's, I'm not even going to say mission critical, but just serving a decent number of hits a month, 10,000, 20,000, I don't know: I've tried a lot of observability tools, and there's not one yet that I can unreservedly recommend.
They've all got their pros and cons. Datadog is a good option if money is no object. I don't really want to get into the tooling debate, because I think it's a bit of a red herring in many ways. There are various cost-benefit trade-offs there. But in terms of the defaults, in terms of what you observe, requests have got to be up there.
So for every app of any significant size that I have in my care, I would always say: install Semantic Logger. Semantic Logger is the best logger I've found. It does JSON out of the box. It's quite extensible. There are many problems with it, but it's the best option that we've got. So that's number one. That will log every... well, Rails already logs every request for you.
Semantic Logger will format it in JSON for you. There are some notable missing defaults in Semantic Logger, and I'm working on a gem at the moment that will add some more sensible defaults to it. So, for example, I believe that request headers do not get logged out of the box. The request body certainly does not get logged out of the box. Request headers might be.
The user agent doesn't get logged out of the box. I mean, this is just crazy. Pretty basic stuff, right? So I have a setup that I use that logs a whole load of things about the requests out of the box. I like to add in the user ID as well. It depends what kind of setup you have for authentication, but at the very, very least, if somebody's logged in,
their ID should be logged with every single request. That is absolutely basic stuff. A request ID is also a really, really useful one. I have a complex relationship with logs and tracing, because tracing is essentially the pinnacle of observability. I hear a lot of people talk about logging like it's the be-all and end-all.
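To make "log the user ID and a request ID on every request" concrete, here is a minimal sketch using only Ruby's standard library, not Semantic Logger's own API. It assumes a Rack-style app and a hypothetical `env["app.current_user_id"]` key set by your authentication layer; the `RequestEventLogger` name is likewise made up for illustration.

```ruby
require "json"
require "logger"
require "securerandom"
require "stringio"

# Hypothetical Rack-style middleware (not part of Semantic Logger): writes one
# JSON event per request that always carries a request ID and, when someone is
# logged in, their user ID.
class RequestEventLogger
  def initialize(app, logger:)
    @app = app
    @logger = logger
  end

  def call(env)
    # Reuse a request ID supplied upstream (e.g. by a proxy), otherwise mint one.
    request_id = env["HTTP_X_REQUEST_ID"] || SecureRandom.uuid
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000

    @logger.info(JSON.generate({
      event: "http_request",
      request_id: request_id,
      user_id: env["app.current_user_id"], # assumption: set by your auth layer
      method: env["REQUEST_METHOD"],
      path: env["PATH_INFO"],
      user_agent: env["HTTP_USER_AGENT"],
      status: status,
      duration_ms: elapsed_ms.round(1)
    }))
    [status, headers, body]
  end
end

# Usage: a formatter that writes only the JSON line, into a StringIO.
io = StringIO.new
logger = Logger.new(io)
logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }
app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
RequestEventLogger.new(app, logger: logger).call({
  "REQUEST_METHOD" => "GET", "PATH_INFO" => "/dashboard",
  "HTTP_USER_AGENT" => "curl/8.5.0", "HTTP_X_REQUEST_ID" => "req-abc",
  "app.current_user_id" => 42
})
puts io.string
```

One JSON object per request keeps every field filterable later. A production version would also need to log requests where the app raises, which this sketch skips.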
Logging is a great place to start, but tracing is really where it's at, and I can go into why that is in a bit. But logging is a great default. Logging is a good place to start. Start with Semantic Logger. Basically, every single thing that's important in any request should be logged. So that's every header.
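Logging every header is only safe once the sensitive ones are masked. Rails ships its own parameter filtering (`config.filter_parameters`), which is aimed at request params; for headers you log yourself, a stdlib sketch of the same idea might look like this. The `SENSITIVE_HEADERS` deny-list and the `loggable_headers` helper are hypothetical.

```ruby
# Hypothetical deny-list of Rack env keys for headers that must never be logged.
SENSITIVE_HEADERS = %w[HTTP_AUTHORIZATION HTTP_COOKIE HTTP_X_API_KEY].freeze

# Collect every HTTP header from a Rack env hash, masking the sensitive ones.
def loggable_headers(env)
  env.each_with_object({}) do |(key, value), headers|
    next unless key.start_with?("HTTP_")

    headers[key] = SENSITIVE_HEADERS.include?(key) ? "[FILTERED]" : value
  end
end

headers = loggable_headers({
  "REQUEST_METHOD" => "GET",
  "HTTP_USER_AGENT" => "Mozilla/5.0",
  "HTTP_AUTHORIZATION" => "Bearer secret-token",
  "HTTP_COOKIE" => "session=abc123"
})
p headers
```

An allow-list (log only the headers you name) is the more conservative design; a deny-list like this one leaks anything you forgot to add to it.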
Obviously, you need to be careful with sensitive data in headers. In Rails, I can't remember what it's called, but there's a filtering module that you can add in. And sometimes Semantic Logger doesn't give you that by default, so you need to be a bit careful. A good default as well is logging all background jobs. Background jobs are one of the most painful
areas of observability that I've experienced, and we still haven't really cracked it. We have some very, very basic logging out of the box in Semantic Logger. I believe it logs the job class, the job ID, and a few other things, but it doesn't log the latency, which is a huge, huge missed opportunity. And I don't believe it logs the ID of the request the job was enqueued from either.
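Queue latency, the time a job spends waiting between being enqueued and being performed, is the missing field mentioned above. A minimal sketch, assuming a hypothetical hash-based job payload and a plain Array standing in for the queue (this is not Semantic Logger's or Active Job's API): stamp the enqueue time onto the payload, then log the difference when the job runs.

```ruby
require "json"

# Hypothetical job envelope: stamp the enqueue time onto the payload so the
# worker can log queue latency (time spent waiting, not time spent running).
def enqueue_with_timestamp(queue, job_class, args, now: Time.now)
  queue << { "job_class" => job_class, "args" => args, "enqueued_at" => now.to_f }
end

def perform_and_log_latency(queue, now: Time.now)
  payload = queue.shift
  latency_ms = ((now.to_f - payload["enqueued_at"]) * 1000).round
  # ...run the real job here...
  JSON.generate({ event: "job_performed", job_class: payload["job_class"],
                  queue_latency_ms: latency_ms })
end

queue = []
t0 = Time.at(1_700_000_000)
enqueue_with_timestamp(queue, "SendWelcomeEmail", [42], now: t0)
line = perform_and_log_latency(queue, now: t0 + 2.5)
puts line
```

Because enqueue and perform usually happen on different machines, a wall-clock latency like this is only as accurate as the clocks are synchronised.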
Not questions that you thought of a day, two days, three weeks ago, because that's not really very useful or interesting. If we knew in advance exactly which questions we'd need to ask of our apps, everything would be easy. You'd just be like, how many 200s have we had in the last week? It's kind of a boring question to ask. Maybe a bit useful. I find the more obvious the question, the less useful it is.
So when a job is enqueued, by default Semantic Logger will write a little entry in the logs saying the job was enqueued, and it will tell you which request it came from. But on the other side, when the job is picked up and performed, that request ID is missing. So you need to go to the request ID, find the enqueued job, find the job ID, and then take that next leap.
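Closing that gap amounts to carrying the request ID across the queue. A sketch under stated assumptions: a hypothetical `RequestContext` module keeps the current request ID in `Thread.current` storage (which Ruby makes fiber-local), the enqueue helper copies it into the payload, and the perform helper restores it around the job so every log line the job writes carries it. All names here are illustrative, not a library API.

```ruby
require "json"

# Hypothetical "current request ID", held in Thread.current (fiber-local).
module RequestContext
  def self.request_id
    Thread.current[:request_id]
  end

  # Set the request ID for the duration of the block, then restore the old one.
  def self.with_request_id(id)
    previous = Thread.current[:request_id]
    Thread.current[:request_id] = id
    yield
  ensure
    Thread.current[:request_id] = previous
  end
end

# At enqueue time, copy the current request ID into the job payload.
def enqueue_with_request_id(queue, job_class, args)
  queue << { "job_class" => job_class, "args" => args,
             "request_id" => RequestContext.request_id }
end

# At perform time, restore it so the job's log lines carry the same ID.
def perform_with_request_id(queue, log)
  payload = queue.shift
  RequestContext.with_request_id(payload["request_id"]) do
    log << JSON.generate({ event: "job_performed", job_class: payload["job_class"],
                           request_id: RequestContext.request_id })
  end
end

queue = []
log = []
RequestContext.with_request_id("req-123") do
  enqueue_with_request_id(queue, "SendWelcomeEmail", [42])
end
perform_with_request_id(queue, log)
puts log
```

With the ID on both sides, filtering the logs by one request ID returns the request and every job it caused, with no manual hop through the job ID.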
So, I mean, it's a bit clunky, but it's manageable. So in short, Semantic Logger gives you some okay defaults out of the box, but there are some really basic things it still misses. And so background jobs and requests, those are the two really, really big ones to start out with. But as you can imagine, there are a ton more.
Yeah. There are so many things you touched on there that I want to come back to. To answer your question, first of all, it's the five steps that I walked through. The short answer is: if you have a specific question that you cannot answer, what we're really talking about is the implementation details of how you answer that question.
So which question you pick determines a whole load of stuff. I can't just give you a bog-standard answer, because it depends. I hate saying that, but it does. So I think, yeah, the first step is to ask the question, figure out what data is missing, and then choose the right piece to add to your logs. I feel like maybe I've not understood your question.
So observability is the practice of making a black box system more transparent. The way I like to think of it: imagine your entire Rails app, all the hosting, everything to do with that app, is wrapped up in an opaque black box. And somebody says, how does it work? And why is this thing going wrong? You would have no hope of understanding it. If the box is completely transparent,
Fantastic question. So this gets to the root of why the three pillars are complete nonsense.
Metrics, traces, and logs.
Nonsense. They're not three pillars. The analogy I like to use: saying that observability is three pillars (traces, logs, and metrics) is a bit like saying programming is three pillars (arrays, integers, and strings). It's the same kind of deal. No, it's nothing to do with those things. Well, it is, because you use those every day. Yes, but you're kind of missing the point.
So thanks to some amazing work by people at Honeycomb and Charity Majors, and reading their incredible work, I've realized that metrics, tracing, and logs are missing the point. The point is we want to see events that happened at some point in time. And that neatly answers your question about how you reconstruct the state of the app.
I mean, the short answer is, of course, you can't. If you're not in an event-driven system, if you're in a CRUD app, if you're storing state to the database, there is no way you can go back in time and accurately recreate it. But we can give it a reasonably good stab.
And we can do this by capturing the state at each event as it happened. Forget about observability tools and logging and structured logging and tracing just for now. Imagine that incident happened; let's say my expired token example would be a good one. There are several points in that timeline that we want to understand. Number one, when the token was created.
Sure. Thanks for having me on. My name is John Gallagher, and I am a senior engineer at a company called BiggerPockets. We're based in the US, and we teach people how to invest in real estate. I also run my own business on the side called Joyful Programming, to introduce more joy to the world of programming. And I'm on today to talk a bit about observability, which is one of my many passions.
Number two, when the user hit the website. And maybe there's a third one, when the account was created, let's say that. So imagine if at each of those three points, we had a rich event with everything related to that event in it. So when the account was created,
we had the account ID, the status of the account, whether it's pending or not, the creation date, the customer, the customer ID, blah, blah, blah, blah, blah. And then when the user visited the site, what was the request? What was the request ID? What was the user ID? What was the anonymous user ID? Et cetera, et cetera. And then when the token was created, what was the expiry? What was the this?
What was the that? What was the user ID? Okay. So if we have those three events, and we have enough rich data gathered with each of them, we can answer your question. Does that make sense so far? There's a whole load more blah, blah, blah, but does that make sense so far?
And also other events that are happening in the system. So there's: user did something, computer did something, computer enqueued a background job, performed a job, et cetera, et cetera. So the way I think about it is that everything that happens in your app, whether it's initiated by the computer, an external data source, or a user, creates an event. It's basic event storming stuff, really.
And if you don't capture enough data with that event, that is it. The data is lost forever, assuming you're not doing event sourcing and you're not in an event-driven system. So the way I think about it at the most fundamental level is: whether it's logs, traces, metrics, whatever it is, we need a way of capturing those events.
And more importantly, ideally, we need to link the events together. And this is really, really, really important. So let's say somebody hits our app and it creates the token. Well, there are two parts to that. They hit the app: there was a request to our app. And then somewhere in the call stack, the token is created. Those two things are two separate events, but they're nested.
We want to capture that causal relationship. One calls the other. One is a subset of the other. One is a parent and the other a child, however you want to put it. Without that causal link, we're lost again. We don't know what's caused what. So there are three or four ideas here. Number one, events. Number two, contextual data with each of those events.
And you can see everything, which of course is completely impossible in software. But in theory, you'd have this completely transparent box, and you could ask all these questions and get instant answers. That's like 100% observability. And of course, that is absolutely impossible.
And number three, nested events, or if you like, causal relationships between events. My claim is that with those three things, you can debug any problem you like. So if you just keep that model in mind, let's examine traces, logs, and metrics and see where they fall short, see which one meets those criteria. So tracing gives us all three.
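The three ideas (events, contextual data on each event, and causal links between events) fit in a few lines of code. A minimal sketch with hypothetical `Event` and `EventStore` names, using the expired-token example: each event holds arbitrary context plus a `parent_id`, and following the parent links recovers what caused what.

```ruby
require "securerandom"

# Hypothetical event record: a name, its contextual data, and a causal link
# (parent_id) to the event that caused it.
Event = Struct.new(:id, :parent_id, :name, :attributes, keyword_init: true)

class EventStore
  def initialize
    @events = {}
  end

  # Record an event with arbitrary context, optionally caused by a parent event.
  def record(name, parent: nil, **attributes)
    event = Event.new(id: SecureRandom.uuid, parent_id: parent&.id,
                      name: name, attributes: attributes)
    @events[event.id] = event
  end

  # Follow parent links from an event back to its root cause.
  def causal_chain(event)
    chain = []
    while event
      chain << event
      event = @events[event.parent_id]
    end
    chain
  end
end

store = EventStore.new
request = store.record("http_request", request_id: "req-1", user_id: 7, path: "/login")
token = store.record("token_created", parent: request, user_id: 7, expires_in_s: 3600)
p store.causal_chain(token).map(&:name)
```

Drop the `parent_id` and you still have two rich events, but no longer know that the request caused the token; that is the gap between logs and traces discussed next.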
So, for those of you who don't know, I should explain what tracing is, because I was confused about what tracing even was for absolutely years. When somebody hits your app, a trace is started. There are two concepts in tracing: there are traces and there are spans. And then there's the data associated with spans, but let's just leave that to one side.
So when somebody hits your app with a request, a trace is started. And so the trace will be like, okay, I've started. Here I am. You can append any data that you want to me whilst I'm open. It's like opening the cupboard door, and then you keep putting stuff in the cupboard, and then once the cupboard door's closed, you can't put any more stuff in it. Very simple analogy.
So we open the door, we start the trace, and so it goes down to the controller level. And the controller says, oh, I'm going to glom on some data into whatever the existing trace is about the method, the post body, the request, blah, blah, blah, blah, blah, headers, whatever it is. I'm going to glom that on to the current trace. And then we get down into maybe you've got a service object.
I know some people hate them. I love them. Blah, blah, blah, whatever. That's not what this podcast is about. So you get into a service object, and the service object says, oh, whatever the current trace is, I want you to know you hit me, and you hit me with these arguments. Cool. I'm going to append that to the trace as well. And then we enqueue a background job. That event gets added onto the trace.
And then even more excitingly, there's a setting in OpenTelemetry where when the job is picked up and performed, the trace is kept open. And there's a whole load of debate about whether this is a good idea or not. But you can do it. You can keep the trace open until that job is started. And so the job says, ah, I've kicked off now. It gloms a whole load more stuff.
Maybe you make an API request in the job. It gloms a whole load more stuff into the trace. And then it comes all the way back up the stack. And you have this trace with all this nested context. And when it's saying, I'm going to glom this data onto the trace, that's called a span. And a span is nested. So you can have spans nested inside spans inside spans.
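A toy tracer can show how spans nest. This is deliberately not the OpenTelemetry API, just a hypothetical `ToyTracer` holding a stack of open spans: each new span takes the innermost open span as its parent and inherits its trace ID, and the caller can attach more attributes while the span is open, mirroring the request, service object, and background job walk-through above.

```ruby
# Toy span record (hypothetical, not OpenTelemetry's types).
Span = Struct.new(:span_id, :parent_id, :trace_id, :name, :attributes, :duration_s,
                  keyword_init: true)

class ToyTracer
  attr_reader :finished

  def initialize
    @open = []      # stack of currently open spans (innermost last)
    @finished = []  # spans in the order they closed
    @next_id = 0
  end

  # Open a span nested under whatever span is currently open, yield it so the
  # caller can attach ("glom on") more attributes, then close it.
  def with_span(name, **attributes)
    parent = @open.last
    @next_id += 1
    span = Span.new(span_id: @next_id, parent_id: parent&.span_id,
                    trace_id: parent ? parent.trace_id : @next_id,
                    name: name, attributes: attributes)
    @open.push(span)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    begin
      yield span
    ensure
      span.duration_s = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      @open.pop
      @finished << span
    end
  end
end

tracer = ToyTracer.new
tracer.with_span("POST /signup", http_method: "POST") do |request_span|
  request_span.attributes[:user_id] = 7                   # controller adds context
  tracer.with_span("CreateAccount.call", plan: "free") do # service object
    tracer.with_span("enqueue WelcomeEmailJob") { }       # job enqueued
  end
end
tracer.finished.each { |s| puts "#{s.name} (span #{s.span_id}, parent #{s.parent_id.inspect})" }
```

Spans close innermost first, so the enqueue span finishes before its parents; the `parent_id` links are what let a backend draw the familiar flame graph.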
And so what we're trying to do with observability is understand what is going on, not just when it goes wrong, although that's the obvious use case: we have an incident. The most critical point where observability comes into play is the exact scenario I landed in two weeks into a new role I had. So it was two weeks in, the site had gone down,
So essentially, it's this big tree structure. And you might have seen this before. It's the flame graph that you get in Datadog and New Relic and all these kind of things. And everybody looks at these things and thinks they're really pretty. And they are. Indeed, they are. So that's the pinnacle of observability in my head. Traces give it us all. And we can say,
as you can do in any of these observability tools that support tracing, you can do some really cool stuff. Show me all the requests that were a 200 that enqueued a job where the job lasted for more than three seconds. Holy cow, now we're cooking with gas. We've got everything that we need. Show me all the spans that indicated anything to do with the background job.
where it was a 500 response, but the user was logged in, and, and, and, and. And so we can start to not only query the spans, but query the parents of the spans. So you've got all of these nested causal relationships, and it gets ridiculously powerful. So that's traces. Cool. Let's look at logs. What do logs give us? Well, they give us events. That's all logs are, really.
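The trace query described above ("requests that were a 200 that enqueued a job where the job lasted for more than three seconds") boils down to filtering child spans and then querying their parents. A sketch over hypothetical exported span records; real tracing backends expose this through their own query interfaces, not Ruby. It also assumes the job spans live in the same trace as the request, which depends on how your trace propagation is configured.

```ruby
# Hypothetical exported spans, flattened the way a tracing backend might store them.
SPANS = [
  { id: 1, parent_id: nil, name: "GET /reports",   attrs: { status: 200 }, duration_s: 0.2 },
  { id: 2, parent_id: 1,   name: "ReportJob",      attrs: { job: true },   duration_s: 4.1 },
  { id: 3, parent_id: nil, name: "GET /dashboard", attrs: { status: 200 }, duration_s: 0.1 },
  { id: 4, parent_id: 3,   name: "CacheWarmJob",   attrs: { job: true },   duration_s: 0.5 },
  { id: 5, parent_id: nil, name: "GET /export",    attrs: { status: 500 }, duration_s: 0.3 },
  { id: 6, parent_id: 5,   name: "ExportJob",      attrs: { job: true },   duration_s: 9.0 }
].freeze
SPANS_BY_ID = SPANS.to_h { |span| [span[:id], span] }

# Walk parent links up to the root span (the request that started the trace).
def root_span(span)
  span = SPANS_BY_ID.fetch(span[:parent_id]) while span[:parent_id]
  span
end

# Filter the child spans (slow jobs), then query their parents (200 requests).
matches = SPANS
  .select { |span| span[:attrs][:job] && span[:duration_s] > 3 }
  .map { |job| root_span(job) }
  .select { |req| req[:attrs][:status] == 200 }
  .map { |req| req[:name] }
  .uniq
p matches
```

`GET /export` also enqueued a slow job, but it is excluded because its parent returned a 500: the condition is on the parent, which is only possible because the nesting was recorded.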
It's a series of events that happen. Does it give us the ability to nest events inside one another? Nope. Sorry. Your luck's out. You can log causation IDs and you can link them together. And obviously you can log request IDs and filter everything by the request ID. But there's no concept in the log of this log is nested inside this other log. So that information, goodbye. It's gone.
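To see what logs do and don't give you, here is a sketch over a few hypothetical JSON log lines. Filtering by `request_id` returns a flat list; the chain of cause and effect can only be rebuilt because each line carries a hand-logged `causation_id` (a hypothetical field name, not a standard one).

```ruby
require "json"

# Hypothetical JSON log lines: flat events, nothing in the format nests them.
LOG_LINES = <<~LOGS.lines
  {"event":"http_request","id":"e1","request_id":"r1","path":"/signup"}
  {"event":"http_request","id":"e2","request_id":"r2","path":"/login"}
  {"event":"account_created","id":"e3","request_id":"r1","causation_id":"e1"}
  {"event":"job_enqueued","id":"e4","request_id":"r1","causation_id":"e3"}
LOGS

events = LOG_LINES.map { |line| JSON.parse(line) }

# Filtering by request ID gets everything from one request, but only as a flat list.
signup_events = events.select { |e| e["request_id"] == "r1" }

# Rebuilding the causal chain works only because causation IDs were logged by hand.
by_id = events.to_h { |e| [e["id"], e] }
chain = []
event = by_id["e4"]
while event
  chain << event["event"]
  event = by_id[event["causation_id"]]
end
p signup_events.size, chain
```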
Don't have it. But you have the rich data in every event. Let's look at metrics. What do metrics give you? They don't give you the events. They don't give you the nesting. They just give you some aggregated numbers. So I don't think of them as three pillars. They're three rungs of a ladder. The very top rung is tracing. Awesome. The next rung down is logs. Pretty good.
And metrics are useless. Now, when I say metrics are useless, people get upset with me and say, oh, well, I look at metrics all the time to understand my app. Yeah. Okay. But if you derive metrics from higher rungs, that's totally cool. Totally fine. But what's a really bad idea is to...
directly say, I'm going to send this metric right now to my backend. And people do this all the time. People think this is a good idea. It's okay, I mean, it's better than nothing, right? It just depends on the fidelity of information you want. But the problem is, well, there are two problems actually, but the main one is you've sent that data. Okay, you sent it to Prometheus, Datadog, whatever. You sent that one data point.
So then you look in the metrics and you say, holy cow, we're getting all these 500s. Why is that? I'll sit here and wait as long as you want. You're not going to be able to tell me the answer unless it's blindingly obvious, unless you can say, oh, well, this other bit of data over here correlates with it time-wise, so maybe it's that. Yeah, okay, it might be that.
How do you know it's that? Well, we're having to guess. Guessing is not a strategy. Hope is not a strategy. I don't want to debug by just flipping guessing; I want to know. And the only way of knowing is having traces. So the way I like to think of it is that tracing is the pinnacle. Logs can be derived from traces, which is why I call them three rungs of a ladder.
And everything can be derived as a metric from the two rungs above. So if you've got only logs, you don't have any nested context. But you can get metrics from logs. Fine. If you just have metrics, I would say you're not in great shape because you can't understand why without pure guessing. And it amazes me how many people push back on this idea and think just having some metrics is enough.
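Deriving metrics from richer events is the easy direction. A sketch with hypothetical request events: the status counts fall out of a single aggregation, while "why are there 500s?" can only be answered because the underlying events, with their extra fields, still exist.

```ruby
# Hypothetical structured request events, as they might be parsed out of JSON logs.
events = [
  { path: "/checkout", status: 500, app_version: "2.3.1" },
  { path: "/checkout", status: 500, app_version: "2.3.1" },
  { path: "/checkout", status: 200, app_version: "2.3.0" },
  { path: "/home",     status: 200, app_version: "2.3.1" }
]

# Deriving a metric from the events is a one-line aggregation...
status_counts = events.group_by { |e| e[:status] }.transform_values(&:size)

# ...but the aggregate alone can't say why there are 500s. With the events
# still around, the errors can be sliced by any field that was captured.
errors_by_version = events
  .select { |e| e[:status] >= 500 }
  .group_by { |e| e[:app_version] }
  .transform_values(&:size)

p status_counts, errors_by_version
```

Going the other way is impossible: starting from "two 500s", no amount of arithmetic recovers which version or path they came from.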
It's nowhere near enough. Not in my experience. If somebody wants to refute me and come on this podcast or have a chat with me after, I would love to listen to how metrics allow you to debug very, very deliberately and get the exact data that you need.
You can send dimensions with your metrics, and then your metrics bill explodes within about five seconds, especially if it's high-cardinality data like IP addresses. I've made that mistake before: we're going to send an IP dimension with our metrics so that we can understand what's going on. Within a week, usually less, my manager messages me saying, can you turn that off?
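The bill explodes because metrics backends store each distinct combination of metric name and tag values as a separate time series (Prometheus identifies a series by its label set; Datadog bills custom metrics per unique tag-value combination). A sketch with hypothetical request samples counts the series before and after adding an IP tag.

```ruby
# Hypothetical request samples: 3 endpoints, 2 statuses, 10,000 distinct client IPs.
samples = Array.new(10_000) do |i|
  { endpoint: %w[/home /search /checkout][i % 3],
    status: [200, 500][i % 2],
    ip: "10.0.#{i / 256}.#{i % 256}" }
end

# Each distinct combination of tag values is stored as its own time series.
def series_count(samples, tags)
  samples.map { |sample| sample.values_at(*tags) }.uniq.size
end

without_ip = series_count(samples, %i[endpoint status])
with_ip = series_count(samples, %i[endpoint status ip])
puts "series without IP tag: #{without_ip}; with IP tag: #{with_ip}"
```

With three endpoints and two statuses there are six series; adding the IP tag turns that into one series per distinct IP, 10,000 here, from a single extra dimension.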
I am in the UK, and the rest of my team were in the US, and there were two other engineers in my time zone. And all of us had been at the company for a total of five weeks. So we've got this app. It's down. It's on fire. And we need to put the fire out. And the three of us looked at each other. We were like, should we just restart the dynos? Yeah. So we restarted the dynos.
We just got a Datadog bill of like five grand. Whoopsies.
So, a few things there. Number one, you bring up a really good exception that I'd conveniently forgotten to mention. If it's infrastructure stuff, like memory, hard disk space, all that kind of stuff, it's fair game for metrics. Fine. The second thing is that I'm quite hyperbolic. I'm quite an extreme person. So when I say they're useless, I don't mean literally that they're completely useless.
I think of metrics as a hint. Hey, there's something going on over here. Cool, that's not useless. Obviously, it's useful. But then the next question is why? And if you've got a super simple system, then it's probably like three things. And you go, well, there's only three jobs in the system. So cool. And maybe you've segregated your metrics by background jobs, which is fair.
You know, it gives you a place to look. It gives you a starting point. But yeah, they're useful in the aggregate, and they're useful for giving you a hint. And yes, they're useful in terms of making sure the infrastructure is still running. But I see a lot of people depending on them. And, you know, there's a guy I really respect, who I used to work with, called Lewis Jones.
And he and I have gone back and forth on this over LinkedIn. He is convinced I'm wrong about this. He's like, we run everything through metrics. Metrics are awesome. You're on cloud nine if you think you can trace everything. And there's also a significant weakness with tracing, which is that you can't trace everything unless you've got relatively low throughput
or even medium throughput, you can make it work. If you trace every single request and you're doing millions of requests a day, I dread to think what your bill is going to be. So that's where head sampling and tail sampling come into it. And we can get into that if you would like.
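Head and tail sampling differ in when the keep-or-drop decision is made. A minimal sketch with hypothetical helpers: head sampling decides when a trace starts by hashing its ID (the idea behind OpenTelemetry's `TraceIdRatioBased` sampler), while tail sampling waits until the trace has finished and so can keep every error and slow trace plus a fraction of the rest. The rates and thresholds are illustrative only.

```ruby
require "zlib"

# Head sampling: decide when a trace starts, before you know how it will end.
# Hashing the trace ID means every service makes the same decision for a trace.
def head_sample?(trace_id, rate:)
  (Zlib.crc32(trace_id) % 10_000) < rate * 10_000
end

# Tail sampling: decide after the trace has finished, so every error and every
# slow trace is kept, plus a head-style fraction of the uneventful ones.
def tail_sample?(trace, rate:, slow_ms: 1_000)
  return true if trace[:status] >= 500 || trace[:duration_ms] > slow_ms

  head_sample?(trace[:trace_id], rate: rate)
end

# Hypothetical finished traces: 1,000 fast requests, 1 in 100 of them errors.
traces = Array.new(1_000) do |i|
  { trace_id: "trace-#{i}", status: (i % 100).zero? ? 500 : 200, duration_ms: 50 }
end

head_kept = traces.select { |t| head_sample?(t[:trace_id], rate: 0.1) }
tail_kept = traces.select { |t| tail_sample?(t, rate: 0.1) }
puts "head kept #{head_kept.size} (#{head_kept.count { |t| t[:status] == 500 }} errors), " \
     "tail kept #{tail_kept.size} (#{tail_kept.count { |t| t[:status] == 500 }} errors)"
```

Head sampling is cheap but drops errors at the same rate as everything else; tail sampling keeps all ten errors here, at the cost of buffering whole traces before deciding, which is why it typically runs in a separate collector rather than in the app.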
We crossed our fingers, and it was pure luck that the app came back up. That is the exact opposite of what we want. And we've now moved to a situation where we can ask our app a whole load of very unusual questions and get an answer. Why is there a peak of 404s on iOS at 3 a.m.? Looks like a lot of them are coming from this IP address.
Totally. So I should say I am singing the praises of tracing, but it's a slightly utopian vision that I'm painting, because 90% of the work I've done is with logging, purely because it's simple to get going. It's more of a known quantity.
And in a lot of my talks, this is why I'm not talking a lot about tracing and I'm talking about structured logging, because I think structured logging gives you this event-based mindset that you can then start extending to tracing. And the reverse is not true: you can't take that event-based mindset into metrics, because metrics are just that aggregation, right? But recently I've been doing a lot of queries in our Rails app, and I've been going to Datadog's tracing interface (we use Datadog at work) and really trying to answer my questions there instead of in logging. So we have both tracing and logging. Our tracing is hobbled a little bit, purely for cost reasons. And our logging is not so hobbled. So are the standards heading in the right direction? Yes, but it's going to take a really long time to get there, is my short answer.
There are a lot of different ways of going about tracing. The most promising, as we all know, is OpenTelemetry. But, I mean, I've read some pretty harsh critiques of OpenTelemetry. It's a topic that generally divides people. If you don't know anything about OpenTelemetry, it sounds like an absolute utopia. And I got really excited when I started researching into it.
The more you dig into it, the more you realize... how much complexity there is to resolve and how many challenges that project faces in order to resolve them. And so, I mean, what it's trying to resolve is 30, maybe 40 years, possibly even more, of legacy software, right? Because that's how long logging has been around.
And they're trying to aggregate all of that into one single standard. Good luck. It's a very, very difficult problem to solve. And they're doing an incredible job. But it's very, very difficult. So OpenTelemetry is where I'd start with the answer to your question. OpenTelemetry is 100% the future. I've not seen anything that rivals it.
And OpenTracing, I believe, came first and then evolved into OpenTelemetry, from my understanding. Apologies if I've got that slightly wrong. And so, yeah, I think there are a few options if you're in Ruby, none of which are ideal. The OpenTelemetry client in Ruby is not ready for prime time. It's quite behind the current standards in OpenTelemetry.
It doesn't follow the latest semantic conventions, for example. I have played around with it in an example project. And when it's working, it's absolutely incredible. It's next-level brilliant. But there are a few problems with it. It's extremely slow. I tried to use OpenTelemetry tracing on our test suite at work.
Okay, what's that IP address doing on the site? Okay, interesting. How many users are using that IP address? Five. So only five people are using it. So that's the point of observability to me: to be able to ask unusual questions that you haven't thought of already.
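Questions like these become a plain filter-and-group once each request is recorded as a structured event. A minimal sketch over hypothetical in-memory data (the field names, IPs, and user IDs are invented); in practice you would express the same query in your log or tracing tool's own query language.

```ruby
# Hypothetical request events: one hash per request, with high-cardinality
# fields (IP, user ID) alongside the usual ones.
REQUESTS = [
  { at: Time.utc(2024, 5, 1, 3, 5),  platform: "ios",     status: 404, ip: "203.0.113.9",  user_id: 1 },
  { at: Time.utc(2024, 5, 1, 3, 7),  platform: "ios",     status: 404, ip: "203.0.113.9",  user_id: 2 },
  { at: Time.utc(2024, 5, 1, 3, 9),  platform: "ios",     status: 404, ip: "198.51.100.4", user_id: 3 },
  { at: Time.utc(2024, 5, 1, 3, 12), platform: "android", status: 404, ip: "203.0.113.9",  user_id: 4 },
  { at: Time.utc(2024, 5, 1, 14, 0), platform: "ios",     status: 200, ip: "203.0.113.9",  user_id: 5 }
].freeze

# "Why is there a peak of 404s on iOS at 3 a.m.?" -> which IP dominates them?
def top_ip_for_ios_404s(events, hour:)
  ips = events.select { |e| e[:platform] == "ios" && e[:status] == 404 && e[:at].hour == hour }
              .map { |e| e[:ip] }
  return nil if ips.empty?

  ips.tally.max_by { |_ip, count| count }.first
end

# "How many users are using that IP address?"
def distinct_users_on(events, ip)
  events.select { |e| e[:ip] == ip }.map { |e| e[:user_id] }.uniq.size
end

ip = top_ip_for_ios_404s(REQUESTS, hour: 3) # returns "203.0.113.9"
distinct_users_on(REQUESTS, ip)             # returns 4
```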
And, I can't remember the numbers, but it really slowed down our test suite to the point where it just wasn't practical to use, because we were trying to measure the performance of the test suite. So, you know, I could have been doing something stupid there. It's very possible that I just wasn't using it the right way, so sorry, OpenTelemetry folks, if I've got that wrong. I think there's a lady called Kaylee who is from New Relic, and, I'm so sorry, the names escape me. But there's a whole bunch of people in the Ruby space who are working really hard on OpenTelemetry.
But it's just that the OpenTelemetry project is moving so fast; that's the other problem. So that's option number one, OpenTelemetry. You could maybe fork it and tweak it yourself. The second option, and what we use at work because we're using Datadog, is Datadog's tracing tool, which is pretty good. But even with tracing or logging, I feel like we're maybe 20 years behind where everybody else is in programming in terms of observability. Because when I look at this stuff and even think about tracing, I have maybe five, six, seven questions that even I can't resolve: What do I trace? In how much detail do I trace? How much is this going to cost me?
And we're still in the stone age with a lot of this stuff. So I don't have any good answers for you in that regard. So we use the vendor tooling for tracing. I'm sure New Relic has its own version of that. In fact, I know they do. I know Sentry does. There are certain other providers that don't have any tracing capabilities at all.
So for now, I would say the best option we have is relying on the vendor tracing tools.
Definitely. That's leading me up to one of my big rants, passions, whatever, within the observability space. And I don't see anybody talking about this. I feel like either I'm onto a really great idea or it's an unbelievably idiotic idea for some reason that I don't know. It's usually the latter, as a spoiler. Okay.
So when I'm looking at traces, there's almost never enough information. Almost never enough information. And this is why Charity Majors, the team at Honeycomb, and Liz Fong-Jones always talk about having wide, context-aware events. That's their mantra: wide, context-aware events. Events we've already talked about. Context we've already talked about. We haven't talked much about the wide.
So wide means lots of attributes. So their take on it is add as many attributes as you can to every event. And make them high cardinality attributes. What does that mean? It took me about three months to wrap my head around what high cardinality means. It means anything ending in an ID. There you go. That's an easy explanation. So a request ID. Oops. Sorry, that was me and my microphone.
Anything that looks GUID-like. Anything that is a unique identifier for anything, so that's user ID, request ID, but also anything that is a domain object. And this is the real missed opportunity, I think, that we have in the Rails community, and potentially in the observability community in general. When something goes wrong, or even when something goes right... let's take the token as an example.
When that token is created, the token is a domain object. Now, okay, it's to do with authentication. So it's not really a domain object in a way. But let's say that customer is signing up for an account. The account definitely is a domain object.
And if you want to understand what I mean by domain object, I just mean an object that belongs to the domain, the business domain in which you're operating. It's a business object, a domain object, call it what you will. But when the CTO or even better, the CEO or somebody in marketing talks about this customer account, they talk about people creating accounts. They use that word account.
That's your first clue that it's a really important concept in the domain. So that's what I mean when I say domain objects: words that non-technical people use to describe your app. Why are we not adding every relevant domain object to every event? We don't do it.
And so what you'll see is people do this kind of half-hearted, oh, well, we'll add the ID to the current span or the current trace or even the current log. We'll add the ID. And that's okay. That'll be enough. But you're not capturing the state of the object. Why not just take the object, in this case the account, convert it into a hash, and attach it to the event? Why can't we do that?
Now, there are a number of reasons why we actually can't do that in some cases. If you're billed on the size of your events, so if you're billed on data, obviously that's going to get expensive fast.
But if you're billed on pure events, as in your observability provider, your observability tooling, is saying for every X events or X logs per month we will charge you this much, but the size doesn't matter, then this is a perfect use case for taking those rich domain objects, converting them into a structured format, and dumping them in the log or the trace.
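A minimal sketch of the contrast being drawn: attaching only an ID versus attaching the object's state as a hash. It assumes a plain Ruby `Struct` standing in for an ActiveRecord model (in Rails, something like `account.attributes` would play the role of `to_h`); the event names and fields are hypothetical.

```ruby
require "json"

# Hypothetical domain object; a plain Struct stands in for an ActiveRecord model.
Account = Struct.new(:id, :status, :plan, :created_at, keyword_init: true)

account = Account.new(id: 42, status: "pending", plan: "pro",
                      created_at: "2024-05-01T03:05:00Z")

# The "half-hearted" version: only the ID travels with the event.
id_only_event = { event: "account.signed_up", account_id: account.id }

# The wide version: the object's state, converted to a hash, travels with it.
wide_event = { event: "account.signed_up", account: account.to_h }

puts JSON.generate(wide_event)

# If your provider bills by data volume, this difference is what you pay for;
# if it bills per event, the wide version costs the same as the ID-only one.
puts "id-only: #{JSON.generate(id_only_event).bytesize} bytes"
puts "wide:    #{JSON.generate(wide_event).bytesize} bytes"
```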
And so I've thought about this quite a lot, and I've come up with a few quite simple ideas that people can use starting tomorrow in their Rails apps. Not without their problems, but the first of which is this. I don't know if anybody's worked with to_formatted_s for date-time strings. And we have this idea in Ruby, don't we, of duck typing.
We have an object, and really good OO designers say that you shouldn't need to understand anything about that object. You just know it's got four methods on it. And it can be an account. It can be an invoice. It can be many different things. So my approach, and I'm testing this approach out at work at the moment, is instead of having to_formatted_s, have to_formatted_h. What does that mean?
It means you're going to format the domain object as a hash. And so to_formatted_s allows you to pass in a symbol to define the kind of format that you want. So it can be short, ordinal, long, humanized, and it will output a stringified version of that date in these different formats.
So my idea is, why can't we have a method on every single domain object in our Rails app called to_formatted_h, and you pass it a format? That format could be OpenTelemetry. It could be any one of a number: short, compact. And so for every trace, the way I like to think of it is, I want to add every object that's related to it into that trace.
And you could format those in OpenTelemetry format, for example, or you could have a full format or a long format, whatever you want. And so that way you can say, oh, I just want to, I want a representation of the account that is short and it's just got the ID. And that's a totally minimal skeleton. And that's enough for me. But actually here, the work I'm doing is a bit more involved.
So I want to call to_formatted_h with full. And that will give the full account, like the updated_at, created_at, everything about it. And then that will be sent to my logs and traces. And I now have a standardized way of observing what's going on, with all the rich data of my app state at that point and all the relevant domain objects in it.
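Here is one way the to_formatted_h idea might look in plain Ruby. This is a sketch under assumptions, not the speaker's gem: the `FormattedHash` module, the `hash_formats` class method, the `:short`/`:full` format names, and the `Account` class are all hypothetical.

```ruby
# Sketch of the to_formatted_h idea: a duck-typed method on each domain object
# that takes a format name, in the spirit of to_formatted_s on dates.
module FormattedHash
  def to_formatted_h(format = :short)
    builder = self.class.hash_formats.fetch(format) do |key|
      raise ArgumentError, "unknown format #{key.inspect} for #{self.class}"
    end
    builder.call(self)
  end
end

class Account
  include FormattedHash

  attr_reader :id, :status, :plan, :created_at, :updated_at

  def initialize(id:, status:, plan:, created_at:, updated_at:)
    @id = id
    @status = status
    @plan = plan
    @created_at = created_at
    @updated_at = updated_at
  end

  # Each format is a lambda from the object to a plain hash.
  def self.hash_formats
    {
      short: ->(a) { { id: a.id } },
      full:  ->(a) { { id: a.id, status: a.status, plan: a.plan,
                       created_at: a.created_at, updated_at: a.updated_at } }
    }
  end
end

account = Account.new(id: 42, status: "pending", plan: "pro",
                      created_at: "2024-05-01T03:05:00Z",
                      updated_at: "2024-05-02T09:00:00Z")

account.to_formatted_h        # returns {id: 42}
account.to_formatted_h(:full) # returns every attribute listed in :full

# Because every logged account has the same shape, "show me every event
# where the account was pending" becomes a simple filter.
events = [{ event: "account.signed_up", account: account.to_formatted_h(:full) }]
pending = events.select { |e| e.dig(:account, :status) == "pending" }
```

Keeping the formats on the class means every caller gets the same shape for the same format name, which is what makes the cross-event query at the end reliable.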
So that's the dream I'm heading towards with this gem. So that's the way I think about structuring it. I see people doing all this ad hoc naming: well, this is an ID, so we'll call the job ID job_id, I suppose. And what's the account? We can call that account_id. And I just like to think of it as: imagine your domain object. So an account has a customer. A customer has some bank details. Bank details is a bad idea, but an address maybe. And so we could have these different formats that load nested relationships or not. And obviously, you've got to be careful about the performance problems with that. And so you'll have the exact structure of your domain object in your logs, in your traces.
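Extending the same idea to relationships (an account has a customer, a customer has an address), a format can decide whether to follow associations. This is again a hypothetical sketch, with Structs in place of models and invented format names. With ActiveRecord, each followed association may trigger extra queries, which is the performance caveat mentioned above, and this version has no guard against cyclic references.

```ruby
# Hypothetical domain objects; Structs stand in for ActiveRecord models.
Address  = Struct.new(:id, :city, :country)
Customer = Struct.new(:id, :name, :address)
Account  = Struct.new(:id, :status, :customer)

# Formats that do or don't follow relationships:
#   :short  -> just the ID; never touches associations (cheapest)
#   :full   -> own attributes; associated objects rendered as :short
#   :nested -> own attributes; associated objects rendered as :nested too
module FormattedHash
  FORMATS = %i[short full nested].freeze

  def to_formatted_h(format = :short)
    raise ArgumentError, "unknown format #{format.inspect}" unless FORMATS.include?(format)
    return { id: id } if format == :short

    child_format = format == :nested ? :nested : :short
    to_h.transform_values do |value|
      value.respond_to?(:to_formatted_h) ? value.to_formatted_h(child_format) : value
    end
  end
end

[Address, Customer, Account].each { |klass| klass.include(FormattedHash) }

address  = Address.new(7, "Leeds", "GB")
customer = Customer.new(3, "Ada", address)
account  = Account.new(42, "pending", customer)

account.to_formatted_h(:full)
# returns {id: 42, status: "pending", customer: {id: 3}}
account.to_formatted_h(:nested)
# returns {id: 42, status: "pending",
#          customer: {id: 3, name: "Ada", address: {id: 7, city: "Leeds", country: "GB"}}}
```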
That, for me, is a dream. And then every single time an account is logged, it's in the same structure. Awesome. So I know that an account is always going to have an ID, it's always going to have whatever other attributes your account has, a pending status, whatever it is. And so I can say, show me every trace where the account was pending. Boom. Yeah, I love that idea, and it reminds me a little of the introduction of the...
And then, I mean, the thing that I think is unbelievably ironic is that all I'm talking about is convention over configuration. And is that not why we all got into Rails? I know Ruby is a different thing, but Rails is all about convention over configuration, and the entire area of observability, it strikes me, could do with a massive dollop of convention over configuration.
And that's what OpenTelemetry are trying to do. One last thing, and I know that time is getting on, but one last thing I want to say on that is that the other huge opportunity is adding context to errors. So we have these exception objects in Ruby, and people store strings with them, and it's like, how am I supposed to understand anything from a string? And then people try and put IDs in the strings, and you're like, no, stop. So at work I've made this extremely simple thing, basically a subclass of StandardError where you can attach context. So when you create the error, you pass in structured context. If our logs are structured, surely our errors should be structured as well. Makes sense, right?
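A minimal sketch of a StandardError subclass that carries structured context, along the lines described here. The class names, the `context:` keyword, and the `sync_product` method are hypothetical, not the speaker's actual code.

```ruby
# A StandardError subclass that carries structured context alongside the message.
class ContextualError < StandardError
  attr_reader :context

  def initialize(message = nil, context: {})
    super(message)
    @context = context.dup.freeze # copy so later changes by the caller don't leak in
  end
end

# One subclass per failure mode keeps rescue clauses specific.
class ProductSyncError < ContextualError; end

def sync_product(account_id:, shopify_id:)
  # ... the real call to the external API would go here; we simulate a failure ...
  raise ProductSyncError.new("product out of sync",
                             context: { account_id: account_id, shopify_id: shopify_id })
end

begin
  sync_product(account_id: 42, shopify_id: 123)
rescue ContextualError => e
  puts e.message # prints "product out of sync"
  p e.context    # the IDs travel as data, not baked into the message string
end
```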
You should definitely not look at doing everything all at once. As I think we can all agree in software, doing everything all at once is a recipe for disaster, no matter what you're doing.
You can say, this error happened, and here was the account associated with it when that error happened. And here's a user, and here's this. So it gets attached to the error. And then there's Rails' new error handling, Rails.error.handle. If you've not used it before, look it up. It's absolutely awesome.
It's one of my favorite things that they've added to Rails relatively recently, in the last few years. And you can basically have listeners to these events, to these errors, beg your pardon. It will catch the errors, and the context is encapsulated in the error. So you can pass these errors around and then do interesting stuff with that context.
And all I do is pull out all the context and send it straight into the logs. And that has absolutely changed the way I debug. Because whenever there's an error and it has all this rich data, you just look in the rich data and you're like, oh, that was the account. That was the Shopify ID. That was a product ID. I've got it.
And then you just look up the ID in your external system: oh, right, okay, it's out of sync, whatever it is. It makes life so much easier. So that's something I'm really passionate about as well, having domain objects encapsulated within errors. So we've got structured errors, not just structured logs.
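To connect structured errors with Rails.error.handle, one option is an error subscriber that merges the error's own context into a single structured log line. This sketch assumes the subscriber interface described in the Rails error-reporting guide (`Rails.error.subscribe`, with subscribers implementing `report(error, handled:, severity:, context:, source:)`; newer Rails versions also pass `source:`, hence the default). The subscriber class, `DemoError`, `risky_call`, and the log field names are hypothetical, and the demo calls `report` directly so it runs without Rails.

```ruby
require "json"
require "logger"
require "stringio"

# Subscriber for Rails' error reporter: Rails calls #report on each registered
# subscriber whenever an error is reported, e.g. from Rails.error.handle.
class StructuredErrorLogSubscriber
  def initialize(logger)
    @logger = logger
  end

  def report(error, handled:, severity:, context:, source: nil)
    # Errors that carry their own structured context contribute it here.
    error_context = error.respond_to?(:context) ? error.context : {}
    @logger.error(JSON.generate({
      event: "error",
      error_class: error.class.name,
      message: error.message,
      handled: handled,
      severity: severity,
      source: source,
      context: context.merge(error_context)
    }))
  end
end

# Wiring in a Rails app (not run here). In an initializer:
#   Rails.error.subscribe(StructuredErrorLogSubscriber.new(Rails.logger))
# In app code, e.g. a controller action:
#   Rails.error.handle(context: { request_id: request.request_id }) { risky_call }

# Plain-Ruby demo: call the subscriber directly with a context-carrying error.
class DemoError < StandardError
  def context
    { account_id: 42, product_id: 7 }
  end
end

io = StringIO.new
logger = Logger.new(io)
logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }
subscriber = StructuredErrorLogSubscriber.new(logger)
subscriber.report(DemoError.new("sync failed"),
                  handled: true, severity: :warning, context: { request_id: "abc" })
puts io.string
```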
I think the main thing is if you're listening to this and anything that I'm saying is resonating, forget about the domain object stuff. That's like getting really into the nitty gritty. But coming back to the beginning, if you're frustrated by your debugging experience, if you're thinking, why am I not smart enough to understand this? Chances are the problem is not with you. It's with the tools.
There are vendors that tell you that you can do that. Whether you actually can or not is a different matter. Spoiler alert: you can't. So I just want to back up a little bit and talk about the feelings, because I think the feelings are where all of this starts for me.
So if you improve the tools, not only do you make your life easier and better, but you level up everybody around you, because all the engineers can use the same tools. And that's what we've experienced at BiggerPockets. Observability has really worked its way into our culture, so that now anybody is equipped to go into the logs and ask any question that they want.
So it is a long road, but it all starts with a single step. And so if you are feeling that pain, feel free to reach out to me. I can go through all my socials in a minute, but feel free to reach out to me. Ask me any questions. I'm happy to jump on a Zoom call for half an hour and help you for free. But basically, it all starts by taking very small steps towards a very specific question.
Don't try and add observability because you'll still be here next Christmas. So take heed. There is hope. And if anything that I say resonates, please feel free to reach out to me and I'll help you figure it out.
Am I limited to one pick? Because I have many. No, go ahead. Cool. So the first one is a new language, and I already really thoroughly trounced the idea that we should be learning one programming language a year. Or rather, I just dismissed it without actually giving much justification.
So I'm going to go back on what I just said and say that this language has changed the way I think pretty much forever. And it's changed the way I see Ruby and Rails and just programming in general. And the language is called Unison. Now, it's a very, very strange, unusual language. It's maybe not that readable in places. And it's also extremely new.
I mean, it's been going for five or six years, but what they're trying to do is incredibly ambitious. But look it up. Yeah, it's an incredibly interesting language, and it will expand your mind. And that's certainly what it's done for me. And so it's kind of a language that's targeted at creating programs that are just much, much simpler, but actually more difficult to get your head around.
So I got into observability and it's funny because for the first kind of year of my journey doing this, I didn't even realize I was doing observability. I'd heard about this observability thing and it was out there in the universe. Okay. Maybe I should learn that. I should learn that. And I kept using the should. I should learn this. I should have loads of other stuff to do.
It's a completely new paradigm for distributed computing, basically. And it's absolutely fascinating. So I would highly suggest checking that out. When I spoke at EuRuKo recently, Dave Thomas was on stage championing Unison, and he called it the future of programming. And I could not agree more. It's an incredible language made by some incredibly smart people.
So that's number one. Number two, there is a static site builder. I've used pretty much all the static site builders on planet Earth, and this is my favorite. It's called Eleventy. It's a really odd name. But I am embarking upon this project at work that really is exciting me, which is: how do you serve UI components from a dynamic app, so Rails, and meld them into a static site builder without having a pile of JavaScript that you have to wade through? So I want to author my UI components in Rails, and I want to deliver them extremely fast through a static site that's just a blog, without having to run that blog on Rails. So Eleventy is my go-to tool for doing all that stuff. It also encompasses this thing called WebC, which is my new favorite templating language. Yes, I know, another templating language. I promise, I promise it's really good. It's not another retread of all these other templating languages that are very, very niche and very whatever. So WebC is compatible with Web Components, and it's a fantastic way of making HTML-like components that are server-side rendered. And I would love to see a plugin for that come to Rails, because it is absolutely phenomenal. So those are my two favorite things at the moment.
If anybody's trying to wrestle with UI components in Rails and trying to extract them out of Rails components, I would also love to chat that through with anybody who's interested in that kind of stuff, because I think there's a potential to really break new ground. How about you?
I've got loads of other things. I don't know what it is. I know it comes from control theory, and there's a Wikipedia page that's really complex and really confusing. Whatever. I've got real work to do. But what I know is that I kept coming across these bugs in Bugsnag, Sentry, Airbrake. Choose your error reporting tool. They all help you to a degree, but they're not a silver bullet.
Thank you. Yes. So I'm on LinkedIn. That's a platform I'm most active on. And my LinkedIn handle is Synaptic Mishap, which is. Yeah, I really regret that. Sorry, everybody. But yeah, so if you just search for John Gallagher, G-A-L-L-A-G-H-E-R, and maybe Rails or Observability, you should be able to find me. I've got quite a cheesy photo, a black and white photo of me in a suit.
It's a horrible photo. And I blog at joyfulprogramming.com. It's a Substack. So is it still a blog anymore? I have no idea, but that's where I write. I'm on Twitter at Synaptic Mishap, and my GitHub handle is John Gallagher, all one word. So, yeah, Joyful Programming is the main source of goodies for me. I've also got a fairly minimal YouTube channel called Joyful Programming.
So feel free to reach out to me, connection request me, ask me any question. I would love to engage with some Ruby folks about observability. Tell me your problems and I'll try and help you wherever I can.
Thanks for having me, Valentino. It's been amazing.
And I kept coming across these defects over and over, and the story was exactly the same. Come across a defect, I'd see the stack trace in the error reporting tool, and I would look at it, and first emotion right out the gate, complete confusion. What is going on here? No idea. So I dig a little bit into the code. I dig a little bit into the stack trace. So it's coming from here.
I'm a bit of a polymath. This is one of the things that is really, really important to me and I'm passionate about. I'm particularly passionate about introducing this into Rails apps. So thanks for having me on.
And this thing is nil. Classic, right? This thing is nil. Where was it being passed in as nil? I don't know. So now I'm like, well, I can't just say I can't fix this. So I now have to, well, do what exactly? I don't have any information to go off. Well, I guess we'll do that bug later. Let's look at the next one. And this just kept happening.
And I would find myself going through all the bugs in the backlog, and I couldn't fix any of them. And I'd just wasted four hours looking at things, asking questions that I couldn't answer, looking at things I didn't understand. And for years, I thought the problem was with me. I honestly thought, I'm just not smart enough. I'm not a good engineer, blah, blah, blah, blah, blah.
Bug fixing just isn't really my thing. I'm just not really good at it. And then after many, many years of this, I was in a company, and I just got really sick of this. We just released a brand-new app, and it was a customer account app. And we were getting all these weird bug reports, people saying I can't log in, people saying I can't reset my password.
And every time we did this, we would add a little bit of ad hoc logging and then put the bug back in the backlog. And then it would come up again and come up again. And after a while, I was just like, this is ridiculous. We're highly paid engineers. There has to be a better way.
So then I started looking into... we were using Kibana at the time, or rather I should say we were not using Kibana at the time. Kibana was there; we were paying for it. And I was like, I've heard this is something to do with logging. So where do we do our logging? People were like, Kibana. I have no idea what this even is. Let's open it up. And there was just all of this trash, all of this rubbish.
I was like, what's this? How's this supposed to be useful? People are like, oh, we don't really look at that. It's not very useful. I said, so how do you figure out bugs? And they're like, well, we just, we just figure it out. Well, yes, but we're not figuring it out. So all of this was born through frustration.
And so what I did back then is what I recommend everybody does now, to answer your question. Come back to the point, John. Yeah. Which is: take a question that you wish you knew the answer to, a very specific question. Not "why is our app not performing as we want?", but a very, very specific question. So take your big, big question. And at the time, this was: why are people being locked out of the app? Why can they not reset their password? They're clicking on this password link, and they're saying it's expired, or it goes nowhere, or it doesn't work. Okay. Why is that happening? So that's quite a general question, and you want to break it down into some hypotheses. So that's the first thing.
I have a five-step process, and this is step one. I'll go through the five-step process in a minute. So step one is: think of a specific question. A specific question in this case might be: okay, I've got one customer here. There are many, many different types of defects. So this one customer here is saying it was expired. I went to the webpage and the link said it had expired.
Okay, when did they click on that link? What response did the app give them? And when did the token time out? Those are three questions. They're not going to get us to the answer directly, but they are three very specific questions that we can add instrumentation for. So I would take one of those questions: when did the token time out? Great question.
To answer that, we need to know when the token was created and what its expiry was. This is just a random example off the top of my head. So you'd be like, okay, we need to know the customer ID. Do we need the token? We don't actually need the exact token, but we do need the customer ID.
We need to know the time the token was created and its expiry time. Is it 15 minutes? Is it two hours? Whatever. So I would then look into the code. That's step two done. Step two is: define the data that you want to collect. User ID, token expiry, and an event saying a token has just been created for this user ID. Okay, so that's the second step.
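As a rough illustration of steps two and three, here is a minimal Ruby sketch that emits one structured, JSON-per-line log event when a reset token is created. Everything named here is a hypothetical assumption, not code from the episode: the `JsonFormatter` class, the `log_token_created` helper, the `password_reset.token_created` event name, and the field names. It uses only the standard library `Logger`, and it deliberately leaves the token value out, logging just the user ID, the creation time, and the expiry.

```ruby
require 'json'
require 'logger'
require 'stringio'
require 'time'

# Hypothetical formatter: writes one JSON object per line so a log tool
# can filter on individual fields instead of searching free text.
class JsonFormatter
  def call(severity, time, _progname, payload)
    fields = payload.is_a?(Hash) ? payload : { message: payload.to_s }
    { level: severity, logged_at: time.getutc.iso8601 }.merge(fields).to_json + "\n"
  end
end

# Hypothetical helper for step two's data: user ID, creation time, expiry.
# The token value itself is not logged; the user ID is enough to correlate.
def log_token_created(logger, user_id:, created_at:, ttl_seconds:)
  logger.info(
    {
      event: "password_reset.token_created",
      user_id: user_id,
      created_at: created_at.getutc.iso8601,
      expires_at: (created_at + ttl_seconds).getutc.iso8601,
      ttl_seconds: ttl_seconds
    }
  )
end

# Usage: log to an in-memory buffer and print the resulting line.
io = StringIO.new
logger = Logger.new(io)
logger.formatter = JsonFormatter.new
log_token_created(logger, user_id: 42, created_at: Time.utc(2024, 6, 1, 9, 0, 0), ttl_seconds: 15 * 60)
puts io.string
```

In a real app the logger would point at whatever your log pipeline reads (stdout, a file, an agent); the point is that `expires_at` becomes a queryable field rather than part of a sentence.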
The third step is to build the instrumentation to collect that, whatever that takes. Maybe you need to add structured logging to your entire app. I don't know. Maybe you've got structured logging in place, but there's nothing listening to it. Maybe the tool just can't measure what you want it to measure.
So maybe you need to invest in a new tool, whatever it is. And then you build some code to instrument just that very small piece of functionality. And then once you've done that, you wait for it to deploy. And then you look at the graphs, you look at the logs, you look at the charts, whatever output you've got.
And what normally happens is, for me, I look at the charts and I say, that is not what I wanted at all, actually. I've misunderstood the problem. I've misunderstood the data I want. Now that I see it, ah! Just like you would with agility, true agility, not agile, because agile means something else now.
But true agility is you do a little bit of work, you develop a feature, you show the customer, they say, not quite right. Go back, adjust it. Closer, but still not quite right. But if you ask them to describe it exactly right from the beginning, it doesn't align with what they want at all. You need to show them, and it's only by showing them that you get feedback.
And the same is true for ourselves. It's only by looking at the graphs and the logs that I realize it actually isn't what I wanted to begin with, or that it is, or that I'm onto something. So I've used the graph. Maybe it was unusable. Maybe I couldn't query the parameter. There are all sorts of things that might be happening there. So then the last stage is improve.
And so from improve, you can go back to the very beginning, ask a different question, or maybe you just want to iterate on the instrumentation a bit, deploy it again. Oh, that's more like it. Okay. So now we know the token expiry. What's the next question we want to ask? Well, when did the user actually hit the site? Was it after the token expiry or before? Hmm. Okay.
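The follow-up question ("was the click before or after the expiry?") can be answered by instrumenting the click as well. The Ruby sketch below is a hypothetical illustration, not the instrumentation from the episode: the `click_event` helper, the `password_reset.link_clicked` event name, and the field names are all assumptions. It records both timestamps plus a signed `seconds_until_expiry`, so a negative value means the click came after expiry.

```ruby
require 'json'
require 'time'

# Hypothetical follow-up instrumentation: record when the reset link is
# clicked and how that compares with the recorded expiry.
def click_event(user_id:, clicked_at:, expires_at:)
  {
    event: "password_reset.link_clicked",
    user_id: user_id,
    clicked_at: clicked_at.getutc.iso8601,
    expires_at: expires_at.getutc.iso8601,
    # Positive: clicked before expiry. Negative: clicked after it.
    seconds_until_expiry: (expires_at - clicked_at).round,
    clicked_before_expiry: clicked_at < expires_at
  }
end

expires_at = Time.utc(2024, 6, 1, 9, 15, 0)
early = click_event(user_id: 42, clicked_at: Time.utc(2024, 6, 1, 9, 5, 0), expires_at: expires_at)
late  = click_event(user_id: 42, clicked_at: Time.utc(2024, 6, 1, 9, 40, 0), expires_at: expires_at)
puts early.to_json
puts late.to_json
```

Logging the signed difference, rather than only a boolean, makes it easy to chart how far off the clicks are, which is the kind of output you look at before deciding on the next question.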
It sounds like an obvious question, but maybe it's after, which would indicate the token really had expired. Oh, it's before. Huh? How could it be expired if they clicked before the expiry time? Oh, hang on. What's the time zone of the token? Now we're getting into it, right? So you log the time zone. Holy cow, the time zone of the token is out of sync with the time zone of the user. That's what it is.
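The episode doesn't say exactly how the time zones got out of sync, so the Ruby sketch below is a hypothetical reconstruction of one common way it can happen. It assumes the expiry is stored as a naive wall-clock string with the zone dropped, and then read back by code that assumes a `+01:00` offset (for example, a server on British Summer Time). The 15-minute lifetime and the `password_reset.token_checked` event name are also assumptions. Logging the raw stored value next to the parsed value, offset included, is what makes the mismatch visible.

```ruby
require 'json'
require 'time'

TTL_SECONDS = 15 * 60 # hypothetical 15-minute token lifetime

created_at = Time.utc(2024, 6, 1, 9, 0, 0)
expires_at = created_at + TTL_SECONDS # 09:15 UTC

# Hypothetical bug: expiry persisted as naive wall-clock text, zone dropped.
stored_expiry = expires_at.strftime("%Y-%m-%d %H:%M:%S") # "2024-06-01 09:15:00"

# Read back by code that assumes a +01:00 offset, so 09:15 becomes 08:15 UTC.
parts = stored_expiry.scan(/\d+/).map(&:to_i)
misread_expiry = Time.new(*parts, "+01:00")

# The user clicks five minutes after the token was created (09:05 UTC).
clicked_at = created_at + 5 * 60

truly_expired = clicked_at >= expires_at        # false: the token is still valid
reported_expired = clicked_at >= misread_expiry # true: the app says "expired"

# Logging offsets alongside timestamps exposes the mismatch.
puts({
  event: "password_reset.token_checked",
  clicked_at: clicked_at.iso8601,
  expires_at_stored: stored_expiry,
  expires_at_parsed: misread_expiry.iso8601,
  truly_expired: truly_expired,
  reported_expired: reported_expired
}.to_json)
```

The usual fix is to store and compare instants in UTC (or always keep the offset with the timestamp), so the reader never has to guess which zone a wall-clock value belongs to.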