Paul Zaich
Podcast Appearances
So one thing I've been experimenting with is trying to create more automated reports that go into a Slack channel so that people can review them. And we've also implemented a bi-weekly review during our retro where we just look at our metrics and some of the longer-running trends so that we can see if those look correct. Is there anything that's wrong?
We can talk about it and see if there are things that we actually want to act on based on that review. So we're trying to find some ways to do check-ins that don't require us to all be in the office together.
We have implemented what I consider custom metrics. We use Datadog, so a lot of this is out of the box. You can use their implementation, but you're adding some code to specific parts of your application. Maybe it's a callback on your Active Record model: when something is created, you send a message to a queue, and that triggers a message to StatsD, which then goes to Datadog.
Anyway, it's a pretty lightweight implementation in terms of what you can do, but you're adding specific events that you want to track. And then you can create your own monitors and alerting around those, or around correlations between different events in your system.
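To make the custom-metric idea concrete, here is a minimal sketch of what "sending a message to StatsD" can look like on the wire. It is written in Python rather than Rails and is not the speaker's actual code: the metric name `orders.created`, the tag `source:web`, and the `on_order_created` hook are hypothetical, and the sketch skips the intermediate queue. It assumes a StatsD-compatible agent listening for UDP on `127.0.0.1:8125` and uses the DogStatsD tag extension (`|#tag:value`).

```python
import socket


def format_counter(name, value=1, tags=None):
    """Build a StatsD counter datagram, e.g. 'orders.created:1|c'.

    Tags use the DogStatsD extension: '|#key:value,key2:value2'.
    """
    payload = f"{name}:{value}|c"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_counter(name, value=1, tags=None, host="127.0.0.1", port=8125):
    """Send the counter over UDP. Fire-and-forget: nothing is read back."""
    payload = format_counter(name, value, tags)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("utf-8"), (host, port))
    finally:
        sock.close()
    return payload


def on_order_created(order_id):
    """Hypothetical hook, standing in for a model's after-create callback."""
    return send_counter("orders.created", tags=["source:web"])
```

Because the transport is UDP, emitting the metric adds very little overhead to the code path being tracked, which is part of why this kind of instrumentation stays lightweight.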
So you could potentially look at a custom metric and compare it to the HTTP statuses that are coming through or the latency of an endpoint, and then you could correlate those two metrics. So there are some more advanced things you can do there if you need to. But again, it's not really a lot of custom work.
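As a rough illustration of what correlating two metrics means, the sketch below computes a Pearson correlation between two equally sampled series, say a custom event count and an endpoint's latency per minute. This is an assumption about one way to do it by hand; a monitoring product would normally do this inside its own dashboards, and the function name `pearson` is just for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A value near 1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means there is no linear relationship.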
It's just adding some specific points in your code base that you feel are really important to track. And one example of this for Rails users is, I believe there's something like this already set up in Datadog for Sidekiq. So we instrument a lot of our Sidekiq jobs, and we can see when the lag is growing on one of those queues, we can see what the average completion time is, and we can look at the p90 completion time for different types of jobs. So you get a lot of visibility into your Sidekiq workers and processes very easily, basically for free.
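For readers unfamiliar with the terms, here is a small sketch of how an average and a p90 completion time could be computed from a list of job durations. It uses the nearest-rank definition of a percentile, which is an assumption: monitoring tools may interpolate or use approximate sketches instead. The function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the samples at or below it. pct is a number in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(durations):
    """Average and p90 of job completion times, in the unit supplied."""
    return {
        "avg": sum(durations) / len(durations),
        "p90": percentile(durations, 90),
    }
```

The p90 is useful alongside the average because a handful of very slow jobs can hide behind a healthy-looking mean.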
Just to be clear, we capture all of our errors in Sentry. We do have some alerting that goes to Slack, but I would also want to emphasize that anything that truly has any chance of being a serious issue should never be either an email or a Slack alert.
You really should have some kind of escalation, whether that's a text or an actual incident response system like PagerDuty, where you can have an escalation policy. That's what we're using. It should have synchronous alerting that really forces someone to look at it. You can't rely on something asynchronous like Slack for a serious response to issues.
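The routing rule described here can be sketched in a few lines. Everything below is hypothetical and is not PagerDuty's API: the severity levels, the channel names, and the policy format (a list of "minutes unacknowledged, who to page" steps) are assumptions chosen only to illustrate the idea that critical alerts go to a synchronous channel and escalate until someone acknowledges them.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class Alert:
    title: str
    severity: Severity


def route(alert):
    """Critical alerts page a human; everything else can go to Slack."""
    if alert.severity is Severity.CRITICAL:
        return "pager"
    return "slack"


# Hypothetical escalation policy: (minutes unacknowledged, who gets paged).
POLICY = [
    (0, "primary on-call"),
    (15, "secondary on-call"),
    (30, "engineering manager"),
]


def paged_so_far(policy, minutes_unacknowledged):
    """Everyone who should have been paged by now for an unacked alert."""
    return [who for after, who in policy if minutes_unacknowledged >= after]
```

For example, an alert left unacknowledged for 20 minutes under this policy would have reached both the primary and the secondary on-call.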
You can actually do that, I believe, at least with iOS. You can set up an override where you snooze everything else, and then you just have to put whatever numbers you think you're going to receive critical notifications from into your personal contacts. And then those will actually ring through.