Paul Zaich

👤 Person
168 total appearances

Podcast Appearances

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

So honestly, a lot of times, you know, whatever caused the issue, if it was something that was run by a specific person, they probably feel a little bit of guilt there, but there's no reason to lay on more. And I think everyone, like you said, feels a lot of responsibility around the work that they're doing already. So there's no reason to overemphasize that.

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

So what that looks like is typically the team that is impacted is really going to own that postmortem. And that's one way for you to feel like you're resolving the incident or the issue that caused the incident. This has definitely become a bit of a different process as the team is growing.

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

When we were at 30, I think it was a little bit easier just to know exactly who should work on those types of mitigations. Typically, it was pretty isolated to a specific team. As the team is growing and the system is growing, that's definitely become more of a challenge because sometimes,

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

happen because of different issues that multiple teams have introduced, or maybe there's multiple teams that need to be involved in the mitigation. And in that case, we've definitely been trying to evolve our postmortem process and the action items. So we have a program manager, and one of her responsibilities is specifically around making sure that we are coordinating some of those out

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

some additional rules and coordination around the process as we've started to grow. A lot of it was just on the individual teams initially, and now as we've grown, again, there's more process involved. I think that's a pretty common thing that you have to introduce as teams grow.

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

So we used a number of different types of monitoring. At the time, we were pretty heavily reliant on exception tracking, and we also had some application performance monitoring as well, commonly called APM. A couple of examples of that would be something like New Relic, or Datadog, which has a product as well now. And then we did also use a StatsD cluster that sent metrics over to Datadog.
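As a rough illustration of the StatsD piece, here is a minimal sketch of how an application might emit a counter each time a report is created. The plain-text `<name>:<value>|c` datagram over UDP and the default port 8125 are standard StatsD conventions; the metric name `reports.created` and the helper names are hypothetical and do not come from the episode.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Build a StatsD counter datagram of the form <name>:<value>|c."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget a counter increment to a StatsD agent over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), (host, port))


# Hypothetical usage: call this wherever a report is successfully created.
# send_counter("reports.created")
```

Because the transport is UDP, the application does not block or fail if the agent is down, which is part of why this style of metric is cheap to add everywhere.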

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

And I think we had just started using that maybe just a few months before this particular incident occurred. So like I alluded to before, we had some monitors for this particular issue, but they were pretty simplistic. They basically just looked for a minimum threshold of the number of reports that we were creating.

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

And we had to set that threshold to be very low over like an hour period because traffic is variable. You never know exactly how many reports you're going to get created. There are times of day where we receive very few requests, and then there are other times where we see large spikes. So we just had very simplistic monitoring in place for some of these key metrics at that point.
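The minimum-threshold monitor described here can be sketched as follows. This is an illustrative model only, not the team's actual Datadog monitor: the function names, the one-hour window, and the threshold value are assumptions. It shows why a threshold set low enough to survive quiet periods can also stay silent during a partial outage.

```python
from datetime import datetime, timedelta


def reports_in_window(timestamps, now, window=timedelta(hours=1)):
    """Count report-creation events in the trailing window ending at `now`."""
    return sum(1 for t in timestamps if now - window < t <= now)


def should_alert(timestamps, now, min_reports=1, window=timedelta(hours=1)):
    """Alert only when fewer than `min_reports` were created in the window."""
    return reports_in_window(timestamps, now, window) < min_reports


now = datetime(2024, 1, 1, 12, 0)

# A healthy hour: one report every 10 seconds.
healthy = [now - timedelta(seconds=10 * i) for i in range(360)]

# A badly degraded hour: only a handful of requests still succeed.
degraded = [now - timedelta(minutes=10 * i) for i in range(5)]

# Nothing at all was created in the last hour.
dead = []

print(should_alert(healthy, now))   # no alert
print(should_alert(degraded, now))  # still no alert: 5 is above the floor of 1
print(should_alert(dead, now))      # alert fires only on total silence
```

With a floor that low, the monitor only distinguishes "some reports" from "no reports at all", so a large drop in volume goes unnoticed.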

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

At that point, we were still very heavily reliant on, like I said, exception tracking, using bug trackers like Sentry that could then alert if you hit certain thresholds for the number of errors over a period of time. In this particular case, exception tracking wasn't very useful because we were responding with a 404. There was an exception in the system.

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

It was just an ActiveRecord::RecordNotFound, something like that, that was then handled automatically and turned into the 404 response. So it was an expected behavior, but there wasn't an exception that could have been caught.
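The blind spot can be illustrated with a small, framework-free sketch. It is written in Python rather than Rails, and every name in it (`RecordNotFound`, `FakeTracker`, `handle_request`) is hypothetical. It models a request handler that treats "not found" as an expected outcome, so the error tracker never hears about it, and it assumes a per-status counter as one possible complementary signal.

```python
class RecordNotFound(Exception):
    """Stand-in for a framework's 'record not found' error."""


class FakeTracker:
    """Stand-in for an exception tracker; records what it was told about."""

    def __init__(self):
        self.reported = []

    def capture(self, exc):
        self.reported.append(exc)


def find_report(store, report_id):
    """Look up a report, raising RecordNotFound when it is missing."""
    try:
        return store[report_id]
    except KeyError:
        raise RecordNotFound(report_id) from None


def handle_request(store, report_id, tracker, status_counts):
    """Return an HTTP-style status and count it; only unexpected errors are reported."""
    try:
        find_report(store, report_id)
        status = 200
    except RecordNotFound:
        status = 404  # expected case: handled quietly, tracker is not called
    except Exception as exc:
        tracker.capture(exc)  # only unexpected failures reach the tracker
        status = 500
    status_counts[status] = status_counts.get(status, 0) + 1
    return status


tracker = FakeTracker()
counts = {}
empty_store = {}  # simulates every lookup failing during the incident

for report_id in ("a", "b", "c"):
    handle_request(empty_store, report_id, tracker, counts)

print(tracker.reported)  # nothing was reported to the tracker
print(counts)            # the 404s are visible only in the status counts
```

In this model, an alert on the share of 404 responses would notice the incident even though an alert on tracked exceptions would not.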
