
Paul Zaich

168 total appearances

Podcast Appearances

Ruby Rogues
The Sounds of Silence: Lessons From an API Outage with Paul Zaich - RUBY 652

I think this was more of a monitoring problem overall. As Dave mentioned, there was a component where a page was snoozed, but I think that was still a failure of our monitoring system, because in this case that page was only a symptom of the true issue. It came from a downstream client application that had paged earlier on, and it wasn't at all clear what the issue was.

And I think when you're developing an alerting system, you need to have clear action items. That's where custom metrics, the application metrics you build as you grow, become really important: they give a clear signal of what's wrong so that someone knows where to investigate. In this case, the page came from a client application running in the browser, and there's a lot of noise there.
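One way to make the "clear action items" point concrete is to require every alert definition to name the metric it watches and the first place to investigate. The sketch below is hypothetical: the `Alert` class, its fields, and the example metric name are illustrative assumptions, not Checkr's setup or any monitoring tool's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert definition that must say where to investigate."""

    name: str
    metric: str       # custom application metric the alert watches
    threshold: float  # page when the observed value drops below this
    action: str       # first thing the on-call engineer should check

    def __post_init__(self) -> None:
        # Reject alerts that would page someone without a clear action item.
        if not self.action.strip():
            raise ValueError(f"alert {self.name!r} has no action item")

    def evaluate(self, value: float) -> Optional[str]:
        """Return a page message if the metric is below threshold, else None."""
        if value < self.threshold:
            return (
                f"[{self.name}] {self.metric}={value} (threshold {self.threshold}). "
                f"Next step: {self.action}"
            )
        return None


# Hypothetical example: a key business metric rather than browser-side noise.
alert = Alert(
    name="reports-completed-low",
    metric="reports.completed_per_hour",
    threshold=10.0,
    action="Check the report-completion worker logs and its queue depth.",
)
message = alert.evaluate(3.0)
```

The design choice is that an alert with an empty `action` cannot be constructed at all, so a page always arrives with a pointer to where to look.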

And I can easily understand why someone would just snooze something like that. In my opinion, it wasn't really a people issue in this particular case.

It also depends on where you are in terms of your applications, your use cases, what the customer profile looks like, how large the company has gotten, and how many people are supporting it. When you're early on, building a new application or a new product, the developers working on it are, by definition, going to understand the whole system very well.

So essentially, exception tracking is probably going to give you most of what you need to understand what's going on. As the system starts to grow, and especially as you have more discrete teams, I think that's where tools like StatsD become more useful for tracking specific use cases in core parts of your application.
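StatsD clients send plain-text metrics over UDP in the form `name:value|type`, where `c` is a counter, `ms` a timer, and `g` a gauge, by default to port 8125. The minimal client below is an illustrative sketch of that line protocol only; a production app would normally use a maintained client library, and the `app` prefix and metric names are hypothetical.

```python
import socket


class StatsdClient:
    """Minimal fire-and-forget StatsD client (illustrative sketch)."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125, prefix: str = "") -> None:
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _format(self, name: str, value: float, kind: str) -> str:
        # StatsD line protocol: <bucket>:<value>|<type>
        bucket = f"{self.prefix}.{name}" if self.prefix else name
        return f"{bucket}:{value}|{kind}"

    def _send(self, line: str) -> str:
        try:
            self.sock.sendto(line.encode("utf-8"), self.addr)
        except OSError:
            # Metrics must never take down the code path they measure.
            pass
        return line  # returned so callers and tests can inspect what was sent

    def incr(self, name: str, count: int = 1) -> str:
        return self._send(self._format(name, count, "c"))

    def timing(self, name: str, ms: float) -> str:
        return self._send(self._format(name, ms, "ms"))

    def gauge(self, name: str, value: float) -> str:
        return self._send(self._format(name, value, "g"))

    def close(self) -> None:
        self.sock.close()


# Hypothetical usage for a core part of the application.
statsd = StatsdClient(prefix="app")
statsd.incr("reports.created")
statsd.timing("reports.render", 42.5)
```

UDP keeps emission cheap and non-blocking, which is why a dropped metric is acceptable but an exception from the metrics path is not.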

And I would say the bar there is when you start to have a significant number of paying customers using specific features. At that point, you need to start honing in on the one or two key processes where, if they break, it's absolutely critical that you know immediately. That's roughly the point Checkr was at in 2017. We really needed to have high intelligence...
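For the one or two key processes where you must know immediately, error-rate alerts have a blind spot: if traffic simply stops, nothing errors. A common complement is to alert on the absence of success. The sketch below is a hypothetical in-memory version; the class name, window, and minimum count are assumptions, and in practice this check would usually live in the monitoring system, evaluated against a success counter the application emits.

```python
from collections import deque
from datetime import datetime, timedelta


class SilenceDetector:
    """Hypothetical monitor that flags a key process when it goes quiet.

    Assumes successes are recorded in chronological order.
    """

    def __init__(self, window: timedelta, min_events: int) -> None:
        self.window = window
        self.min_events = min_events
        self.events = deque()  # timestamps of recent successful events

    def record_success(self, at: datetime) -> None:
        self.events.append(at)

    def is_silent(self, now: datetime) -> bool:
        # Drop successes that have fallen out of the look-back window.
        cutoff = now - self.window
        while self.events and self.events[0] < cutoff:
            self.events.popleft()
        return len(self.events) < self.min_events


# Hypothetical usage: expect at least one success every 15 minutes.
monitor = SilenceDetector(window=timedelta(minutes=15), min_events=1)
t0 = datetime(2020, 1, 1, 12, 0)
monitor.record_success(t0)
quiet = monitor.is_silent(t0 + timedelta(minutes=20))  # no success in the last 15 minutes, so True
```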

very clear intelligence and visibility into specific parts of our system. We were trying to move in that direction when this incident happened, and we've continued to invest in that area since. I think it's become even more important as we've gotten larger, because there are just...

so many different systems interacting that no one really understands the whole system at this point. The only way to really know how the different systems are working together, and to make sure everything's working properly, is to have custom metrics defined for specific key processes.

That's a good question. We're all remote now, so we've had to experiment with that. We did have some of those in our office, and since we shifted to 100% remote due to the pandemic, I've been trying to find ways to make metrics more visible to our team. There's also a challenge for our business in particular where...

Many of our processes are very asynchronous, and they can take hours to days to fully execute. So finding ways to short-circuit that and know when those things are broken can be challenging at times. One of the things we have to do is look at the data over time as well, not just at real-time metrics.
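Because processes like these can take hours to days, a real-time threshold alone can't tell "slow" from "broken". A minimal sketch of two checks this suggests, under assumed names and thresholds: comparing a period's completions with the median of the same period on previous days, and listing in-flight jobs that have run past an expected maximum duration. The 0.5 tolerance and 48-hour limit are illustrative values, not figures from the episode.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Sequence


def below_baseline(current: float, history: Sequence[float], tolerance: float = 0.5) -> bool:
    """True if `current` is under `tolerance` times the median of past periods.

    `history` holds the same metric for the same time slot on previous days,
    so slow asynchronous work is judged against its own normal pace.
    """
    if not history:
        return False  # no baseline yet, so don't page
    return current < tolerance * median(history)


def stuck_jobs(started_at: Dict[str, datetime], now: datetime, max_age: timedelta) -> List[str]:
    """Sorted IDs of in-flight jobs that have been running longer than `max_age`."""
    return sorted(job_id for job_id, started in started_at.items() if now - started > max_age)


# Hypothetical usage with made-up job IDs and counts.
now = datetime(2020, 6, 1, 9, 0)
in_flight = {"job-a": now - timedelta(hours=50), "job-b": now - timedelta(hours=3)}
late = stuck_jobs(in_flight, now, max_age=timedelta(hours=48))
slow = below_baseline(40, [100, 110, 90])
```

Using the median rather than the mean keeps one unusual day in the history from skewing the baseline.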
