Original link: https://zdyxry.github.io/2023/04/09/Weekly-Issue-2023-04-09/
Article
Technical
Read Every Single Error | Pulumi Blog
People have limited capacity to read every error log. Unless the error rate keeps falling, growth in API calls will eventually exhaust on-call attention. Reading every error is only sustainable while: (API Call Volume) * (Error Rate) * (Time to Triage an Error) < On-Call Attention. SRE models and error budgets are fashionable, but it's not always beneficial to skip steps and go straight to the advanced stages. Instead, it's best to use right-sized tools and processes.
Sure you can hire and split systems out into separate on-call rotations to increase capacity, but our goal is to scale exponentially with respect to humans, not linearly!
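The inequality above can be checked with a few lines of arithmetic. The following is a minimal sketch, not anything from the Pulumi post: the function names, the choice of minutes as the unit, and the sample numbers are all assumptions made for illustration.

```python
def triage_load_minutes(api_calls: int, error_rate: float,
                        minutes_per_error: float) -> float:
    """Left-hand side of the inequality: total minutes needed to read every error."""
    return api_calls * error_rate * minutes_per_error


def max_sustainable_error_rate(api_calls: int, minutes_per_error: float,
                               on_call_minutes: float) -> float:
    """Largest error rate at which every error can still be read.

    Rearranged from: api_calls * error_rate * minutes_per_error < on_call_minutes
    """
    return on_call_minutes / (api_calls * minutes_per_error)


if __name__ == "__main__":
    # Hypothetical numbers: 100k calls per day, 2 minutes to triage one error,
    # 120 minutes of on-call attention available per day.
    calls, minutes_each, attention = 100_000, 2.0, 120.0
    load = triage_load_minutes(calls, 0.001, minutes_each)
    limit = max_sustainable_error_rate(calls, minutes_each, attention)
    print(f"triage load at 0.1% errors: {load:.0f} min (budget {attention:.0f} min)")
    print(f"max sustainable error rate: {limit:.4%}")
```

With attention and triage time held fixed, doubling the call volume halves the error rate the rotation can sustain, which is why the error rate has to keep falling as traffic grows.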
Quote from a friend:
The key is that someone is actually keeping an eye on this. The number of our internal Sentry alerts has dropped from more than 300 per minute last year to more than 20 this year. The method is to pull out the top 5 projects with the most errors every week and have the engineers responsible for those projects look into what is going on.
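The weekly review described in the quote can be sketched as a simple count over error events. This is an illustrative sketch only: it assumes the events have already been exported as dictionaries with a `project` field, which is a hypothetical shape and not the Sentry API.

```python
from collections import Counter
from typing import Iterable


def top_error_projects(events: Iterable[dict], n: int = 5) -> list[tuple[str, int]]:
    """Return the n projects with the most error events, highest count first.

    Each event is assumed to be a dict with a "project" key (hypothetical schema).
    """
    counts = Counter(event["project"] for event in events)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical one-week sample of exported error events.
    week = (
        [{"project": "billing"}] * 40
        + [{"project": "auth"}] * 25
        + [{"project": "search"}] * 10
    )
    for project, count in top_error_projects(week, n=2):
        print(f"{project}: {count} errors this week")
```

The resulting list is what would be handed to the owning engineers each week; the cutoff of five comes from the quote, while everything else here is an assumption.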