Perhaps we haven't had enough opportunities to learn and share on this topic over the last 20 years, but we're more connected than ever, so I do hope we get to hear every voice.
A little performance-monitoring history shows that, in the early days, software work focused primarily on functionality rather than service levels. What Google and other similarly large players discovered was a shift driven by the need for systems to be not only functional but also reliable, performant, and available.
Why? The increasing complexity of (socio-technical) systems and higher user expectations.

Do you meet any of the following criteria?
- Are you working on software?
- Running on the modern cloud?
- Do you have two or more stakeholders on your team?
- Do you have, or intend to have, customers? At least one?
- Do you handle sensitive information on or through your software?
- ...
At any scale, reliability is everyone's responsibility, and I believe investing in discovering what it takes to start it, or keep it going, is absolutely worth it. I also think reliability (SRE) is greatly misunderstood, because we have been reading Google's books while trying to be Google, and when we weren't trying to be Google, we didn't experiment enough. By the way, they are great books, and what they started is still a game changer. Shout-out to them, always.