Ian penned a nice little follow-up to this scalability / reliability / availability / uptime issue: What is 99% Uptime Anyway?
In it he gives a brief view of what the different “9s” of uptime mean:
Uptime Time lost in a year
98% 7.3 days
99.0% 3.7 days
99.9% 8 hours
99.99% 1 hour
99.999% 5 minutes
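For anyone who wants to check the table, the arithmetic is just the downtime fraction multiplied by the hours in a year. Here is a minimal sketch, assuming a 365-day year; the function name is my own and not from Ian's post. Note that the table rounds: 99.9% works out to about 8.76 hours, and 99.99% to roughly 53 minutes.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year (8760 hours)


def downtime_hours_per_year(uptime_percent):
    """Return the hours of downtime per year that an uptime percentage allows."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Print the same rows as the table, in hours, minutes, and days.
    for pct in (98, 99, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}%: {hours:.2f} hours = {hours * 60:.1f} minutes = {hours / 24:.2f} days")
```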
And he goes on to say that “uptime” is such a crappy metric. It doesn’t take into account responsiveness, dropped pages, when something was down, etc. He proposes a new metric for companies, one that I totally agree with:
I think companies should define a metric more along the lines of: the time taken to complete XXXX operation, between the hours 9AM and 9PM. and then combine these timings into a weighted average. The weights being how important that operation is to your core business.
measure & monitor that. not uptime.
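To make that concrete, here is one way such a score might be computed. This is only a sketch of my reading of Ian's idea, not his implementation: the operation names, weights, and sample timings are all made up, and I am assuming that "weighted average" means averaging each operation's timings inside the 9AM to 9PM window and then weighting those averages by business importance.

```python
from datetime import datetime
from statistics import mean

# Assumption: the window is 9AM (inclusive) to 9PM (exclusive), in the timestamps' own time zone.
WINDOW_START_HOUR = 9
WINDOW_END_HOUR = 21


def weighted_response_time(samples, weights):
    """Combine per-operation timings into one weighted average, in seconds.

    samples: iterable of (operation, timestamp, seconds) tuples.
    weights: dict mapping operation name to its business importance.
    Returns None if no sample falls inside the window.
    """
    timings_by_op = {}
    for operation, timestamp, seconds in samples:
        # Ignore anything measured outside the hours that matter.
        if WINDOW_START_HOUR <= timestamp.hour < WINDOW_END_HOUR:
            timings_by_op.setdefault(operation, []).append(seconds)

    if not timings_by_op:
        return None

    total_weight = sum(weights[op] for op in timings_by_op)
    weighted_sum = sum(mean(times) * weights[op] for op, times in timings_by_op.items())
    return weighted_sum / total_weight


# Hypothetical measurements: (operation, when it ran, how long it took in seconds).
samples = [
    ("login", datetime(2005, 12, 7, 10, 0), 0.5),
    ("login", datetime(2005, 12, 7, 15, 0), 1.5),
    ("checkout", datetime(2005, 12, 7, 11, 0), 4.0),
    ("checkout", datetime(2005, 12, 7, 23, 0), 60.0),  # outside the window, ignored
]
# Hypothetical weights: checkout matters three times as much as login.
weights = {"login": 1, "checkout": 3}

score = weighted_response_time(samples, weights)
print(f"Weighted response time: {score:.2f} seconds")
```

A lower score is better here, and a slow checkout hurts the number three times as much as a slow login, which is the whole point of weighting by importance.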
If some of the companies I’ve worked with recently had done the above, you can bet their average score would have been incredibly low - even though their “real uptime” would have been fairly high.
After all, Google Analytics was “up” during its recent rush. Servers were responding, pages were being served, stats were being tracked… But, it wasn’t very responsive or useful.
Good job Ian.
#1 by Ian Holsman - December 7th, 2005 at 18:33
Thanks!
oh.. I also updated the posting to mention http://grabperf.org which does some of this. It has a good example of how you could do this in your own company.
#2 by Stephen Pierzchala - December 7th, 2005 at 18:54
I do Web performance analysis for a living.
There are REALLY big companies out there that cannot achieve 5 9’s 24/7/365.
As Ian hits on, you need to be up when it matters most. The cost of achieving 5 9’s is so high, that unless you have the budget of the US Defense Department, and an IT staff of thousands, you won’t EVER achieve it.
And even then, you won’t achieve it.
Because you are sending data across the Internet. And, well, stuff happens on the Internet.
smp
#3 by /pd - December 8th, 2005 at 13:26
uptime is 9-niners !! :)- even for DOD..
#4 by Abe - December 8th, 2005 at 18:19
ThePlanet guarantees 100% network uptime. Servermatrix guarantees 99.99% “network” uptime. Since your server is on SM, you really can’t guarantee anything better than 99.99%. Add to that some regular cPanel updates, Apache recompiles and other patches/fixes that might need you to reboot the server or at least restart a service, and you can only go so far with the percentage points.