David, from 37signals (who I have nothing but respect for), takes me to task for saying people need 99.999% uptime.
Problem is I said nothing of the sort. I simply ranted about companies that have hit popularity and don’t have a damned clue how to keep their systems up once they’re there.
Some people have commented that “adding an extra server might push the problem away for 3-6 months”. But we’re talking about services that are doubling every 4-10 weeks. The cost of maintaining that kind of patchwork growth can go from 5K/month to 250K/month very, very quickly.
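To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes cost rises in direct proportion to load, which is my simplification for illustration; the function name `weeks_to_grow` is made up for this example.

```python
import math

def weeks_to_grow(start, target, doubling_weeks):
    """Weeks for something that doubles every `doubling_weeks` to grow from start to target."""
    doublings = math.log2(target / start)  # 5K -> 250K is a 50x jump, about 5.6 doublings
    return doublings * doubling_weeks

# Assumption: monthly cost doubles whenever the service's load doubles.
fast = weeks_to_grow(5_000, 250_000, doubling_weeks=4)
slow = weeks_to_grow(5_000, 250_000, doubling_weeks=10)
print(f"{fast:.0f} to {slow:.0f} weeks")
```

Under that assumption the jump from 5K to 250K a month takes roughly 23 to 56 weeks, so a fix that buys 3-6 months covers only part of the climb.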
Which is why designing for scale is so important. I don’t believe any startup “needs” to achieve anything more than around 2 9s of uptime, which is what a properly configured server should do for you. However, even at the beginning, you need to be coding and planning for growth.
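For anyone who doesn’t think in “9s”, this small sketch converts a number of 9s into the downtime it allows per year. The helper name is invented for the example, and it uses a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def downtime_minutes_per_year(nines):
    """Downtime allowed per year at `nines` 9s of availability (2 means 99%)."""
    unavailability = 10 ** -nines  # 2 nines -> 0.01, 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability

for n in (2, 3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):,.1f} minutes/year")
```

Two 9s works out to about 87.6 hours of downtime a year, while five 9s (99.999%) leaves a little over 5 minutes.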
Small things like managing how transactions occur, having separate database connections for reading and writing, making your app able to handle variable state sessions, etc., are key.
One of the cores of my post was the “ladder to high availability”. Let me repeat it here:
Backups, Redundant, Failover, Cluster, Distributed, Grid and finally Mesh
Your first step should be backups. That way if something goes down, you can have it up again in 20-200 minutes. For most businesses that’s “acceptable” downtime, as long as it doesn’t happen a lot. Redundant should mean that you can bring a new box up in 5-10 minutes. This obviously means you need a second box, though, which doubles the cost of your original setup.
Failover tends to be more expensive, as it often means having a way of replicating not only data but also static files in near-realtime. But your downtime with this is in the range of seconds per incident. Clustering drops that downtime further (while adding cost by a factor of 1-3), distribution makes the cost sky high but gets you into that 4-ish 9s of availability, and grid / mesh and other advanced tools will put you over the top.
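As a rough illustration of what the lower rungs buy you, the sketch below turns recovery time per incident into yearly availability. The recovery times are the ones quoted above; the figure of four incidents a year is purely hypothetical, as is the function name.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years

def availability(recovery_minutes, incidents_per_year):
    """Fraction of the year the service is up, given downtime per incident."""
    downtime = recovery_minutes * incidents_per_year
    return 1 - downtime / MINUTES_PER_YEAR

INCIDENTS = 4  # hypothetical: one outage a quarter

# Recovery times in minutes, taken from the ranges quoted in the post.
rungs = {
    "backups, slow restore": 200,
    "backups, fast restore": 20,
    "redundant, slow swap": 10,
    "redundant, fast swap": 5,
}
for name, minutes in rungs.items():
    print(f"{name}: {availability(minutes, INCIDENTS):.4%}")
```

With these assumed numbers even the slowest backup restore stays above 99%, which is why backups are the first rung. Change `INCIDENTS` to match your own history and the picture shifts accordingly.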
At no point did I say startups needed to even have backups in place. I simply said if your business requires you to be up, you’d damned well better be up. And if you know from the start of your business that its survival requires you to be up, you’d damned well better be planning for it from the start. Otherwise it will not only be more expensive to add this technology later, it’ll also sideline you for weeks at a time while you redo that technology, bring up test systems, run them in parallel and then do switchovers.
Run your company however you see best, but be forewarned that if your business relies on availability, and the reason you weren’t able to deliver it was that you didn’t think about it or plan for it… Well, your users might just have a nice little revolt.

December 7th, 2005 at 5:58 pm
“having separate database connections for reading and writing”
It’s the first time I’ve heard of such a thing. Could you provide us (or at least me) with some details?
December 7th, 2005 at 6:05 pm
Tim, sure, I can provide some brief details.
There are really 2 ways of connecting to a database for an app: either directly or through a “layer” (some would call it a database abstraction layer, others an API, others a webservice, others something else).
If you’re using an abstraction layer of some kind, this point is moot because you can recode that to do this.
But, many of today’s web apps aren’t using anything like this.
What they do is create a connection “template”. Then whenever they need to talk to the database, they pull those details down, or create that object and they talk to it.
There’s nothing wrong with this, in principle. The problem is that not all connections to databases are created equal. On a blog, for example, there might be tens of thousands of “read” requests (i.e., someone wanting to see the page), but very few “write” requests (someone making a comment or the author creating a new post).
As a result, one of the ways of creating higher performance, higher availability applications is to make only one database server handle the “writes”. Then you have lots of smaller database servers that handle the “reads”.
If your entire application is built in such a way that changing this would be a major issue, you’re in deep trouble if it ever makes sense to go to the above kind of architecture.
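Here’s a minimal sketch of what “separate connections for reading and writing” could look like in application code. Everything named here (`ConnectionRouter`, `for_read`, `for_write`, `add_comment`, `list_comments`) is made up for illustration, and three SQLite connections to one file stand in for a write server plus two read servers. A real setup would point them at different machines with replication in between, which this sketch doesn’t model.

```python
import itertools
import os
import sqlite3
import tempfile

class ConnectionRouter:
    """Hands out one connection for writes and rotates through others for reads."""

    def __init__(self, writer, readers):
        self._writer = writer
        self._readers = itertools.cycle(readers)  # simple round-robin

    def for_write(self):
        return self._writer

    def for_read(self):
        return next(self._readers)

# Stand-in for one write server and two read servers (assumption: same SQLite file).
path = os.path.join(tempfile.mkdtemp(), "blog.db")
router = ConnectionRouter(
    writer=sqlite3.connect(path),
    readers=[sqlite3.connect(path), sqlite3.connect(path)],
)

def add_comment(body):
    conn = router.for_write()  # every write goes through the single writer
    with conn:  # commits on success, rolls back on error
        conn.execute("CREATE TABLE IF NOT EXISTS comments (body TEXT)")
        conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))

def list_comments():
    conn = router.for_read()  # reads are spread across the readers
    rows = conn.execute("SELECT body FROM comments ORDER BY rowid")
    return [row[0] for row in rows]

add_comment("first")
print(list_comments())
```

The point is that the app code asks for “a read connection” or “a write connection” and never assumes they’re the same server. If everything starts on one box, both can point at it, and splitting them later is a config change rather than a rewrite.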
Hope that paints a mildly clearer view of things… There are a lot of ways to do this type of thing, though. Taking care of the simple things up front can help dramatically down the road.
December 8th, 2005 at 9:53 pm
I’m a little bit surprised at the controversy your first post seems to have created. Can people really think it’s a good idea not to plan on scalability? It seems ludicrous.
More over here.