Archive for December, 2005
A number of folks have been responding on their blogs, saying that features and users are more important than infrastructure requirements. Some have felt I was espousing spending gobs of money on infrastructure up front, which I wasn’t.
Ian Holsman says it really well in one of his comments here, though:
The point I was trying to make is that you shouldn’t worry about scaling until you need to, and in some cases that is never.
The reasoning behind not worrying about scaling is that in a lot of cases people worry about the wrong things. They will spend hours getting their code tuned just so, and have it running in 10ms less time, only to realize that the code is only run once a day.
The other reason is that some features don’t work out: for example, the mousetrap your marketing person thinks will have people beating a path to your door. Now, isn’t it better to get that mousetrap up first and see if he is right (in a day), as opposed to writing it with scalability in mind and waiting a month, only to find that it isn’t used? And if he is right, you will have been validated in a day, and can then proceed to write a scalable solution while people are using the feature on the site!
You should wait until the problem occurs, and then address scalability. I’m not saying ignore scalability; just address it when it is about to become an issue, while concentrating on growth and income generation. Don’t implement it until it is needed.
I actually agree with him. I just don’t want companies thinking they are scalable, or spending months retooling their apps just because they weren’t thinking about things up front, is all :)
David, from 37signals (who I have nothing but respect for), takes me to task for saying people need 99.999% uptime.
Problem is I said nothing of the sort. I simply ranted about companies that have hit popularity and don’t have a damned clue how to keep their systems up once they’re there.
Some people have commented that “adding an extra server might push the problem away for 3-6 months.” But we’re talking about services that are doubling every 4-10 weeks. The cost of maintaining that kind of patchwork growth can go from $5K/month to $250K/month very, very quickly.
Which is why designing for scale is so important. I don’t believe any startup “needs” to achieve anything more than around 2 9s of uptime, which is what a properly configured server should do for you. However, even at the beginning, you need to be coding and planning for growth.
Small things are key: managing how transactions occur, having separate database connections for reading and writing, making your app able to handle variable-state sessions, and so on.
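To make the read/write split concrete, here is a minimal sketch of the idea. The `DBRouter` class and its method names are hypothetical, and SQLite stands in for both endpoints; in production the two connection strings would point at a write primary and a read replica.

```python
import os
import sqlite3
import tempfile

# For this demo, both "primary" and "replica" are the same SQLite file.
db_path = os.path.join(tempfile.mkdtemp(), "app.db")

class DBRouter:
    """Send writes to the primary connection and reads to the replica.

    Keeping the two paths separate from day one means that when growth
    arrives, pointing reads at real replicas is a config change, not a
    rewrite of every query in the app.
    """
    def __init__(self, write_dsn, read_dsn):
        self.writer = sqlite3.connect(write_dsn)
        self.reader = sqlite3.connect(read_dsn)

    def execute_write(self, sql, params=()):
        cur = self.writer.execute(sql, params)
        self.writer.commit()
        return cur

    def execute_read(self, sql, params=()):
        return self.reader.execute(sql, params).fetchall()

db = DBRouter(db_path, db_path)
db.execute_write("CREATE TABLE users (name TEXT)")
db.execute_write("INSERT INTO users VALUES (?)", ("alice",))
print(db.execute_read("SELECT name FROM users"))  # [('alice',)]
```

The point isn’t the router itself, but that the application code never assumes reads and writes share one connection.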
One of the cores of my post was the “ladder to high availability”. Let me repeat it here:
Backups, Redundant, Failover, Cluster, Distributed, Grid and finally Mesh
Your first step should be backups. That way, if something goes down, you can have it up again in 20-200 minutes. For most businesses, that’s “acceptable” downtime, as long as it doesn’t happen often. Redundant should mean that you can bring a new box up in 5-10 minutes. This obviously requires a second box, though, which doubles the cost of your original setup.
Failover tends to be more expensive, as it often means replicating not only data but also static files in near-realtime. But your downtime with this is in the seconds range (per incident). Clustering drops that downtime further (while multiplying cost by a factor of 1-3), distribution makes the cost sky-high but gets you to around four nines of availability, and grid/mesh and other advanced tools will put you over the top.
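The “nines” along the ladder translate directly into a yearly downtime budget, which a quick back-of-the-envelope calculation shows:

```python
# Allowed downtime per year at each availability level.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes(availability):
    """Minutes of downtime per year allowed at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability)

for label, avail in [("two nines (99%)", 0.99),
                     ("three nines (99.9%)", 0.999),
                     ("four nines (99.99%)", 0.9999),
                     ("five nines (99.999%)", 0.99999)]:
    print(f"{label}: ~{downtime_minutes(avail):.0f} min/year")
```

Two nines buys you roughly 5,256 minutes (about 3.6 days) of downtime a year, while five nines allows barely five minutes, which is why each rung up the ladder costs so much more than the last.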
At no point did I say startups needed to even have backups in place. I simply said if your business requires you to be up, you’d damned well better be up. And if you know from the start of your business that its survival requires you to be up, you’d damned well better be planning for it from the start. Otherwise it will not only be more expensive to add this technology later, it’ll also sideline you for weeks at a time while you redo that technology, bring up test systems, run them in parallel and then do switchovers.
Run your company however you see best, but be forewarned that if your business relies on availability, and the reason you weren’t able to deliver it was because you didn’t think about it or plan for it … Well, your users might just have a nice little revolt.
I guess I’ve been spoiled the last few months in that my light blogging has meant I haven’t pissed anyone off or done anything crazy.
It used to be that it happened every other month or so that I really stuck my foot in something. I thought I’d smartened up. I guess not.
Ah well, live and learn. But what if you aren’t learning? ;-)
I wonder if the issue isn’t the time of year. After all, last year at this time I caused a bit of a hubbub too, eh?
Gotta love the holiday season ;-)
After ragging on “Web 2.0” companies for not managing their infrastructure and growth properly, I put up a brief note acknowledging that we were having uptime issues.
By “issues” I meant that average uptime for the month had slipped (just) below 99.5%. Truth be told, it was the highest uptime we’d had since we started, but that’s beside the point. The post has brought the naysayers out of the woodwork.
I figured I’d acknowledge this, since it’s both hurtful and funny. I’ve been called all manner of swear words today, been told I should be fired, been told I don’t deserve to run a company ever again, and had a private email saying that if the individual were an investor they’d “chop off my legs” (I’m guessing they’re not a VC-style investor but some other, less savory, kind).
Here are a few of the posts about this:
Had the sudden realization today that I’d turned into one of those people I always hated. The kind that tells you you’ve made the wrong decision.
The core point of my rant was right. Any company that is a service needs to be not only designed to scale, coded to scale and ready to scale… it also needs to actually scale when the time is right, OR stop new users from signing up so that current users aren’t affected.
But the reality is that I shouldn’t have pointed at FeedLounge. After all, as Scott shows, they made conscious decisions about scalability. And that’s about all anyone can ask.
It’s not like I’m even one of their users who’s been put off by their choices. If I were, I might have some reason to complain.
I have always, always, always felt that it was never fair for people outside a company to criticize the decisions made inside the company. I think it’s fair game when a company doesn’t make a decision, or is ignorant about something. But when a company is aware of something, makes a choice and sticks by it, it’s really hard to say it’s the wrong one when, from the outside, we don’t know all the factors involved.
So while I stand by my point that Web 2.0 companies need to be architected from the ground up with scalability in mind… I can respect it when a company makes a business choice to focus on the finish line first and foremost, knowing that there’ll be growing pains along the way.
Which, really, is what it sounds like FeedLounge has done. Which is good ;-)