Tuesday, July 12, 2011

Undervalued Start and Restart Related Questions

How long does it take to start or restart your application?

Start-up time is a concern that programmers who write unit tests often overlook. Running a few unit tests will (likely) always be faster than starting an application; however, having unit tests shouldn't take the place of actually firing up the application and spot checking with a bit of clicking around. Both efforts are good, and I believe combining them is a case where the whole is greater than the sum of its parts.

My current team made start-up time a priority. We can launch our entire stack (currently 6 processes) and start using the software within 10 seconds. Ten seconds is fast, but it has annoyed me at times. I'll probably try to cut it to under 5 seconds at some point in the near future, depending on the level of effort needed to get there.

That effort is really the largest blocker for most teams. The problem is that it's often not clear what's causing start-up to take so long, and performance tuning start-up isn't exactly sexy work. However, if you start your app often, the investment can quickly pay dividends. My team found the largest wins by caching remote data on our local boxes and deferring the creation of complex models while running on development machines. Those two simple tweaks turned a 1.5 minute start-up into 10 seconds.
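To make those two tweaks concrete, here is a minimal sketch in Python. It is not our actual code: the names (`fetch_remote_reference_data`, `load_reference_data`, `App`, `complex_model`) and the JSON cache file are all assumptions made up for illustration. The first function reads remote data from a local file when one exists, and the class defers building an expensive model until something actually asks for it.

```python
import json
import time
from functools import cached_property
from pathlib import Path


def fetch_remote_reference_data():
    """Hypothetical stand-in for a slow call to a remote service."""
    time.sleep(0.2)  # simulate network latency
    return {"instruments": ["A", "B", "C"]}


def load_reference_data(cache_path, fetch=fetch_remote_reference_data, use_cache=True):
    """Return reference data, preferring a local cache file when allowed.

    On a development machine use_cache would be True; elsewhere you would
    likely pass False so the data is always fetched fresh.
    """
    cache_path = Path(cache_path)
    if use_cache and cache_path.exists():
        return json.loads(cache_path.read_text())
    data = fetch()
    cache_path.write_text(json.dumps(data))  # refresh the cache for next start
    return data


class App:
    """Hypothetical application object."""

    def __init__(self, reference_data):
        self.reference_data = reference_data
        self.model_builds = 0  # counts how often the model is built

    @cached_property
    def complex_model(self):
        # Built on first access instead of during start-up, then memoized.
        self.model_builds += 1
        return {name: len(name) for name in self.reference_data["instruments"]}
```

With something like this, the first start on a fresh machine still pays for the remote call, but every later start reads the local file, and the model cost is only paid if you click your way to a screen that needs it. The obvious trade-off is stale data, so a cache like this probably belongs on development machines only.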

If your long start-up isn't bothering you because you don't do it very often, I'll have to re-emphasize that you are probably missing out on some valuable feedback.

Not time related, but start related: Does your application encounter data-loss if it's restarted?

In the past I've worked on teams where several roll-outs a day were common. I've encountered two types of these teams. Some teams do several same-day roll-outs to get new features into production as fast as possible. Other teams end up doing multiple intraday roll-outs to fix newly found bugs in production. Regardless of the driving force, I've found that those teams can stop and start their servers quickly and without any information loss.

My current team has software stable enough that we almost never roll out intraday due to a bug. We also have uptime demands that mean new features are almost never more valuable than not stopping the software intraday. I can only remember doing 2 intraday restarts across 30 processes since February.

There's nothing wrong with our situation; however, we don't optimize for intraday restarts. Because we haven't prioritized restart-related tasks, we've never addressed a bit of data-loss that occurs on a restart. It's traditionally been believed that the data wasn't very important (nice-to-have, if you will). However, the other day I wanted to roll out a new feature in the morning - before our "day" began. One of our customers stopped me from rolling out the software because he didn't want to lose the (previously believed nice-to-have) overnight data.

That was the moment that drove home the fact that even in our circumstances we needed to be able to roll out new software as seamlessly as possible. Even if mid-day rollouts are rare, any problems that a mid-day rollout creates will make it less likely that you can do a mid-day rollout when that rare moment occurs.
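One common way to avoid this kind of restart data-loss is to snapshot in-memory state to disk and reload it on start. The sketch below is only an illustration under assumptions: `OvernightStore`, its `records` list, and the JSON snapshot file are hypothetical names, not a description of how our system stores its data.

```python
import json
import os
from pathlib import Path


class OvernightStore:
    """Hypothetical in-memory store whose contents should survive a restart."""

    def __init__(self, snapshot_path):
        self.snapshot_path = Path(snapshot_path)
        self.records = []
        # On start, restore whatever the previous process saved.
        if self.snapshot_path.exists():
            self.records = json.loads(self.snapshot_path.read_text())

    def add(self, record):
        self.records.append(record)

    def save(self):
        # Write to a temporary file first, then swap it into place, so a
        # crash mid-write never leaves a half-written snapshot behind.
        tmp_path = self.snapshot_path.with_name(self.snapshot_path.name + ".tmp")
        tmp_path.write_text(json.dumps(self.records))
        os.replace(tmp_path, self.snapshot_path)
```

In a real process you would call `save()` from whatever shutdown path you have, for example by registering it with `atexit.register(store.save)`, and possibly on a timer as well so that an unclean exit loses only a few minutes of data.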

Tests and daily rollouts are nice, but if your team is looking to move from good to great, I would recommend two things: a non-zero amount of actual application usage from the user's point of view, and fixing any issues that are roadblocks to multiple intraday rollouts.
