Tuesday, July 19, 2011

The High-Level Test Whisperer

Most teams have High-Level Tests in what they call Functional Tests, Integration Tests, End-to-End Tests, Smoke Tests, User Tests, or something similar. These tests are designed to exercise as much of the application as possible.

I'm a fan of high-level tests; however, back in 2009 I decided on what I considered to be a sweet-spot for high-level testing: a dozen or less. The thing about high-level tests is that they are complicated and highly fragile. It's not uncommon for an unrelated change to break an entire suite of high-level tests. Truthfully, anything related to high-level testing always comes with an implicit "here be dragons".

I've been responsible for my fair share of authoring high-level tests. Despite my best efforts, I've never found a way to write high-level tests that aren't filled with subtle and complicated tweaks. That level of complication only equals heartbreak for teammates that are less familiar with the high-level tests. The issues often arise from concurrency, stubbing external resources, configuration properties, internal state exposure and manipulation, 3rd party components, and everything else that is required to test your application under production-like circumstances.

To make things worse, these tests are your last line of defense. Most issues that these tests would catch are caught by a lower-level test that is better designed to pin-point where the issue originates. The last straw - the vast majority of the times that the tests are broken it is due to the test infrastructure (not an actual flaw in the application) and it takes a significant amount of time to figure out how to fix the infrastructure.

I've thrown away my fair share of high-level tests. Entire suites. Unmaintainable and constantly broken tests due to false negatives simply don't carry their weight. On the other hand I've found plenty of success using high-level tests I've written. For awhile I thought my success with high-level tests came from a combination of my dedication to making them as easy as possible to work with and that I never allowed more than a dozen of them.

I joined a new team back in February. My new team has a bunch of high-level tests - about 50 of them. Consider me concerned. Not long after I started adding new features did I need to dig into the high-level test infrastructure. It's very complicated. Consider me skeptical. Over the next few months I kept reevaluating whether or not I thought they were worth the amount of effort I was putting into them. I polled a few teammates to get their happiness level. After 5 months of working with them I began my attack.

Each time my functional tests broke I spent no more than 5 minutes on my own looking for an obvious issue. If I couldn't find the problem I interrupted Mike, the guy who wrote the majority of the tests and the infrastructure. More often than not Mike was able to tweak the tests quickly and we both moved on. I anticipated that Mike would be able to fix all high-level test related issues relatively quickly; however, I expected he would grow tired of the effort and we would seek a smaller and more manageable high-level test suite.

A few more weeks passed with Mike happily fielding all my high-level test issues. This result started to feel familiar, I had played the exact same role on previous projects. I realized the reason that I had been successful with high-level tests that I had written was likely solely due to the fact that I had written them. The complexity of high-level test infrastructure almost ensures that an individual will be the expert and making changes to that infrastructure are as simple as moving a few variables in their world. On my new team, Mike was that expert. I began calling Mike the High-Level Test Whisperer.

At previous points in my career I might have been revolted by the idea of having an area of the code that required an expert. However, having played that role several times I'm pretty comfortable with the associated risks. Instead of fighting the current state of affairs I decided to embrace the situation. Not only do I grab Mike when any issues arise with the high-level tests, I also ask him to write me failing tests when new ones are appropriate for features I'm working on. It takes him a few minutes to whip up a few scenarios and check them in (commented out). Then I have failing tests that I can work with while implementing a few new features. We get the benefits of high-level tests without the pain.

Obviously this set-up only works if both parties are okay with the division of responsibility. Luckily, Mike and I are both happy with our roles given the existing high-level test suite. In addition, we've added 2 more processes (a 50% increase) since I joined. For both of those processes I've created the high-level tests and the associated infrastructure - and I also deal with any associated maintenance tasks.

This is a specific case of a more general pattern I've been observing recently: if the cost of keeping 2 people educated on a piece of technology is higher than the benefit, don't do it - bus risk be damned.


  1. It's great to see a post on this topic. I use the term "high level tests" to mean something totally different, but our team has had a suite of what you call "high level tests" and we call "smoke tests" (though technically, I think they are not smoke tests either) that we started in 2004. Initially, when we had no automated tests, this was our only line of defense against regression failures. Now we have good coverage with 5500+ JUnits and 500+ FitNesse test pages, and the GUI tests are our last line of defense, as you say.

    These test scripts are mostly happy path, they do a few negative tests such as verifying required fields. Yet, they have caught many serious bugs over the years that couldn't be detected at lower levels, especially in our "legacy" or old code. They've been fairly cheap to maintain, because they have little logic, and we designed them to extract out duplication into modules. Great ROI. I can think of two severe regressions that made it into production over the past 7 years, both were edge cases that we consciously decided not to have an automated regression test for (oops).

  2. Always great to hear about other people's experiences. Thanks for sharing Lisa!

    Cheers, Jay


Note: Only a member of this blog may post a comment.