Monday, June 16, 2008

Immaturity of Developer Testing

The ThoughtWorks UK AwayDay was last Saturday. You could over-simplify it as an internal conference with some focus on technology, and extra emphasis on fun. At the last minute one of the presenters cancelled so George Malamidis, Danilo Sato, and I put together a quick session -- Immaturity of Developer Testing.

It's no secret that I'm passionate about testing. The same is true of Danilo and George, and several of our colleagues. We thought it would be fun to get everyone in a room, argue a bit about testing, and then bring it all together by pointing out that answers are contextual and the current solutions aren't quite as mature as they are often portrayed. To encourage everyone to speak up and increase the level of honesty we also brought a full bottle of scotch.

We put together 5 sections of content, but we only managed to make it through the first section in our time slot. I'll probably post the other 4 sections in subsequent blog posts, but this entry will focus on the high level topics from the talk and the ideas presented by the audience.

Everyone largely agreed that tests are generally treated as second class citizens. We also noted that test technical debt is rarely addressed as diligently as application technical debt. In addition, problems with tests are often handled with band-aids, such as your own test case subclass that hides an underlying problem or a testing framework that runs tests in parallel. To be clear, running tests in parallel is a good thing. However, if you have a long running build because of underlying issues and you solve it by running the tests in parallel, that's a band-aid, not a solution. The poorly written tests may take 10 minutes right now. If you run the tests in parallel it might take 2 minutes today, but when you are back to 10 minutes you now have ~5 times as many problematic tests. That's not a good position to be in. Don't hide problems with abstractions or hardware; tests are as important as application code.
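
The arithmetic behind the band-aid can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not a measurement: it assumes a perfectly linear speedup across 5 workers (inferred from the 10 minute to 2 minute example), assumes every problematic test costs about the same, and uses made-up function names.

```python
# Back-of-the-envelope model of the parallel band-aid.
# Assumptions (hypothetical): perfect linear speedup, 5 workers,
# and every problematic test costing roughly the same time.

def wall_clock_minutes(serial_minutes: float, workers: int) -> float:
    """Build time you see when the suite is split evenly across workers."""
    return serial_minutes / workers


def hidden_serial_minutes(wall_clock: float, workers: int) -> float:
    """Total test time that a given wall-clock time is hiding."""
    return wall_clock * workers


WORKERS = 5

# Today: 10 minutes of slow tests look like a 2 minute build.
today_serial = 10.0
today_parallel = wall_clock_minutes(today_serial, WORKERS)

# Later: the parallel build has crept back up to 10 minutes.
later_serial = hidden_serial_minutes(10.0, WORKERS)

# How much the underlying problem grew while it was out of sight.
growth = later_serial / today_serial

print(f"parallel build today: {today_parallel} min")
print(f"serial time hidden behind a 10 min parallel build: {later_serial} min")
print(f"growth in problematic test time: {growth}x")
```

Real suites rarely scale this cleanly, so the exact factor will differ, but the direction is the same: the build time looks healthy while the amount of slow test code keeps growing.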

Another (slightly controversial) topic was the goal of testing. George likes the goal of confidence. I love the confidence that tests give me, but I prefer to focus on Return On Investment (ROI). I think George and I agree in principle, but articulate it differently. We both think that as an industry we've lost a bit of focus. One hundred percent test coverage isn't a valuable goal. Instead it's important to test the code that provides the most business value. Test code must be maintained; therefore, you can't always afford to test everything. Even if you could, no automated test suite can ever replace exploratory testing. Often there are tests that are so problematic that it's not valuable to automate them.

The talk was built on the idea that context is king when talking about testing, but it quickly devolved into people advocating for their favorite frameworks or patterns. I ended up taking a side also, in an attempt to show that it's not as easy as right and wrong. I knew the point of view that some of the audience was taking, but I didn't get the impression that they were accepting the other point of view. We probably spent too much time on a few details. Of course, the scotch probably had something to do with that.

I wish we could have gotten back on track, but we ended up running out of time. After the talk several people said they enjoyed it quite a bit, and a few people said they began to see the opposing points of view. I think it was a good thing overall, but it's also clear to me that some people still think there are absolute correct and incorrect answers... which is a shame.

Next up, pro and con lists for browser based testing tools, XUnit, anonymous tests, behavior driven development, and synthesized testing.

10 comments:

  1. Jay, I have a passion for testing myself, which is why I happily read your blog. I think with the past few articles you've really hit the nail on the head in that the right way to test is contextual and undefinable for every circumstance and situation. On some aspects of testing I really agree with you, the strongest being test names as glorified comments; other times I find that some of the things you've written about are just impractical for my use, such as splitting up a Rails test suite. However, I'm not saying you're wrong. Context is king, and I'm working in a much different situation than you are: at most I'm working with two other developers, on Rails projects much smaller than the ones you are probably working with.

    When context is so important, learning the best possible way to test in all circumstances becomes virtually impossible; there are just too many permutations. I think the thing to do in circumstances like this is to try to make particular test implementations derivable from testing principles, so that for a given testing situation multiple testing implementations can be derived that satisfy higher order testing principles. A testing implementation decision could be using fixtures versus using mocks; a testing principle used to decide which implementation to use could be the relative value of integration versus isolation. Have we identified what the principles of testing are? I'd like to say I know them, but your disagreement with George with regards to the goals of testing illustrates that even the principles might be hard to define. I tend to favor George's notion of confidence.

    In accounting they have GAAP principles (Principle of regularity, principle of sincerity, principle of continuity). Does such a thing exist for testing? Could it be possible to identify principles?

    Principle of coverage?

    Principle of isolation?

    Principle of integration?

    Principle of communication?

  2. Hi Jay,

    Some excellent comments. It's very encouraging that you're attempting to get a conversation started internally at Thoughtworks.

    This relates to a question I asked on my blog - where are all the testers?

    http://blog.benhall.me.uk/2008/06/community-call-to-action-where-are-all.html

    Test technical debt is something I always try to address, but it is a big problem.

    I think this is more of a problem when ensuring a good design of your test harness and following the same principles as you would on customer software. I've seen many test harnesses which are in really bad shape because no-one paid attention to the design, readability and reliability. As such, a lot of effort was still placed on manual testing because the automation doesn't provide any confidence in the software.

    Educating a team to understand that tests are as important as application code is really important, and I feel a lot of teams still haven't fully understood this. I'm not sure of the best way to do this. This comes back to my comment on my blog about getting the conversation going and starting to introduce best practices: openly discussing the best ways to structure your tests and communicating ideas so that everyone produces better software.

    On the subject of code coverage: it would be much more useful if there was a metric associated with features / business value, where a test would count as part of the coverage of a feature, not of the code. This would provide a much more useful insight into the test suite; you could have a list of features, each with a test coverage figure for how much is automated. Identifying the sections which weren't automated would be much simpler. From a tester's point of view, 100% test coverage doesn't mean as much in terms of customer acceptance.

    It's a shame I can't get involved in the Thoughtworks discussions :) The topics you mentioned are similar to the list I have next to me about what I want to research.

  3. It was a fun discussion that just as you say made me see the other point of view, in this case yours :).

  4. Tim,
    The principle approach might be a good way to go.

    I've been approaching the situation in two phases:
    - acknowledge the different patterns in techniques and work styles
    - associate the patterns with the contexts in which they are the most valuable

    I expect it's going to take time (years), but the industry will be better for it in the end... Hopefully. =)

  5. Ben,
    It sounds like we share a lot of common views. I think as an industry we are headed in the right direction, just very, very slowly.

    When reading your post, Penopticode came to mind. It's only for Java at this point, but it uses several different variables to give a visualization of where problem code might exist. I'd love to see the same thing for Ruby and C#, maybe it's in the future.

    I also think there are several reasons that more developers don't adopt testing. That sounds like an interesting blog post, I'll see what I can put together in the next few days.

    Thanks for your comment.

    Cheers, Jay

  6. Jay,

    I've not come across the term 'Penopticode' before, and neither has Google :)

    Have you any links on this?

    Thanks

    Ben

  7. Sorry, spelling mistake

    http://www.panopticode.org/

    Cheers, Jay

  8. Jay,

    It was really interesting to read your thoughts on running tests in parallel, mostly because my startup is currently working on an easy-to-use distributed test runner for Ruby.

    It got me thinking: is our product just a band-aid for bad test suites? And if so, how can we address the root problem?

    http://devver.net/blog/2008/06/are-we-a-band-aid/

    I'd love to hear thoughts on what kinds of tools might help teams diagnose their test problems (I'll look into Panopticode, I hadn't heard of it before).

    Ben

  9. Hi Ben,
    I think your blog post was good, but it makes a big assumption: developers would actually know if they were writing bad tests. I think the large majority of developers think they are writing good tests, as they slowly paint themselves into a horrible test suite corner.

    As far as metrics, I've always been lazy in that area. I find them helpful, but not as helpful as the feeling I get while working with the tests. However, I know that pattern doesn't scale.

    I like what Jake has done with http://metric-fu.rubyforge.org/. I really like what Panopticode does. I would pay for a solution like that for Ruby. Of course, I'm not sure how much. I guess it depends on how useful I found it to be.

    Cheers, Jay

  10. Jay,

    I think you're right - often developers don't see the problems with their tests. I haven't figured it out completely, but one way to fix this is probably a mix of metric-based tools to show them the problems (the 'what' and 'where') coupled with blog posts like yours that explain and advocate best practices (the 'why').

    I'll check out metric-fu. Looks cool.

