Monday, June 11, 2007

Testing: Inline Setup

In order to have an effective test suite, you must have a maintainable test suite. Only if the test suite is maintainable will all members of the team take ownership and ensure the tests are performing as expected.

Far too often I see the following behavior
  1. Run test suite before checking in.
  2. A test for part of the system you are not working on fails.
  3. Find the failing test, but don't take the time to understand the test.
  4. Determine from the failure what the issue is and change a value.
  5. Move on because the test is now passing, never knowing if the test still verifies what was originally intended.
You can claim that the above scenario only happens when you have weak developers. You'd be wrong. I work with some of the best developers in the industry and it still happens.

I agree that part of the blame is on the developer who doesn't take the time to understand the test; however, more of the blame is on the code's original author, who didn't give the new developer a fighting chance.

The original author didn't ask the right question: How can I make this code as maintainable as possible? In the previous sentence, maintainable is the keyword. Maintainable doesn't always mean DRY, abstracted, or object oriented. When it comes to tests, maintainability means: When the test breaks, what can it contain that will make it easiest to fix.

The reason the original steps were important to this entry is because they show the most common usage of tests: You've written/changed code that caused unexpected test failures. In this scenario your workflow is very simple, find the broken test and make it work. Therefore, when you get to the test, which of the following would you prefer to find?

test "when attribute is not nil or empty, then valid is true" do
validation = Validatable::ValidatesPresenceOf.new stub, :name
assert_equal true, validation.valid?(stub(:name=>"book"))
end

test "when attribute is not nil or empty, then valid is true" do
assert_equal true, @validation.valid?(@stub)
end

The first test contains all the information necessary to understand the intent and implementation of the test.

On the other hand, the second test pulled the creation of the validation instance and the stub into a setup method. While the setup method can create a DRYer test, it has reduced your ability to maintain this test. If the test file happens to be long, this problem is magnified by the fact that it may not be immediately obvious that a setup method is even being called. Even worse, if you don't notice that the setup method exists you may see unexpected behavior.

Another problem with setup is that it doesn't fail if unnecessary behavior creeps into the setup method. For example, the original author of a test file creates 3 test methods that all have common setup. He moves this behavior to the setup method and checks in. The next author needs to write some tests that use some of the shared behavior but not all. The correct thing to do is move the uncommon behavior to the original 3 tests; however, in practice I've never seen this happen. Again, you can argue that it's the developers fault; however, do you ever remember doing what I'm describing?

Of course, the above scenario assumes you even know that a setup method is executing. It's very possible to open a test file to create a new test and the file is over 80 lines long. In that case it's common to create a new test without noticing that a setup is even being executed. Now, since your test isn't taking advantage of the setup, the code from the setup is no longer common to all tests, and maintainability has again been lost.

Worse, I once opened a test that had a setup method and several tests that depended on it. Additionally, the test file also contained another group of tests that used a private method to initialize a few objects. The logic from the setup and the private method was almost exactly the same. Clearly, the two different authors didn't notice their differing implementations. Ironically, in their attempt to be DRY they created a greatly inferior alternative: Duplicate code that's not easy to follow.

What does all this mean? It means that when writing tests you should think about them in isolation. They run in isolation and the chances are when they break (if you write robust tests) they will be fixed in isolation; therefore, abstracting anything that is 'common' actually hurts maintainability. With every test, pretend it's the first and only test you are writing in that class. If that were true, it would be hard to justify those setup or private helper methods.

A question that always comes up: What if my test requires a large amount of set up code? This is often a clear sign that the domain model of the application is broken. Another explanation I've seen is, all the setup isn't actually required if you stick to testing one thing at a time. This is another reason that using one assertion per test is valuable.

Another question that always comes up: What about writing DRY code? For me, the value behind DRY has never been typing something once, but that when I want to make a change I only have to make it in one place. Since I'm advocating the duplication in a test file, you still only need to make the change in one place. You may need to do a find and replace instead of changing one helper method, but the other people who are stuck maintaining your tests will be very happy that you did.

14 comments:

  1. Sometimes when I read your posts I feel like chanting "His name is Robert Paulsen". It's helpful to think about DRY in the way you've outlined, it's that testing, in a sense, is a similar but parallel world of development.

    ReplyDelete
  2. If the test file is too long? If the programmer knows the setup method is even executing? These are excuses, not reasons.

    If your test file is too long, you've got an organizational issue with your tests. It's time to start thinking about splitting out your tests into more manageable chunks. If I have a deep class, I break the tests down into *one file per method*, instead of jamming everything into one monster test file. I've found that keeps things much more mentally manageable.

    As for "not noticing the setup method is being executed", that is, quite frankly, a pathetic excuse. I don't mean to sound harsh, but you and your colleagues are professional programmers. If a Ruby programmer doesn't immediately recognize an instance variable within a test case as something that's created in a setup method then they have absolutely no business writing tests in the first place. Knowing your tools is a fundamental requirement of using them.

    I tend to believe that you're thinking in terms of _Rails_ testing, here, though. Because, for example, when I write C extensions I often have to write setup code for cross-platform issues. Typically, this means determining up front whether I'm on MS Windows or not. Sticking that logic in every test would be absurd. I could also go on about how, if you were testing IO objects, there's a good chance your colleagues are going to run your system out of file descriptors at some point, because you didn't have a common teardown method that would do it automatically for you.

    Now, where I think your article _does_ have merit is, in projects with multiple developers, we must realize that we're not the only ones looking at the test. Our colleagues must understand them as well. I just happen to disagree with the _how_.

    As for DRYing up your tests, well, after writing over 5,000 tests for an independent Ruby test suite, headed towards 20,000, I can tell you straight up that using a setup method goes a long way towards preserving your wrists...and your sanity. :)

    ReplyDelete
  3. Anonymous3:53 PM

    Daniel,

    Regardless of whether the file is long or if you use instance variables, setup adds additional complexity to a test file. There's no getting around the fact that extracting a method adds complexity. That's the switch that it's important to make. Instead of thinking in terms of the test suite, think in terms of one test at a time. If your entire application only needed to be 5 lines, would you extract a method? I wouldn't.

    I can appreciate that we simply disagree on style, but the numbers don't sway my opinion. Both teams from my recent projects were 16 developers large and the assertions were nearing 5k and nearing 7k respectively.

    I think it's fair to say that the larger the team, the less likely it is that you will be working on your own tests, and the more important it is that there's 0 indirection to follow when fixing a test. Then again, if your team is small enough that you generally work with your own tests, a bit of magic should benefit you.

    btw, people much smarter than me, such as Dan North, Zak Tamsen, and Christian Taubman led me down this path. It's becoming more and more common as testing practices mature.

    Cheers, Jay

    ReplyDelete
  4. Anonymous3:59 PM

    Also, I think you're missing part of the point. It's not that I can't look for a setup method it's that I don't want to have to look for a setup method. Not when I'm creating a new test or looking at one I just found failing. I want to read the test, make the necessary changes and move on. Any additional steps make me less effective.

    Cheers, Jay

    ReplyDelete
  5. This is one case were I think specifications win over tests. By grouping specs by context, where that context is setup in one place, it's much clearer what is being setup and what's being tested/specified. Sure, you can do the same with tests, but specifications are asking for it.

    (Better English this time.)

    ReplyDelete
  6. Jay,

    I agree with your post, though it seems to also point to a fundamental framework flaw. Test/Unit is almost forcing you to do per-test-setup, whereas RSpec allows for contexts with localized setup and teardown (what they call before and after). I think this goes a long way to remedying the problem you pointed out without all the need for duplication. Separate your testing concerns.

    In my mind, there is no point "setting up" a new phone number four separate times for testing area code, exchange, station and extension initialization. Create an initialization context, a localized setup that initializes the number and then four separate tests. Voila!

    Of course, that is a simple case, but I'm finally beginning to see how contexts make sense in a testing scenario. Hey, I'm slow and it only took a year to digest BDD and get bitten by the bug!

    Cheers,
    Kevin

    ReplyDelete
  7. Jay,

    Ah, ok, your original post made it sound like (to me, anyway) that the other testers didn't understand the purpose of a setup method and/or weren't familiar with testunit in general. My apologies.

    As for RSpec, well, maybe that is a solution, but it's not one I've come around to personally yet. :)

    ReplyDelete
  8. Anonymous6:22 PM

    Daniel,

    Thanks, as always, for your comments. At the end of the day, we all have to make adjustments for our environment. While I think my plan scales well, I've never coded a day in your shoes. I've also never written a C extension. =)

    Thanks for your point of view.

    Cheers, Jay

    ReplyDelete
  9. "You can claim that the above scenario only happens when you have weak developers. You'd be wrong. I work with some of the best developers in the industry and it still happens."

    Hmmm.

    Bad developers don't properly read the tests.

    It happens at ThoughtWorks

    Therefore....

    Two possible options there. Don't always believe your own press, dude. :-)

    Alan
    Disclaimer: I worked for TW for 4 years.

    ReplyDelete
  10. Anonymous10:25 AM

    Hello Alan,

    I didn't say: Bad developers don't properly read the tests.

    I don't believe that either. I think everyone at some point has needed to check in for one reason or another and stumbled upon a test that they couldn't make heads or tails of. In that case, I don't think it makes you a bad developer if you get the test working so you can check in and then refactor the test after you've gotten checked in.

    However, I'm trying to avoid that situation entirely by getting the tests to a maintainable state in the first place.

    Cheers, Jay

    ReplyDelete
  11. Anonymous3:05 AM

    Hi Jay,

    apologies... s/bad/weak/g

    Alan

    ReplyDelete
  12. Anonymous10:36 AM

    Hello Alan,

    Sometimes it's tough to your your point across via comments. =)

    I not trying to point out whether or not you misquoted me. I was trying to say that I don't believe only weak/bad developers don't read tests.

    I think there can be other reasons that a good developer doesn't read a test, such as having a massive refactoring that started as a small change and went down a rabbit hole. At that point, I just want to get checked in and am not worrying about continuing to refactor bad tests I stumble upon.

    Thanks for the comments all the same.

    Cheers, Jay

    ReplyDelete
  13. Anonymous4:54 AM

    Hi Jay,

    Yes, comment conversation is like trying to write Haiku :-)

    I see what you're saying, and yet I still disagree. :-) I think that a 'good' developer won't rush to check in.

    Defining 'good' of course, is a whole other conversation. Ward (of course) articulates it better than anyone has so far, I think: http://eclipse-projects.blogspot.com/2007/04/is-agile-too-easy.html

    ReplyDelete
  14. Anonymous4:33 AM

    >>There's no getting around the fact that extracting a method adds complexity.

    Hmmm... I don't think this is true in general.

    Reusing code by extracting a method can both increase or decrease complexity depending on the circumstances.

    If the method increases the cohesion and decreases the coupling of the code then it is decreasing complexity.

    If it increases coupling and decreases cohesion then it is increasing complexity.

    Note that decreasing cohesion is what you are refering to when you say that complexity has increased in this case.

    Jonathan.

    ReplyDelete

Note: Only a member of this blog may post a comment.