Showing posts with label testing. Show all posts

Tuesday, June 28, 2016

Curious Customer

I currently work on a pretty small team, 4 devs (including myself). We have no one dedicated strictly to QA. A few years ago we ran into a few unexpected issues with our software. I hesitate to call them bugs, because they only appeared when you did things that made little sense. We write internal-only software, thus we expect a minimum level of competency from our users. In addition, it's tempting justify ignoring problematic nonsensical behavior in the name of not having to write and maintain additional software.

But, when I wasn't in denial, I was willing to admit that these were in fact bugs and they were costing us time.

The problems caused by these bugs were small, e.g. a burst of worthless emails, a blip in data flowing to the application. The emails could be quickly deleted, and the application was eventually consistent. Thus I pretended as though these issues were of low importance, and that the pain was low for both myself and our customers. I imagine that sounds foolish; in retrospect, it was foolish. The cost of developer context switching is often very high, higher if it's done as an interrupt. Introducing noise into your error reporting devalues your error reporting. Users can't as easily differentiate between good data, a blip of bad data due to something they did, and actual bad data, thus they begin to distrust all of the data.

The cost of these bugs created by nonsensical behavior is high, much higher than the cost of writing and maintaining the software that eliminated these bugs.

Once we eliminated these bugs, I spent notably more time happily focused on writing software. For me, delivering features is satisfying; conversely, tracking down issues stemming from nonsensical behavior always feels like a painfully inefficient task. I became very intent on avoiding that inefficiency in the future. The team brainstormed on how to address this behavior, and honestly we came up with very little. We already write unit tests, load tests, and integration tests. Between all of our tests, we catch the majority of our bugs before they hit production. However, this was a different type of bug, created by behavior a developer often wouldn't think of, thus a developer wasn't very likely to write a test that would catch this issue.

I proposed an idea I wasn't very fond of, the Curious Customer (CC): upon delivery of any feature you could ask another developer on the team to use the feature in the staging environment, acting as a user curiously toying with all aspects of the feature.

Over a year later, I'm not sure it's such a bad idea. In that year we've delivered several features, and (prior to release) I've found several bugs while playing the part of CC. I can't remember a single one of them that would have led to a notable problem in production; however all of them would have led to at least one support call, and possibly a bit less trust in our software.

My initial thought was: asking developers to context switch to QAing some software they didn't write couldn't possibly work, could it? Would they give it the necessary effort, or would they half-ass the task and get back to coding?

For fear of half-ass, thus wasted effort, I tried to define the CC's responsibilities very narrowly. CC was an option, not a requirement; if you delivered a feature you could request a CC, but you could also go to production without a CC. A CC was responsible for understanding the domain requirements, not the technical requirements. It's the developers responsibility to get the software to staging, the CC should be able to open staging and get straight to work. If the CC managed to crash or otherwise corrupt staging, it was the developers responsibility to get things back to a good state. The CC doesn't have an official form or process for providing feedback; The CC may chose email, chat, or any mechanism they prefer for providing feedback.

That's the idea, more or less. I've been surprised and pleased at the positive impact CC has had. It's not life changing, but it does reduce the number of support calls and the associated waste with tracking down largely benign bugs, at least, on our team.

You might ask how this differs from QA. At it's core, I'm not sure it does in any notable way. That said, I believe traditional QA differs in a few interesting ways. Traditional QA is often done by someone whose job is exclusively QA. With that in mind, I suppose we could follow the "devops" pattern and call this something like "devqa", but that doesn't exactly roll off the tongue. Traditional QA is also often a required task, every feature and/or build requires QA sign off. Finally, the better QA engineers I've worked with write automated tests that continually run to prevent regression; A CC may write a script or two for a single given task, but those scripts are not expected to be valuable to any other team member now or for anyone (including the author) at any point in the future.

Thursday, June 16, 2016

Maintainability and Expect Literals

Recently, Stephen Schaub asked the following on the wewut group:

Several of the unit test examples in the book verify the construction of both HTML and plain text strings. Jay recommends using literal strings in the assertions. However, this strikes me as not a particularly maintainable approach. If the requirements regarding the formatting of these strings changes (a very likely scenario), every single test that verifies one of these strings using a literal must be updated. Combined with the advice that each test should check only one thing, this leads to a large number of extremely brittle tests.

Am I missing something here? I can appreciate the reasons Jay recommends using literals in the tests. However, it seems that we pay a high maintainability price in exchange for the improved readability.

I responded to Stephen; however, I've seen similar questions asked a few times. Below are my extended thoughts regarding literals as expected values.

In general, given the option of having many similar strings (or any literal) vs a helper function, I would always prefer the literal. When a test is failing I only care about that single failing test. If I have to look at the helper function I no longer have the luxury of staying focused on the single test; now I need to consider what the helper function is giving me and what it's giving all other callers. Suddenly the scope of my work has shifted from one test to all of the tests coupled by this helper function. If this helper function wasn't written by me, this expansion in scope wasn't even my decision, it was forced upon me by the helper function creator. In the best case the helper function could return a single, constant string. The scope expansion becomes even worse when the helper function contains code branches.

As for alternatives, my solution would depend on the problem. If the strings were fairly consistent, I would likely simply duplicate everything knowing that any formatting changes can likely be addressed using a bulk edit via find and replace. If the strings were not consistent, I would look at breaking up the methods in a way that would allow me to verify the code branches using as little duplication as possible, e.g. if I wanted to test a string that dynamically changed based on a few variables, I would look to test those variables independently, and then only have a few tests for the formatting.

A concrete example will likely help here. Say I'm writing a trading system and I need to display messages such as

"paid 10 on 15 APPL. $7 Commission. spent: $157"

"paid 1 on 15 VTI. Commission free. spent: $15"

"sold 15 APPL at 20. $7 Commission. collected: $293"

"sold 15 VTI at 2. Commission free. collected: $30"

There's quite a bit of variation in those messages. You could have 1 function that creates the entire string:
confirmMsg(side, size, px, ticker)

However, I think you'd end up with quite a few verbose tests. Given this problem, I would look to break down those strings into smaller, more focused functions, for example:


describeOrder(side, size, px, ticker)

describeCommission(ticker)

describeTotal(side, size, px, ticker)

Now that you've broken down the function, you're free to test the code paths of the more focused functions, and the test for confirmMsg becomes trivial. Something along the lines of

assertEquals("paid 10 on 15 APPL",
  describeOrder("buy", 10, 15, {tickerName:"APPL",commission:"standard"}))
assertEquals("sell 15 APPL at 10",
  describeOrder("sell", 10, 15, {tickerName:"APPL",commission:"standard"}))

assertEquals("$7 Commission", 
  describeCommission({tickerName:"APPL",commission:"standard"}))
assertEquals("Commission free", 
  describeCommission({tickerName:"APPL",commission:"free"}))

assertEquals("spent: $157", 
  describeOrder("buy", 10, 15, {tickerName:"APPL",commission:"standard"}))
assertEquals("collected: $143", 
  describeOrder("sell", 10, 15, {tickerName:"APPL",commission:"standard"}))
assertEquals("spent: $150", 
  describeOrder("buy", 10, 15, {tickerName:"APPL",commission:"free"}))
assertEquals("collected: $150", 
  describeOrder("sell", 10, 15, {tickerName:"APPL",commission:"free"}))

assertEquals("order. commission. total", 
  confirmMsg("order", "commission", "total"))

I guess I could summarize it by saying, I should be able to easily find and replace my expected literals. If I cannot, then I have an opportunity to further break down a method and write more focused tests on the newly introduced, more granular tests.

Wednesday, December 17, 2014

Working Effectively with Unit Tests Official Launch

Today marks the official release release of Working Effectively with Unit Tests. The book is available in various formats:

DRM free pdf, epub, & mobi (Kindle) at http://leanpub.com/wewut

Softcover at http://amzn.com/1503242706

Kindle edition at http://amzn.com/B00QS2HXUO

I’m very happy with the final version. Michael Feathers wrote a great foreword. I incorporated feedback from dozens of people - some that have been friends for years, and some that I’d never previously met. I can’t say enough great things about http://leanpub.com, and I highly recommend it for getting an idea out there and making it easy to get fast feedback.

As far as the softcover edition, I had offers from a few major publishers, but in the end none of them would allow me to continue to sell on leanpub at the same time. I strongly considered caving to the demands of the major publishers, but ultimately the ability to create a high quality softcover and make it available on Amazon was too tempting to pass up.

The feedback has been almost universally positive - the reviews are quite solid on goodreads (http://review.wewut.com). I believe the book provides specific, concise direction for effective Unit Testing, and I hope it helps increase the quality of the unit tests found in the wild.

If you'd like to try before you buy, there's a sample available in pdf format or on the web.

Wednesday, May 21, 2014

Working Effectively with Unit Tests

Unit Testing has moved from fringe to mainstream, which is a great thing. Unfortunately, as a side effect developers are creating mountains of unmaintainable tests. I've been fighting the maintenance battle pretty aggressively for years, and I've decided to write a book that captures what I believe is the most effective way to test.

From the Preface

Over a dozen years ago I read Refactoring for the first time; it immediately became my bible. While Refactoring isn’t about testing, it explicitly states: If you want to refactor, the essential precondition is having solid tests. At that time, if Refactoring deemed it necessary, I unquestionably complied. That was the beginning of my quest to create productive unit tests.

Throughout the 12+ years that followed reading Refactoring I made many mistakes, learned countless lessons, and developed a set of guidelines that I believe make unit testing a productive use of programmer time. This book provides a single place to examine those mistakes, pass on the lessons learned, and provide direction for those that want to test in a way that I’ve found to be the most productive.

The book does touch on some theory and definition, but the main purpose is to show you how to take tests that are causing you pain and turn them into tests that you're happy to work with.

For example, the book demonstrates how to go from...

looping test with many (built elsewhere) collaborators
.. to individual tests that expect literals, limit scope, explicitly define collaborators, and focus on readability
.. to fine-grained tests that focus on testing a single responsibility, are resistant to cascading failures, and provide no friction for those practicing ruthless Refactoring.

As of right now, you can read the first 2 chapters for free at https://leanpub.com/wewut/read

I'm currently ~25% done with the book, and it's available now for $14.99. My plan is to raise the price to $19.99 when I'm 50% done, and $24.99 when I'm 75% done. Leanpub offers my book with 100% Happiness Guarantee: Within 45 days of purchase you can get a 100% refund on any Leanpub purchase, in two clicks. Therefore, if you find the above or the free sample interesting, you might want to buy it now and save a few bucks.

Buy Now here: https://leanpub.com/wewut

Monday, May 19, 2014

Weighing in on Long Live Testing

DHH recently wrote a provocative piece that gave some views into how he does and doesn't test these days. While I don't think I agree with him completely, I applaud his willingness to speak out against TDD dogma. I've written publicly about not buying the pair-programming dogma, but I hadn't previously been brave enough to admit that I no longer TDD the vast majority of the time.

The truth is, I haven't been dogmatic about TDD in quite some time. Over 6 years ago I was on a ThoughtWorks project where I couldn't think of a single good reason to TDD the code I was working on. To be honest, there weren't really any reasons that motivated me to write tests at all. We were working on a fairly simple, internal application. They wanted software as fast as they could possibly get it, and didn't care if it crashed fairly often. We kept everything simple, manually tested new features through the UI, and kept our customer's very happy.

There were plenty of reasons that we could have written tests. Reasons that I expect people will want to yell at me right now. To me, that's actually the interesting, and missing part, of the latest debate on TDD. I don't see people asking: Why are we writing this test? Is TDD good or bad? That depends; TDD is just a tool, and often the individual is the determining factor when it comes to how effective a tool is. If we start asking "Why?", it's possible to see how TDD could be good for some people, and bad for DHH.

I've been quietly writing a book on Working Effectively with Unit Tests, and I'll have to admit that it was really, really hard not to jump into the conversation with some of the content I've recently written. Specifically, I think this paragraph from the Preface could go a long way to helping people understand an opposing argument.

Why Test?
The answer was easy for me: Refactoring told me to. Unfortunately, doing something strictly because someone or something told you to is possibly the worst approach you could take. The more time I invested in testing, the more I found myself returning to the question: Why am I writing this test?

There are many motivators for creating a test or several tests:

validating the system

immediate feedback that things work as expected
prevent future regressions

increase code-coverage
enable refactoring of legacy codebase
document the behavior of the system
your manager told you to
Test Driven Development

improved design
breaking a problem up into smaller pieces
defining the "simplest thing that could possibly work"

customer approval
ping pong pair-programming
Some of the above motivators are healthy in the right context, others are indicators of larger problems. Before writing any test, I would recommend deciding which of the above are motivating you to write a test. If you first understand why you're writing a test, you'll have a much better chance of writing a test that is maintainable and will make you more productive in the long run.

Once you start looking at tests while considering the motivator, you may find you have tests that aren't actually making you more productive. For example, you may have a test that increases code-coverage, but provides no other value. If your team requires 100% code-coverage, then the test provides value. However, if you team has abandoned the (in my opinion harmful) goal of 100% code-coverage, then you're in a position to perform my favorite refactoring: delete.

I don't actually know what motivates DHH to test, but if we assumed he cares about validating the system, preventing future regressions, and enabling refactoring (exclusively) then there truly is no reason to TDD. That doesn't mean you shouldn't; it just means, given what he values and how he works, TDD isn't valuable to him. Of course, conversely, if you value immediate feedback, problems in small pieces, and tests as clients that shape design, TDD is probably invaluable to you.

I find myself doing both. Different development activities often require different tools; i.e. Depending on what I'm doing, different motivators apply, and what tests I write change (hopefully) appropriately.

To be honest, if you look at your tests in the context of the motivators above, that's probably all you need to help you determine whether or not your tests are making you more or less effective. However, if you want more info on what I'm describing, you can pick up the earliest version of my upcoming book. (cheaply, with a full refund guarantee)

Tuesday, May 14, 2013

Clojure: Testing The Creation Of A Partial Function

I recently refactored some code that takes longs from two different sources to compute one value. The code originally stored the longs and called a function when all of the data arrived. The refactored version partials the data while it's incomplete and executes the partial'd function when all of the data is available. Below is a contrived example of what I'm taking about.

Let's pretend we need a function that will allow us to check whether or not another drink would make us legally drunk in New York City.

The code below stores the current bac and uses the value when legally-drunk? is called.

The following (passing) tests demonstrate that everything works as expected.

This code works without issue, but can also be refactored to store a partial'd function instead of the bac value. Why you would want to do such a thing is outside of the scope of this post, so we'll just assume this is a good refactoring. The code below no longer stores the bac value, and instead stores the pure-legally-drunk? function partial'd with the bac value.

Two of the three of the tests don't change; however, the test that was verifying the state is now broken.

note: The test output has been trimmed and reformatted to avoid horizontal scrolling.

In the output you can see that the test is failing as you'd expect, due to the change in what we're storing. What's broken is obvious, but there's not an obvious solution. Assuming you still want this state based test, how do you verify that you've partial'd the right function with the right value?

The solution is simple, but a bit tricky. As long as you don't find the redef too magical, the following solution allows you to easily verify the function that's being partial'd as well as the arguments.

Those tests all pass, and should provide security that the legally-drunk? and update-bac functions are sufficiently tested. The pure-legally-drunk? function still needs to be tested, but that should be easy since it's a pure function.

Would you want this kind of test? I think that becomes a matter of context and personal preference. Given the various paths through the code the following tests should provide complete coverage.

The above tests make no assumptions about the implementation - they actually pass whether you :use the 'original namespace or the 'refactored namespace. Conversely, the following tests verify each function in isolation and a few of them are very much tied to the implementation.

Both sets of tests would give me confidence that the code works as expected, so choosing which tests to use would become a matter of maintenance cost. I don't think there's anything special about these examples; I think they offer the traditional trade-offs between higher and lower level tests. A specific trade-off that stands out to me is identifying defect localization versus having to update the test when you update the code.

As I mentioned previously, the high-level-expectations work for both the 'original and the 'refactored namespaces. Being able to change the implementation without having to change the test is obviously an advantage of the high level tests. However, when things go wrong, the lower level tests provide better feedback for targeting the issue.

The following code is exactly the same as the code in refactored.clj, except it has a 1 character typo. (it's not necessary to spot the typo, the test output below will show you want it is)

The high level tests give us the following feedback.

failure in (high_level_expectations.clj:14) : expectations.high-level-expectations
(expect
 true
 (with-redefs
  [state (atom {})]
  (update-bac 0.01)
  (legally-drunk? 0.07)))

           expected: true 
                was: false

There's not much in that failure report to point us in the right direction. The unit-level-expectations provide significantly more information, and the details that should make it immediately obvious where the typo is.

failure in (unit_level_expectations.clj:8) : expectations.unit-level-expectations
(expect
 {:legally-drunk?* [pure-legally-drunk? 0.04]}
 (with-redefs [state (atom {}) partial vector] (update-bac 0.04)))

           expected: {:legally-drunk?* [# 0.04]} 
                was: {:legally-drunk?** [# 0.04]}
 
           :legally-drunk?** with val [# 0.04] 
                             is in actual, but not in expected
           :legally-drunk?* with val [# 0.04] 
                            is in expected, but not in actual

The above output points us directly to the extra asterisk in update-bac that caused the failure.

Still, I couldn't honestly tell you which of the above tests that I prefer. This specific example provides a situation where I think you could convincingly argue for either set of tests. However, as the code evolved I would likely choose one path or the other based on:

how much 'setup' is required for always using high-level tests?
how hard is it to guarantee integration using primarily unit-level tests?

In our examples the high level tests require redef'ing one bit of state. If that grew to a few pieces of state and/or a large increase in the complexity of the state, then I may be forced to move towards more unit-level tests. A rule of thumb I use: If a significant amount of the code within a test is setting up the test context, there's probably a smaller function and a set of associated tests waiting to be extracted.

By definition, the unit-level tests don't test the integration of the various functions. When I'm using unit-level tests, I'll often test the various code paths at the unit level and then have a happy-path high-level test that verifies integration of the various functions. My desire to have more high-level tests increases as the integration complexity increases, and at some point it makes sense to simply convert all of the tests to high-level tests.

If you constantly re-evaluate which tests will be more appropriate and switch when necessary, you'll definitely come out ahead in the long run.

Saturday, November 19, 2011

Clojure: expectations - scenarios

clojure-deprecating-expectationsscenari.html

When I set out to write expectations I wanted to create a simple unit testing framework. I'm happy with what expectations provides for unit testing; however, I also need to write the occasional test that changes values or causes a side effect. There's no way I could go back to clojure.test after enjoying better failure messages, trimmed stack traces, automatic testing running, etc. Thus, expectations.scenarios was born.

Using expectations.scenarios should be fairly natural if you already use expectations. The following example is a simple scenario (which could be a unit test, but we'll start here for simplicity).

(ns example.scenarios
  (:use expectations.scenarios))

(scenario
 (expect nil? nil))

A quick trip to the command line shows us that everything is working as expected.

Ran 1 tests containing 1 assertions in 4 msecs
0 failures, 0 errors.

As I said above, you could write this test as a unit test. However, expectations.scenarios was created for the cases in which you want to verify a value, make a change, and verify a value again. The following example shows multiple expectations verifying changing values in the same scenario.

(scenario
  (let [a (atom 0)]
    (swap! a inc)
    (expect 1 @a)
    (swap! a inc)
    (expect 2 @a)))

In expectations (unit tests) you can only have one expect (or given) so failures are captured, but they do not stop execution. However, due to the procedural nature of scenarios, the first failing expect stops execution.

(scenario
  (let [a (atom 0)]
    (swap! a inc)
    (expect 2 @a)
    (println "you'll never see this")))

failure in (scenarios.clj:4) : example.scenarios
           (expect 2 (clojure.core/deref a))
           expected: 2 
                was: 1
           on (scenarios.clj:7)
Ran 1 tests containing 1 assertions in 81 msecs
1 failures, 0 errors.

expectations.scenarios also allows you to easily verify calls to any function. I generally use interaction expects when I need to verify some type of side effect (e.g. logging or message publishing).

(scenario
 (println "1")
 (expect (interaction (println "1"))))

It's important to note the ordering of this scenario. You don't 'setup' an expectation and then call the function. Exactly the opposite is true - you call the function the same way you would in production, then you expect the interaction to have occurred. You may find this jarring if you're used to setting up your mocks ahead of time; However, I think this syntax is the least intrusive - and I think you'll prefer it in the long term.

The above example calls println directly, but your tests are much more likely to look something like this.

(defn foo [x] (println x))

(scenario
 (foo "1")
 (expect (interaction (println "1"))))

Similar to all other mocking frameworks (that I know of) the expect is using an implicit "once" argument. You can also specify :twice and :never if you find yourself needing those interaction tests.

(defn foo [x] (println x))

(scenario
 (foo "1")
 (foo "1")
 (expect (interaction (println "1")) :twice)
 (expect (interaction (identity 1)) :never))

On occasion you may find yourself interested in verifying 2 out of 3 arguments - expectations.scenarios provides the 'anything' var that can be used for arguments you don't care about.

(defn foo [x y z] (println x y z))

(scenario
 (foo "1" 2 :a)
 (expect (interaction (println "1" anything :a))))

That's about all there is to expectations.scenarios, hopefully it fills the gap for tests you want to write that simply can't be done as unit tests.

Tuesday, November 01, 2011

Clojure: expectations unit testing wrap-up

Clojure Unit Testing with Expectations Part One
Clojure Unit Testing with Expectations Part Two
Clojure Unit Testing with Expectations Part Three
Clojure Unit Testing with Expectations Part Four
Clojure Unit Testing with Expectations Part Five
Clojure Unit Testing with Expectations Part Six (this entry)

The previous blog posts on expectations unit testing syntax cover all of the various ways that expectations can be used to write tests and what you can expect when your tests fail. However, there are a few other things worth knowing about expectations.

Stacktraces
expectations aggressively removes lines from the stacktraces. Just like many other aspects of expectations, the focus is on more signal and less noise. Any line in the stacktrace from clojure.core, clojure.lang, clojure.main, and java.lang will be removed. As a result any line appearing in your stacktrace should be relevant to your application or a third-party lib you're using. expectations also removes any duplicates that can occasionally appear when anonymous functions are part of the stacktrace. Again, it's all about improving signal by removing noise. Speaking of noise...

Test Names
You might have noticed that expectations does not require you to create a test name. This is a reflection of my personal opinion that test names are nothing more than comments and shouldn't be required. If you desire test names, feel free to drop a comment above each test. Truthfully, this is probably a better solution anyway, since you can use spaces (instead of dashes) to separate words in a comment. Comments are good when used properly, but they can become noise when they are required. The decision to simply use comments for test names is another example of improving signal by removing noise.

Running Focused Expectations
Sometimes you'll have a file full of expectations, but you only want to run a specific expectation - expectations solves this problem by giving you 'expect-focused'. If you use expect-focused only expectations that are defined using expect-focused will be run.

For example, if you have the following expectations in a file you should see the following results from 'lein expectations'.

(ns sample.test.core
  (:use [expectations]))

(expect zero? 0)
(expect zero? 1)
(expect-focused nil? nil)

jfields$ lein expectations
Ran 1 tests containing 1 assertions in 2 msecs
IGNORED 2 EXPECTATIONS
0 failures, 0 errors.

As you can see, expectations only ran one test - the expect-focused on line 6. If the other tests had been run the test on line 5 would have created a failure. It can be easy to accidentally leave a few expect-focused calls in, so expectations prints the number of ignored expectations in capital letters as a reminder. Focused expectation running is yet another way to remove noise while working through a problem.

Tests Running
If you always use 'lein expectations' to run your tests you'll never even care; however, if you ever want to run individual test files it's important to know that your tests run by default on JVM shutdown. When I'm working with Clojure and Java I usually end up using IntelliJ, and therefore have the ability to easily run individual files. When I switched from clojure.test to expectations I wanted to make test running as simple as possible - so I removed the need to specify (run-all-tests). Of course, if you don't want expectations to run for some reason you can disable this feature by calling (expectations/disable-run-on-shutdown).

JUnit Integration
Lack of JUnit integration was a deal breaker for my team in the early days, so expectations comes with an easy way to run all tests as part of JUnit. If you want all of your tests to run in JUnit all you need to do is implement ExpectationsTestRunner.TestSource. The following example is what I use to run all the tests in expectations with JUnit.

import expectations.junit.ExpectationsTestRunner;
import org.junit.runner.RunWith;

@RunWith(expectations.junit.ExpectationsTestRunner.class)
public class SuccessTest implements ExpectationsTestRunner.TestSource{

    public String testPath() {
        return "test/clojure/success";
    }
}

As you can see from the example above, all you need to do is tell the test runner where to find your Clojure files.

That should be everything you need to know about expectations for unit testing use. If anything is unclear, please drop me a line in the comments.

Clojure: expectations - removing duplication with given

create some state
verify a bit about the state
alter the state
verify more about the state

Part of the problem is that the assertions that occurred before the altering of the state may or may not be relevant after the alteration. Additionally, if any of the assertions fail you have to stop running the entire test - thus some of your assertions will not be run (and you'll be lacking some information).

expectations takes an alternate route - embracing the idea of multiple assertions by providing a specific syntax that allows multiple verifications and the least amount of duplication.

The following example shows how you can test multiple properties of a Java object using the 'given' syntax.

(given (java.util.ArrayList.)
       (expect
         .size 0
         .isEmpty true))

jfields$ lein expectations
Ran 2 tests containing 2 assertions in 4 msecs
0 failures, 0 errors.

The syntax is simple enough: (given an-object (expect method return-value [method return-value]))
note: [method return-value] may be repeated any number of times.

This syntax allows us to expect return-values from as many methods as we care to verify, but encourages us not to change any state between our various assertions. This syntax also allows us to to run each assertion regardless of the outcome of any previous assertion.

Obviously you could call methods that change the internal state of the object and at that point you're on your own. I definitely wouldn't recommend testing that way. However, as long as you call methods that don't change any state 'given' can help you write succinct tests that verify as many aspects of an object as you need to test.

As usual, I'll show the output for tests that fail using this syntax.

(given (java.util.ArrayList.)
       (expect
         .size 1
         .isEmpty false))

jfields$ lein expectations
failure in (core.clj:4) : sample.test.core
           (expect 1 (.size (java.util.ArrayList.)))
           expected: 1 
                was: 0
failure in (core.clj:4) : sample.test.core
           (expect false (.isEmpty (java.util.ArrayList.)))
           expected: false 
                was: true

This specific syntax was created for testing Java objects, but an interesting side effect is that it actually works on any value and you can substitute method calls with any function. For example, you can test a vector or a map using the examples below as a template.

(given [1 2 3]
       (expect
         first 1
         last 3))

(given {:a 2 :b 4}
       (expect 
         :a 2
         :b 4))

jfields$ lein expectations
Ran 4 tests containing 4 assertions in 8 msecs
0 failures, 0 errors.

And, of course, the failures.

(given [1 2 3]
       (expect
         first 2
         last 1))

(given {:a 2 :b 4}
       (expect 
         :a 1
         :b 1))

jfields$ lein expectations
failure in (core.clj:4) : sample.test.core
           (expect 2 (first [1 2 3]))
           expected: 2 
                was: 1
failure in (core.clj:4) : sample.test.core
           (expect 1 (last [1 2 3]))
           expected: 1 
                was: 3
failure in (core.clj:9) : sample.test.core
           (expect 1 (:a {:a 2, :b 4}))
           expected: 1 
                was: 2
failure in (core.clj:9) : sample.test.core
           (expect 1 (:b {:a 2, :b 4}))
           expected: 1 
                was: 4
Ran 4 tests containing 4 assertions in 14 msecs
4 failures, 0 errors.

When you want to call methods on a Java object or call functions with the same instance over and over the previous given syntax is really the simplest solution. However, there are times where you want something more flexible.

expectations also has a 'given' syntax that allows you to specify a template - thus reducing code duplication. The following example shows a test that verifies + with various arguments.

(given [x y] (expect 10 (+ x y))
       4 6
       6 4
       12 -2)

jfields$ lein expectations
Ran 3 tests containing 3 assertions in 5 msecs
0 failures, 0 errors.

The syntax for this flavor of given is: (given bindings template-form values-to-be-bound). The template form can be anything you need - just remember to put the expect in there.

Here's another example where we combine given with in to test a few different things. This example shows both the successful and failing versions.

;; successful
(given [x y] (expect x (in y))
       :a #{:a :b}
       {:a :b} {:a :b :c :d})

;; failure
(given [x y] (expect x (in y))
       :c #{:a :b}
       {:a :d} {:a :b :c :d})

lein expectations
failure in (core.clj:8) : sample.test.core
           (expect :c (in #{:a :b}))
           key :c not found in #{:a :b}
failure in (core.clj:8) : sample.test.core
           (expect {:a :d} (in {:a :b, :c :d}))
           expected: {:a :d} 
                 in: {:a :b, :c :d}
           :a expected: :d
                   was: :b
Ran 4 tests containing 4 assertions in 13 msecs
2 failures, 0 errors.

That's basically it for 'given' syntax within expectations. There are times that I use all of the various versions of given; however, there seems to be a connection with using given and interacting with Java objects. If you don't find yourself using Java objects very often then you probably wont have a strong need for given.

Clojure: expectations and Double/NaN


;; allow Double/NaN equality in a map
(expect {:a Double/NaN :b {:c Double/NaN}} {:a Double/NaN :b {:c Double/NaN}})

;; allow Double/NaN equality in a set
(expect #{1 Double/NaN} #{1 Double/NaN})

;; allow Double/NaN equality in a list
(expect [1 Double/NaN] [1 Double/NaN])

jfields$ lein expectations
Ran 3 tests containing 3 assertions in 32 msecs
0 failures, 0 errors.

As you would expect, you can also count on Double/NaN being considered equal even if you are using the 'in' function.

;; allow Double/NaN equality when verifying values are in a map
(expect {:a Double/NaN :b {:c Double/NaN}} (in {:a Double/NaN :b {:c Double/NaN} :d "other stuff"}))

;; allow Double/NaN equality when verifying it is in a set
(expect Double/NaN (in #{1 Double/NaN}))

;; allow Double/NaN equality when verifying it's existence in a list
(expect Double/NaN (in [1 Double/NaN]))

jfields$ lein expectations
Ran 3 tests containing 3 assertions in 32 msecs
0 failures, 0 errors.

For completeness I'll also show the examples of each of these examples failing.

;; allow Double/NaN equality in a map
(expect {:a Double/NaN :b {:c Double/NaN}} {:a nil :b {:c Double/NaN}})

;; allow Double/NaN equality with in fn and map
(expect {:a Double/NaN :b {:c nil}} (in {:a Double/NaN :b {:c Double/NaN} :d "other stuff"}))

;; allow Double/NaN equality in a set
(expect #{1 Double/NaN} #{1 nil})

;; allow Double/NaN equality with in fn and set
(expect Double/NaN (in #{1 nil}))

;; allow Double/NaN equality in a list
(expect [1 Double/NaN] [1 nil])

;; allow Double/NaN equality with in fn and list
(expect Double/NaN (in [1 nil]))


jfields$ lein expectations
failure in (core.clj:5) : sample.test.core
           (expect {:a Double/NaN, :b {:c Double/NaN}}
                   {:a nil, :b {:c Double/NaN}})
           expected: {:a NaN, :b {:c NaN}} 
                was: {:a nil, :b {:c NaN}}
           :a expected: NaN
                   was: nil
failure in (core.clj:8) : sample.test.core
           (expect {:a Double/NaN, :b {:c nil}} (in {:a Double/NaN, :b {:c Double/NaN}, :d "other stuff"}))
           expected: {:a NaN, :b {:c nil}} 
                 in: {:a NaN, :b {:c NaN}, :d "other stuff"}
           :b {:c expected: nil
                       was: NaN
failure in (core.clj:11) : sample.test.core
           (expect #{1 Double/NaN} #{nil 1})
           expected: #{NaN 1} 
                was: #{nil 1}
           nil are in actual, but not in expected
           NaN are in expected, but not in actual
failure in (core.clj:14) : sample.test.core
           (expect Double/NaN (in #{nil 1}))
           key NaN not found in #{nil 1}
failure in (core.clj:17) : sample.test.core
           (expect [1 Double/NaN] [1 nil])
           expected: [1 NaN] 
                was: [1 nil]
           nil are in actual, but not in expected
           NaN are in expected, but not in actual
failure in (core.clj:20) : sample.test.core
           (expect Double/NaN (in [1 nil]))
           value NaN not found in [1 nil]
Ran 6 tests containing 6 assertions in 66 msecs
6 failures, 0 errors.

There always seems to be downsides to using NaN, so I tend to look for the least painful path. Hopefully expectations provides the most pain-free path when your tests end up needing to include NaN.

Clojure: expectations with values in vectors, sets, and maps

Clojure Unit Testing with Expectations Part One
Clojure Unit Testing with Expectations Part Two
Clojure Unit Testing with Expectations Part Three (this entry)
Clojure Unit Testing with Expectations Part Four
Clojure Unit Testing with Expectations Part Five
Clojure Unit Testing with Expectations Part Six

I've previously written about verifying equality and the various non-equality expectations that are available. This entry will focus on another type of comparison that is allowed in expectations - verifying that an 'expected' value is in an 'actual' value.

A quick recap - expectations generally look like this: (expect expected actual)

verifying an expected value is in an actual value is straightforward and hopefully not a surprising syntax: (expect expected (in actual))

If that's not clear, these examples should make the concept completely clear.

;; expect a k/v pair in a map.
(expect {:foo 1} (in {:foo 1 :cat 4}))

;; expect a key in a set
(expect :foo (in #{:foo :bar}))

;; expect a val in a list
(expect :foo (in [:foo :bar]))

As you would expect, running these expectations results in 3 passing tests.

jfields$ lein expectations
Ran 3 tests containing 3 assertions in 8 msecs
0 failures, 0 errors.

As usual, I'll show the failures as well.

;; expect a k/v pair in a map.
(expect {:foo 2} (in {:foo 1 :cat 4}))

;; expect a key in a set
(expect :baz (in #{:foo :bar}))

;; expect a val in a list
(expect :baz (in [:foo :bar]))

jfields$ lein expectations
failure in (core.clj:18) : sample.test.core
           (expect {:foo 2} (in {:foo 1, :cat 4}))
           expected: {:foo 2} 
                 in: {:foo 1, :cat 4}
           :foo expected: 2
                     was: 1
failure in (core.clj:21) : sample.test.core
           (expect :baz (in #{:foo :bar}))
           key :baz not found in #{:foo :bar}
failure in (core.clj:24) : sample.test.core
           (expect :baz (in [:foo :bar]))
           value :baz not found in [:foo :bar]

expectations does it's best to provide you with any additional info that might be helpful. In the case of the vector and the set there's not much else that can be said; however, the map failure gives you additional information that can be used to track down the issue.

There's nothing magical going on with 'in' expectations and you could easily do the equivalent with select-keys, contains?, or some, but expectations allows you to get that behavior while keeping your tests succinct.

Clojure: Non-equality expectations

Clojure Unit Testing with Expectations Part One
Clojure Unit Testing with Expectations Part Two (this entry)
Clojure Unit Testing with Expectations Part Three
Clojure Unit Testing with Expectations Part Four
Clojure Unit Testing with Expectations Part Five
Clojure Unit Testing with Expectations Part Six

In my last blog post I gave examples of how to use expectations to test for equality. This entry will focus on non-equality expectations that are also available.

Regex
expectations allows you to specify that you expect a regex, and if the string matches that regex the expectation passes. The following example shows both the successful and failing expectations that use regexes.


(expect #"in 14" "in 1400 and 92")

jfields$ lein expectations
Ran 1 tests containing 1 assertions in 4 msecs
0 failures, 0 errors.

(expect #"in 14" "in 1300 and 92")

jfields$ lein expectations
failure in (core.clj:17) : sample.test.core
           (expect in 14 in 1300 and 92)
           regex #"in 14" not found in "in 1300 and 92"
Ran 1 tests containing 1 assertions in 5 msecs
1 failures, 0 errors.

As you can see from the previous example, writing an expectation using a regex is syntactically the same as writing an equality expectation - and this is true for all of the non-equality expectations. In expectations there is only one syntax for expect - it's always (expect expected actual).

Testing for a certain type
I basically never write tests that verify the result of a function is a certain type. However, for the once in a blue moon case where that's what I need, expectations allows me to verify that the result of a function call is a certain type simply by using that type as the expected value. The example below shows the successful and failing examples of testing that the actual is an instance of the expected type.

(expect String "in 1300 and 92")

jfields$ lein expectations
Ran 1 tests containing 1 assertions in 6 msecs
0 failures, 0 errors.

(expect Integer "in 1300 and 92")

jfields$ lein expectations
failure in (core.clj:17) : sample.test.core
           (expect Integer in 1300 and 92)
           in 1300 and 92 is not an instance of class java.lang.Integer
Ran 1 tests containing 1 assertions in 5 msecs
1 failures, 0 errors.

Expected Exceptions
Expected exceptions are another test that I rarely write; however, when I find myself in need - expectations has me covered.

(expect ArithmeticException (/ 12 0))

jfields$ lein expectations
Ran 1 tests containing 1 assertions in 6 msecs
0 failures, 0 errors.

(expect ClassCastException (/12 0))

jfields$ lein expectations
failure in (core.clj:19) : sample.test.core
           (expect ClassCastException (/ 12 0))
           (/ 12 0) did not throw ClassCastException
Ran 1 tests containing 1 assertions in 4 msecs
1 failures, 0 errors.

There's another non-equality expectation that I do use fairly often - an expectation where the 'expected' value is a function. The following simple examples demonstrate that if you pass a function as the first argument to expect it will be called with the 'actual' value and it will pass or fail based on what the function returns. (truthy results pass, falsey results fail).

(expect nil? nil)
(expect true? true)
(expect false? true)

jfields$ lein expectations
failure in (core.clj:19) : sample.test.core
           (expect false? true)
           true is not false?
Ran 3 tests containing 3 assertions in 4 msecs
1 failures, 0 errors.

These are the majority of the non-equality expectations; however, there is one remaining non-equality expectation - in. Using 'in' is fairly straightforward, but since it has examples for vectors, sets, and maps I felt it deserved it's own blog post - coming soon.

Clojure: expectations Introduction

Clojure Unit Testing with Expectations Part One (this entry)
Clojure Unit Testing with Expectations Part Two
Clojure Unit Testing with Expectations Part Three
Clojure Unit Testing with Expectations Part Four
Clojure Unit Testing with Expectations Part Five
Clojure Unit Testing with Expectations Part Six

A bit of history
Over a year ago I blogged that I'd written a testing framework for Clojure - expectations. I wrote expectations to test my production code, but made it open source in case anyone else wanted to give it a shot. I've put zero effort into advertising expectations; however, I've been quietly adding features and expanding it's use on my own projects. At this point it's been stable for quite awhile, and I think it's worth looking at if you're currently using clojure.test.

Getting expectations
Setting up expectations is easy if you use lein. In your project you'll want to add:

:dev-dependencies [[lein-expectations "0.0.1"]
                   [expectations "1.1.0"]]

After adding both dependencies you can do a "lein deps" and then do a "lein expectations" and you should see the following output.

Ran 0 tests containing 0 assertions in 0 msecs
0 failures, 0 errors.

At this point, you're ready to start writing your tests using expectations.

Unit Testing using expectations
expectations is built with the idea that unit tests should contain one assertion per test. A result of this design choice is that expectations has very minimal syntax.

For example, if you want to verify the result of a function call, all you need to do is specify what return value you expect from the function call.

(expect 2 (+ 1 1)

note: you'll want to (:use expectations); however, no other setup is required for using expectations. I created a sample project for this blog post and the entire test file looks like this (at this point):

(ns sample.test.core
  (:use [expectations]))

(expect 2 (+ 1 1))

Again, we use lein to run our expectations.

jfields$ lein expectations
Ran 1 tests containing 1 assertions in 2 msecs
0 failures, 0 errors.

That's the simplest, and most often used expectation - an equality comparison. The equality comparison works across all Clojure types - vectors, sets, maps, etc and any Java instances that return true when given to Clojure's = function.

(ns sample.test.core
  (:use [expectations]))

(expect 2 (+ 1 1))
(expect [1 2] (conj [] 1 2))
(expect #{1 2} (conj #{} 1 2))
(expect {1 2} (assoc {} 1 2))

Running the previous expectations produces similar output as before.

jfields$ lein expectations
Ran 4 tests containing 4 assertions in 26 msecs
0 failures, 0 errors.

Successful equality comparison isn't very exciting; however, expectations really begins to prove it's worth with it's failure messages. When comparing two numbers there's not much additional information that expectations can provide. Therefore, the following output is what you would expect when your expectation fails.

(expect 2 (+ 1 3))

jfields$ lein expectations
failure in (core.clj:4) : sample.test.core
           (expect 2 (+ 1 3))
           expected: 2 
                was: 4

expectations gives you the namespace, file name, and line number along with the expectation you specified, the expected value, and the actual value. Again, nothing surprising. However, when you compare vectors, sets, and maps expectations does a bit of additional work to give you clues on what the problem might be.

The following 3 expectations using vectors will all fail, and expectations provides detailed information on what exactly failed.

(expect [1 2] (conj [] 1))
(expect [1 2] (conj [] 2 1))
(expect [1 2] (conj [] 1 3))

jfields$ lein expectations
failure in (core.clj:5) : sample.test.core
           (expect [1 2] (conj [] 1))
           expected: [1 2] 
                was: [1]
           2 are in expected, but not in actual
           expected is larger than actual
failure in (core.clj:6) : sample.test.core
           (expect [1 2] (conj [] 2 1))
           expected: [1 2] 
                was: [2 1]
           lists appears to contain the same items with different ordering
failure in (core.clj:7) : sample.test.core
           (expect [1 2] (conj [] 1 3))
           expected: [1 2] 
                was: [1 3]
           3 are in actual, but not in expected
           2 are in expected, but not in actual
Ran 3 tests containing 3 assertions in 22 msecs
3 failures, 0 errors.

In these simple examples it's easy to see what the issue is; however, when working with larger lists expectations can save you a lot of time by telling you which specific elements in the list are causing the equality to fail.

Failure reporting on sets looks very similar:

(expect #{1 2} (conj #{} 1))
(expect #{1 2} (conj #{} 1 3))

jfields$ lein expectations
failure in (core.clj:9) : sample.test.core
           (expect #{1 2} (conj #{} 1))
           expected: #{1 2} 
                was: #{1}
           2 are in expected, but not in actual
failure in (core.clj:10) : sample.test.core
           (expect #{1 2} (conj #{} 1 3))
           expected: #{1 2} 
                was: #{1 3}
           3 are in actual, but not in expected
           2 are in expected, but not in actual
Ran 2 tests containing 2 assertions in 15 msecs
2 failures, 0 errors.

expectations does this type of detailed failure reporting for maps as well, and this might be one if the biggest advantages expectations has over clojure.test - especially when dealing with nested maps.

(expect {:one 1 :many {:two 2}}
        (assoc {} :one 2 :many {:three 3}))

jfields$ lein expectations
failure in (core.clj:13) : sample.test.core
           (expect {:one 1, :many {:two 2}} (assoc {} :one 2 :many {:three 3}))
           expected: {:one 1, :many {:two 2}} 
                was: {:many {:three 3}, :one 2}
           :many {:three with val 3 is in actual, but not in expected
           :many {:two with val 2 is in expected, but not in actual
           :one expected: 1
                     was: 2
Ran 1 tests containing 1 assertions in 19 msecs
1 failures, 0 errors.

expectations also provides a bit of additional help when comparing the equality of strings.

(expect "in 1400 and 92" "in 14OO and 92")

jfields$ lein expectations
failure in (core.clj:17) : sample.test.core
           (expect in 1400 and 92 in 14OO and 92)
           expected: "in 1400 and 92" 
                was: "in 14OO and 92" 
            matches: "in 14" 
           diverges: "00 and 92" 
                  &: "OO and 92"
Ran 1 tests containing 1 assertions in 8 msecs
1 failures, 0 errors.

That's basically all you'll need to know for using expectations to equality test. I'll be following up this blog post with more examples of using expectations with regexs, expected exceptions and type checking; however, if you don't want to wait you can take a quick look at the success tests that are found within the framework.

Monday, August 01, 2011

Clojure: memfn

The other day I stumbled upon Clojure's memfn macro.

The memfn macro expands into code that creates a fn that expects to be passed an object and any args and calls the named instance method on the object passing the args. Use when you want to treat a Java method as a first-class fn.
(map (memfn charAt i) ["fred" "ethel" "lucy"] [1 2 3])
-> (\r \h \y)
-- clojure.org

At first glance it appeared to be something nice, but even the documentation states that "...it is almost always preferable to do this directly now..." - with an anonymous function.

(map #(.charAt %1 %2) ["fred" "ethel" "lucy"] [1 2 3])
-> (\r \h \y)

-- clojure.org, again

I pondered memfn. If it's almost always preferable to use an anonymous function, when is it preferable to use memfn? Nothing came to mind, so I moved on and never really gave memfn another thought.

Then the day came where I needed to test some Clojure code that called some very ugly and complex Java.

In production we have an object that is created in Java and passed directly to Clojure. Interacting with this object is easy (in production); however, creating an instance of that class (while testing) is an entirely different task. My interaction with the instance is minimal, only one method call, but it's an important method call. It needs to work perfectly today and every day forward.

I tried to construct the object myself. I wanted to test my interaction with this object from Clojure, but creating an instance turned out to be quite a significant task. After failing to easily create an instance after 15 minutes I decided to see if memfn could provide a solution. I'd never actually used memfn, but the documentation seemed promising.

In order to verify the behavior I was looking for, all I'll I needed was a function that I could rebind to return an expected value. The memfn macro provided exactly what I needed.

As a (contrived) example, let's assume you want to create a new order with a sequence id generated by incrementAndGet on AtomicLong. In production you'll use an actual AtomicLong and you might see something like the example below.

(def sequence-generator (AtomicLong.))
(defn new-order []
  (hash-map :id (.incrementAndGet sequence-generator)))

(println (new-order)) ; => {:id 1}
(println (new-order)) ; => {:id 2}

While that might be exactly what you need in production, it's generally preferable to use something more explicit while testing. I haven't found an easy way to rebind a Java method (.incrementAndGet in our example); however, if I use memfn I can create a first-class function that is easily rebound.

(def sequence-generator (AtomicLong.))
(def inc&get (memfn incrementAndGet))
(defn new-order []
  (hash-map :id (inc&get sequence-generator)))

(println (new-order)) ; => {:id 1}
(println (new-order)) ; => {:id 2}

At this point we can see that memfn is calling our AtomicLong and our results haven't been altered in anyway. The final example shows a version that uses binding to ensure that inc&get always returns 10.

(def sequence-generator (AtomicLong.))
(def inc&get (memfn incrementAndGet))
(defn new-order []
  (hash-map :id (inc&get sequence-generator)))

(println (new-order)) ; => 1
(println (new-order)) ; => 2
(binding [inc&get (fn [_] 10)]
  (println (new-order)) ; => 10
  (println (new-order))) ; => 10

With inc&get being constant, we can now easily test our new-order function.

Tuesday, July 12, 2011

Undervalued Start and Restart Related Questions

How long does it take to start or restart your application?

Start-up time tends to be a concern that's often overlooked by programmers who write unit tests. It will (likely) always be faster to run a few unit tests than start an application; however, having unit tests shouldn't take the place of actually firing up the application and spot checking with a bit of clicking around. Both efforts are good; however, I believe the combination of both efforts is a case where the sum is greater than the parts.

My current team made start-up time a priority. Currently we are able to launch our entire stack (currently 6 processes) and start using the software within 10 seconds. Ten seconds is fast, but I have been annoyed with it at times. I'll probably try to cut it down to 5 seconds at some point in the near future, depending on the level of effort needed to achieve a sub-5-second start-up.

That effort is really the largest blocker for most teams. The problem is, often it's not clear what's causing start up to take so long. Performance tuning start-up isn't exactly sexy work. However, if you start your app often, the investment can quickly pay dividends. For my team, we found the largest wins by caching remote data on our local boxes and deferring creating complex models while running on development machines. Those two simple tweaks turn a 1.5 minute start-up time into 10 seconds.

If your long start-up isn't bothering you because you don't do it very often, I'll have to re-emphasize that you are probably missing out on some valuable feedback.

Not time related, but start related: Does your application encounter data-loss if it's restarted?

In the past I've worked on teams where frequent daily roll-outs were common. There are two types of these teams I've encountered. Some teams do several same day roll-outs to get new features into production as fast as possible. Other teams end up doing multiple intraday rollouts to fix newly found bugs in production. Regardless of the driving force, I've found that those teams can stop and start their servers quickly and without any information loss.

My current team has software stable enough that we almost never roll out intraday due to a bug. We also have uptime demands that mean new features are almost never more valuable than not stopping the software intraday. I can only remember doing 2 intraday restarts across 30 processes since February.

There's nothing wrong with our situation; however, we don't optimize for intraday restarts. As part of not prioritizing intraday restart related tasks, we've never addressed a bit of data-loss that occurs on a restart. It's traditionally been believed that the data wasn't very important (nice-to-have, if you will). However, the other day I wanted to rollout a new feature in the morning - before our "day" began. One of our customers stopped me from rolling out the software because he didn't want to lose the (previously believed nice-to-have) overnight data.

That was the moment that drove home the fact that even in our circumstances we needed to be able to roll out new software as seamlessly as possible. Even if mid-day rollouts are rare, any problems that a mid-day rollout creates will make it less likely that you can do a mid-day rollout when that rare moment occurs.

Tests and daily rollouts are nice, but if your team is looking to move from good to great I would recommend a non-zero amount of actual application usage from the user's point of view and fixing any issues that are road-blocks to multiple intraday rollouts.

Thursday, September 30, 2010

Clojure: Another Testing Framework - Expectations

Once upon a time I wrote Expectations for Ruby. I wanted a simple testing framework that allowed me to specify my test with the least amount of code.

Now that I'm spending the majority of my time in Clojure, I decided to create a version of Expectations for Clojure.

At first it started as a learning project, but I kept adding productivity enhancements. Pretty soon, it became annoying when I wasn't using Expectations. Obviously, if you write your own framework you are going to prefer to use it. However, I think the productivity enhancements might be enough for other people to use it as well.

So why would you want to use it?

Tests run automatically. Clojure hates side effects, yeah, I hear you. But, I hate wasting time and repeating code. As a result, Expectations runs all the tests on JVM shutdown. This allows you to execute a single file to run all the tests in that file, without having to specify anything additional. There's also a hook you can call if you don't want the tests to automatically run. (If you are looking for an example, there's a JUnit runner that disables running tests on shutdown)

What to test is inferred from your "expected" value. An equality test is probably the most common test written. In Expectations, an equality test looks like the following example.

(expect 3 (+ 1 2))

That's simple enough, but what if you want to match a regex against a string? The following example does exactly that, and it uses the same syntax.

(expect #"foo" "afoobar")

Other common tests are verifying an exception is thrown or checking the type of an actual value. The following snippets test those two conditions.

(expect ArithmeticException (/ 12 0))

(expect String "foo")

Testing subsets of the actual value. Sometimes you want an exact match, but there are often times when you only care about a subset of the actual value. For example, you may want to test all the elements of a map except the time and id pairs (presumably because they are dynamic). The following tests show how you can verify that some key/value pairs are in a map, a element is in a set, or an element is in a list.

;; k/v pair in map. matches subset
(expect {:foo 1} (in {:foo 1 :cat 4}))

;; key in set
(expect :foo (in (conj #{:foo :bar} :cat)))

;; val in list
(expect :foo (in (conj [:bar] :foo)))

Double/NaN is annoying. (not= Double/NaN Double/NaN) ;=> true. I get it, conceptually. In practice, I don't want my tests failing because I can't compare two maps that happen to have Double/NaN as the value for a matching key. In fact, 100% of the time I want (= Double/NaN Double/NaN) ;=> true. And, yes, I can rewrite the test and use Double/isNaN. I can. But, I don't want to. Expectations allows me to pretend (= Double/NaN Double/NaN) ;=> true. It might hurt me in the future. I'll let you know. For now, I prefer to write concise tests that behave as "expected".

Try rewriting this and using Double/isNaN (it's not fun)

(expect
  {:x 1 :a Double/NaN :b {:c Double/NaN :d 2 :e 4 :f {:g 11 :h 12}}}
  {:x 1 :a Double/NaN :b {:c Double/NaN :d 2 :e 4 :f {:g 11 :h 12}}})

Concise Java Object testing. Inevitably, I seem to end up with a few Java objects. I could write a bunch of different expect statements, but I opted for a syntax that allows me to check everything at once.

(given (java.util.ArrayList.)
       (expect
        .size 0
        .isEmpty true))

Trimmed Stacktraces. I'm sure it's helpful to look through Clojure and Java's classes at times. However, I find the vast majority of the time the problem is in my code. Expectations trims many of the common classes that are from Clojure and Java, leaving much more signal than noise. Below is the stacktrace reported when running the failure examples from the Expectations codebase.

failure in (failure_examples.clj:8) : failure.failure-examples
      raw: (expect 1 (one))
  act-msg: exception in actual: (one)
    threw: class java.lang.ArithmeticException-Divide by zero
           failure.failure_examples$two__375 (failure_examples.clj:4)
           failure.failure_examples$one__378 (failure_examples.clj:5)
           failure.failure_examples$G__381__382$fn__387 (failure_examples.clj:8)
           failure.failure_examples$G__381__382 (failure_examples.clj:8)

Every stacktrace line is from my code, where the problem lives.

Descriptive Error Messages. Expectations does it's best to give you all the important information when a failure does occur. The following failure shows what keys are missing from actual and expected as well is which values do not match.

running

   (expect
     {:z 1 :a 9          :b {:c Double/NaN :d 1 :e 2 :f {:g 10 :i 22}}}
     {:x 1 :a Double/NaN :b {:c Double/NaN :d 2 :e 4 :f {:g 11 :h 12}}})

generates

   failure in (failure_examples.clj:110) : failure.failure-examples
         raw: (expect {:z 1, :a 9, :b {:c Double/NaN, :d 1, :e 2, :f {:g 10, :i 22}}} {:x 1, :a Double/NaN, :b {:c Double/NaN, :d 2, :e 4, :f {:g 11, :h 12}}})
      result: {:z 1, :a 9, :b {:c NaN, :d 1, :e 2, :f {:g 10, :i 22}}} are not in {:x 1, :a NaN, :b {:c NaN, :d 2, :e 4, :f {:g 11, :h 12}}}
     exp-msg: :x is in actual, but not in expected
              :b {:f {:h is in actual, but not in expected
     act-msg: :z is in expected, but not in actual
              :b {:f {:i is in expected, but not in actual
     message: :b {:e expected 2 but was 4
              :b {:d expected 1 but was 2
              :b {:f {:g expected 10 but was 11
              :a expected 9 but was NaN

note: I know it's a bit hard to read, but I wanted to cover all the possible errors with one example. In practice you'll get a few messages that will tell you exactly what is wrong.

For example, the error tells you

:b {:f {:g expected 10 but was 11

With that data it's pretty easy to see the problem in

(expect
  {:z 1 :a 9          :b {:c Double/NaN :d 1 :e 2 :f {:g 10 :i 22}}}
  {:x 1 :a Double/NaN :b {:c Double/NaN :d 2 :e 4 :f {:g 11 :h 12}}})

Expectations also tells you, when comparing two lists:

if the lists are the same, but differ only in order

if the lists are the same, but one list has duplicates

if the lists are not the same, which list is larger

JUnit integration. My project uses both Java and Clojure. I like running my tests in IntelliJ and I like TeamCity running my tests as part of the build. To accomplish this using Expectations all you need to do is create a java class similar to the example below.

import expectations.junit.ExpectationsTestRunner;
import org.junit.runner.RunWith;

@RunWith(expectations.junit.ExpectationsTestRunner.class)
public class FailureTest implements ExpectationsTestRunner.TestSource{

    public String testPath() {
        return "/path/to/the/root/folder/holding/your/tests";
    }
}

The Expectations Test Runner runs your Clojure tests in the same way that the Java tests run, including the green/red status icons and clickable links when things fail.

Why wouldn't you use Expectations?

Support. I'm using it to test my production code, but if I find errors I have to go fix them. You'll be in the same situation. I'll be happy to fix any bugs you find, but I might not have the time to get to it as soon as you send me email.

If you're willing to live on the bleeding edge, feel free to give it a shot.

Wednesday, August 11, 2010

clojure.test Introduction

I'll admit it, the first thing I like to do when learning a new language is fire up a REPL. However, I'm usually ready for the next step after typing in a few numbers, strings and defining a function or two.

What feels like centuries ago, Mike Clark wrote an article about using unit testing to learn a new language. Mike was ahead of his time. This blog entry should help you if you want to follow Mike's advice.

Luckily, Clojure has built in support for simple testing. (I'm currently using Clojure 1.2, you can download it from clojure.org)

Before we get started, let's make sure everything is working. Save a file with the following clojure in it and run* it with clojure.

(ns clojure.test.example
  (:use clojure.test))

(run-all-tests)

If everything is okay, you should see something similar the following output.

Testing clojure.walk

Testing clojure.core

(a bunch of other namespaces tested)

Testing clojure.zip

Ran 0 tests containing 0 assertions.
0 failures, 0 errors.

If you've gotten this far, you are all set to start writing your own tests. If you are having any trouble, I suggest logging into the #clojure IRC chat room on Freenode.net

The syntax for defining tests is very simple. The following test verifies that 1 + 1 = 2. You'll want to add the test after the ns definition and before the (run-all-tests) in the file you just created.

(deftest add-1-to-1
  (is (= 2 (+ 1 1))))

Running the test should produce something similar to the following output.

Testing clojure.walk

Testing clojure.test.example

(a bunch of other namespaces tested)

Ran 1 tests containing 1 assertions.
0 failures, 0 errors.

We see all of the standard clojure namespaces; however, we see our namespace (clojure.test.example) in the results as well. The output at the bottom also tells us that 1 test with 1 assertion was executed.

The following example shows testing a custom add function. (we will add additional tests from here, without ever deleting the old tests)

(defn add [x y] (+ x y))

(deftest add-x-to-y
  (is (= 5 (add 2 3))))

If everything goes to plan, running your tests should now produce the following text towards the bottom of the output.

Ran 2 tests containing 2 assertions.
0 failures, 0 errors.

At this point you might want to pass in a few different numbers to verify that add works as expected.

(deftest add-x-to-y-a-few-times
  (is (= 5 (add 2 3)))
  (is (= 5 (add 1 4)))
  (is (= 5 (add 3 2))))

Running the tests shows us our status

Ran 3 tests containing 5 assertions.
0 failures, 0 errors.

This works perfectly fine; however, clojure.test also provides are for verifying several values.

The following example tests the same conditions using are.

(deftest add-x-to-y-a-using-are
  (are [x y] (= 5 (add x y))
       2 3
       1 4
       3 2))

And, the unsurprising results.

Ran 4 tests containing 8 assertions.

That's a simple are; however, you can do whatever you need in the form. Let's grab the value out of a map as an additional example.

(deftest grab-map-values-using-are
  (are [y z] (= y (:x z))
       2 {:x 2}
       1 {:x 1}
       3 {:x 3 :y 4}))

Leaving us with

Ran 5 tests containing 11 assertions.

The is and are macros will be all that you need for 90% of all the tests you'll ever want to write. For additional assertions and more details you can check out the clojure.test documentation.

Advanced Topics (very unnecessary to get started)

I get annoyed with noise in my test results. Our results have been very noisy due to the namespace reporting. The run-all-tests function takes a regular expression (documented here). We can change our test running call to include a regular expression, as the following example shows.

(run-all-tests #"clojure.test.example")

Once we switch to providing a regular expression the results should be limited to the following output.


Testing clojure.test.example

Ran 5 tests containing 11 assertions.
0 failures, 0 errors.

This approach works fine for our current sample file; however, it seems like a better solution would be to stop reporting namespaces that do not contain any tests. The following snippet changes the report multimethod to ignore namespaces that don't contain any tests.

(defmethod report :begin-test-ns [m]
    (with-test-out
      (when (some #(:test (meta %)) (vals (ns-interns (:ns m))))
        (println "\nTesting" (ns-name (:ns m))))))

If you're just getting started, don't worry you don't need to understand what's going on in that snippet. I've copied the original report method and made it conditional by adding the code in bold. As a result, the namespace is only printed if it contains any tests.

Now that our results are clean, let's talk about ways of getting those results.

Adding calls to the run-all-tests function isn't a big deal when working with one namespace; however, you'll need to get clever when you want to run a suite of tests. I've been told that leiningen and Maven have tasks that allow you to run all the tests. You might want to start there. I don't currently use either one, and I'm lazy. I don't want to set up either, especially since all I want to do is run all my tests.

It turns out it's very easy to add a shutdown hook in Java. So, as a simple solution, I run all my tests from the Java shutdown hook.

(.addShutdownHook
 (Runtime/getRuntime)
 (proxy [Thread] []
   (run []
 (run-all-tests))))

In general, I create a test_helper.clj with the following code.

(ns test-helper
  (:use clojure.test))

(defmethod report :begin-test-ns [m]
    (with-test-out
      (if (some #(:test (meta %)) (vals (ns-interns (:ns m))))
        (println "\nTesting" (ns-name (:ns m))))))

(.addShutdownHook
 (Runtime/getRuntime)
 (proxy [Thread] []
   (run []
 (run-all-tests))))

Once you've created a test_helper.clj you can use test-helper (just like you used clojure.test) (example below) and your tests will automatically be run on exit, and only namespaces with tests will be included in the output.

It's worth noting that some clojure.contrib namespaces seem to include tests, so in practice I end up using a regular expression that ignores all namespaces beginning with "clojure"** when running all tests. With all of those ideas combined, I find I can execute all my tests or only the tests in the current namespace very easily.

Below you can find all the code from this entry.

clojure.test.example.clj

(ns clojure.test.example
  (:use clojure.test test-helper))

(deftest add-1-to-1
  (is (= 2 (+ 1 1))))

(defn add [x y] (+ x y))

(deftest add-x-to-y
  (is (= 5 (add 2 3))))

(deftest add-x-to-y-a-few-times
  (is (= 5 (add 2 3)))
  (is (= 5 (add 1 4)))
  (is (= 5 (add 3 2))))

(deftest add-x-to-y-a-using-are
  (are [x y] (= 5 (add x y))
       2 3
       1 4
       3 2))

(deftest grab-map-values-using-are
  (are [y z] (= y (:x z))
       2 {:x 2}
       1 {:x 1}
       3 {:x 3 :y 4}))

test_helper.clj

(ns test-helper
  (:use clojure.test))

(defmethod report :begin-test-ns [m]
    (with-test-out
      (if (some #(:test (meta %)) (vals (ns-interns (:ns m))))
        (println "\nTesting" (ns-name (:ns m))))))

(.addShutdownHook
 (Runtime/getRuntime)
 (proxy [Thread] []
   (run []
 (run-all-tests))))

* Running a clojure file should be as easy as: java -cp /path/to/clojure.jar clojure.main -i file.to.run.clj

** (run-all-tests #"[^(clojure)].*") ; careful though, now your clojure.test.example tests will be ignored. Don't let that confuse you.

Wednesday, July 07, 2010

High Level Testing with a High Level Language

In the early days of my project we made the decision to high-level test our Java application with Clojure*. One and a half years later, we're still following that path. It seemed worthwhile to document the progress so far.

My current preferred style of testing is rigorous unit testing and less than a dozen high level tests.

This style of testing doesn't catch everything; however, context is king. In my context, we constantly have to balance the number of bugs against the speed at which we deliver. We could test more, but it would slow down delivery. Since we don't want to drastically impact delivery, we try to get the most we can out of the tests that we do write.

A few more notes on context. Our high level tests are written by programmers for programmers. The application is fairly large and complex. We use Java, Clojure, Ruby, C#, & others. We take advantage of open-source frameworks as well as vendor and in-house frameworks. The result is a user-facing application used exclusively in-house. It's not a service or a 'for-sale' product.

The Bad
Context: The team knows Java fairly well and writes the majority of the domain code in Java. The team didn't have any previous experience with Clojure.
Result: Happiness with using a high level language was often impacted by no knowledge of that high level language.

Context: The vast majority of the code is written in Java and is able to be manipulated with IntelliJ.
Result: Some members of the team felt that using an additional language hampered their ability to use automated refactoring tools, and thus the rewards were not worth the cost. Other members believe the powerful language features provide benefits that out-weigh the costs.

Context: The tests need to run on developer boxes and build machines. One and a half years ago, there was no easy way to run Clojure tests in JUnit.
Result: The team hacked together something custom. It didn't take long to write, but the integration with IntelliJ and JUnit is not nearly as nice as using pure Java.

The Interesting
Context: The team has plenty of experience with Object Oriented (OO) based, C-style languages. The team didn't have any previous experience with Functional Programming (FP). Tests are procedural. However, the team was much more experienced writing procedural code in OO than FP.
Result: The paradigm shift impacted the speed at which the team was able to learn Clojure, but the team was able to peek into the world of FP without writing an entire application using an FP language.

The Good
Context: The team needed to build several utilities that allowed them to high level test the application.
Result: It was trivial to create a REPL that allowed us to communicate at a high level with our application. We practically got a command line interface to our application for free.

Context: The team was curious if Clojure would be the right language for solving other problems on the project. The team knew that with less than a dozen high level tests, rewriting the tests in another language would not be a large time investment.
Result: The team was able to determine that Clojure is a viable language choice for several tasks without having to take the leap on a mission critical component whose size may be variable.

Context: High level tests often require a significant amount of setup code. Clojure provides language features (duck-typing, macros, etc) that allow you to reduce the noise often generated by setup code.
Result: The tests are easier to read and maintain (assuming you understand what and how things are being hidden) and the amount of utility code required to run high level tests was significantly reduced.

*We still use Java to unit test our Java, for now.

Wednesday, February 24, 2010

The Maintainability of Unit Tests

At speakerconf 2010 discussion repeatedly arose around the idea that unit tests hinder your ability to refactor and add new features. It's true that tests are invaluable when refactoring the internals of a class as long as the interface doesn't change. However, when the interface does change, updating the associated tests is often the vast majority of the effort. Additionally, if a refactoring changes the interaction between two or more classes, the vast majority of the time is spent fixing tests, for several classes.

In my experience, making the interface or interaction change often takes 15-20% of the time, while changing the associated tests take the other 80-85%. When the effort is split that drastically, people begin to ask questions.

Should I write Unit Tests? The answer at speakerconf was: Probably, but I'm interested in hearing other options.

Ayende proposed that scenario based testing was a better solution. His examples drove home the point that he was able to make large architectural refactorings without changing any tests. Unfortunately, his tests suffered from the same problems that Integration Test advocates have been dealing with for years: Long Running Tests (20 mins to run a suite!) and Poor Defect Localization (where did things go wrong?). However, despite these limitations, he's reporting success with this strategy.

In my opinion, Martin Fowler actually answered this question correctly in the original Refactoring book.

The key is to test the areas that you are most worried about going wrong. That way you get the most benefit for your testing effort.

It's a bit of a shame that sentence lives in Refactoring and not in every book written for developers beginning to test their applications. After years of trying to test everything, I stumbled upon that sentence while creating Refactoring: Ruby Edition. That one sentence changed my entire attitude on Unit Testing.

I still write Unit Tests, but I only focus on testing the parts that provide the most business value.

An example

you find yourself working on an insurance application for a company that stores it's policies by customer SSN. Your application is likely to have several validations for customer information.

The validation that ensures a SSN is 9 numeric digits is obviously very important.

The validation that the customer name is alpha-only is probably closer to the category of "nice to have". If the alpha-only name validation is broken or removed, the application will continue to function almost entirely normally. And, the most likely problem is a typo - probably not the end of the world.

It's usually easy enough to add validations, but you don't need to test every single validation. The value of each validation should be used to determine if a test is warranted.

How do I improve the maintainability of my tests? Make them more concise.

Once you've determined you should write a test, take the time to create a concise test that can be maintained. The longer the test, the more likely it is to be ignored or misunderstood by future readers.

There are several methods for creating more concise tests. My recent work is largely in Java, so my examples are Java related. I've previously written about my preferred method for creating objects in Java Unit Tests. You can also use frameworks that focus on simplicity, such as Mockito. But, the most important aspect of creating concise tests is taking a hard look at object modeling. Removing constructor and method arguments is often the easiest way to reduce the amount of noise within a test.

If you're not using Java, the advice is the same: Remove noise from your tests by improving object modeling and using frameworks that promote descriptive, concise syntax. Removing noise from tests always increases maintainability.

That's it? Yes. I find when I only test the important aspects of an application and I focus on removing noise from the tests that I do write, the maintainability issue is largely addressed. As a result the pendulum swings back towards a more even effort split between features & refactoring vs updating tests.

Jay Fields' Thoughts