Jay Fields' Thoughts: 2010

Monday, December 06, 2010

Clojure: get, get-in, contains?, and some

Clojure provides a get function that returns the value mapped to a key in a set or map. The documentation shows the example: (get map key). While that's completely valid, I tend to use sets and maps as functions when the get is that simple.

For example, I'd use ({"FSU" 31 "UF" 7} "FSU") if I wanted the value of the key "FSU". It's much less likely that I'd use (get {"FSU" 31 "UF" 7} "FSU"), largely because the former example is less typing.

However, if I'm doing something more complicated I've found the get function to be helpful. Often, I like to use the combination of get and -> or ->>.

The following example takes some json-data, converts it to a clojure map, and pulls the value from the "FSU" key.

(-> json-data read-json (get "FSU"))

It's also worth noting, in our example we have to use get, since strings are not Clojure functions. If instead we chose to make our keys keywords, we could choose either of the following solutions. I don't believe there is a right or wrong solution; which you use will likely be a personal preference.

(-> json-data (read-json true) (get :FSU))
(-> json-data (read-json true) :FSU)

We can modify the example and assume nested json that results in the following clojure map: {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}}

Building off of a previous example, we could use a slightly modified version to get the score for FSU.

(-> json-data read-json (get "scores") (get "FSU"))

However, getting nested values is common enough that Clojure provides a function designed specifically to address that need: get-in

The get-in function returns the value in a nested associative structure when given a sequence of keys. Using get-in you can replace the last example with the following code.

(-> json-data read-json (get-in ["scores" "FSU"]))

The get-in function is very helpful when dealing with nested structures; however, there is one gotcha that I've run into. The following shows a REPL session and what get-in returns with various keys.

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} ["scores" "FSU"])
31

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} ["scores"])      
{"FSU" 31, "UF" 7}

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} [])        
{"timestamp" 1291578985220, "scores" {"FSU" 31, "UF" 7}}

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} nil)
{"timestamp" 1291578985220, "scores" {"FSU" 31, "UF" 7}}

Everything looks logical enough; however, if you are pulling your key sequence from somewhere else you could end up with unexpected results. The following example shows how a simple mistake could result in a bug.

user=> (def score-key-seqs {"FSU" ["scores" "FSU"]})                             
#'user/score-key-seqs

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} (score-key-seqs "FSU"))
31

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} (score-key-seqs "UF")) 
{"timestamp" 1291578985220, "scores" {"FSU" 31, "UF" 7}}

If you're always expecting a number and you get a map instead, things might not work out well.

It's also worth noting that both get and get-in allow you to specify default values. You can check the documentation on clojure.org for more information on default values.

You don't always need to get a value, sometimes it's good enough to know that a key is in a map or set. In general I use the value returned from a map or set to determine if a key exists - the following snippet uses that pattern.

(if (a-map :key) 
  (do-true-behaviors) 
  (do-false-behaviors))

However, that pattern fails if the value of :key is nil. If it's possible that the value might be nil you might want to use Clojure's contains? function. The contains? function returns true if key is present in the given collection, otherwise returns false. The following code pasted from a REPL session demonstrates that contains? works perfectly well with nil.

user=> (contains? {:foo nil} :foo)
true

The contains? function works well with sets and maps; however, if you try to use it on a vector you might get surprising results.

user=> (contains? [1 3 4] 2)
true

For numerically indexed collections like vectors and Java arrays, the contains? function tests if the numeric key is within the range of indexes. The Clojure documentation recommends looking at the some function if you're looking for an item in a list.

The some function returns the first logical true value of a predicate for any item in the list, else nil. The following REPL session shows how you can use a set as the predicate with some to determine if a value is found in a list.

user=> (some #{2} [1 3 4])  
nil
user=> (some #{1} [1 3 4])
1

Clojure provides various functions for operating on maps and sets. At first glance some of them may look superfluous; however, as you spend more time working with sets and maps you'll start to appreciate the subtle differences and the value they provide.

Tuesday, November 30, 2010

Taking a Second Look at Collective Code Ownership

It's common to hear proponents of Agile discussing the benefits of collective code ownership. The benefits can be undeniable. Sharing knowledge ensures at least one other perspective and drastically reduces Bus Risk. However, sharing comes at a cost: time. The ROI of sharing with a few people can be greatly different than the ROI of sharing with 5 or more people.

I do believe in the benefits of collective code ownership. Collective code ownership is a step towards taking an underachieving team and turning them into a good team. However, I'm becoming more and more convinced that it's not the way to take good team and make them great.

(context: I believe these ideas apply to teams larger than 3. Teams of 2-3 should likely be working with everyone on everything.)

If you pair program, you will incur a context switch every time you rotate pairs. Context switches always have a non-zero cost, and the more you work on, the larger the context switch is likely to be. Teams where everyone works on everything are very likely paying for expensive context switches on a regular basis. In fact, the words Context Switch are often used specifically to point out the cost.

Working on everything also ensures a non-trivial amount of time ramping up on whatever code you are about to start working on. Sometimes you work on something you know fairly well. Other times you're working on something you've never seen before. Working on something you've never seen before creates two choices (assuming pair-programming): go along for the ride, understanding little - or - slow your pair down significantly while they explain what's going on.

Let's say you choose to slow your pair down for the full explanation: was it worth it? If you're the only other person that knows the component, it's very likely that it was worth it. What if everyone else on the team already knows that component deeply? Well, if they all die, you can maintain the app, but I don't think that's going to be the largest issue on the team.

(the same ideas apply if you don't pair-program, except you don't have the "go along for the ride" option)

I can hear some of you right now: We rotate enough that the context switch is virtually free and because we rotate so much there's little ramp up time. You might be right. Your problem domain might be so simple that jumping on and off of parts of your system is virtually free. However, if you're domain is complex in anyway, I think you're underestimating the cost of context switches and ramp up time. Also, the devil is traditionally in the details, so you're "simple domain" probably isn't as simple as you think.

Let's assume your domain is that simple: it might be cheaper to rewrite the software than take your Bus Number from 3 to 4.

Another benefit of pair programming combined with collective code ownership is bringing up everyone's skill level to that of the most skilled team member. In my opinion, that's something you need to worry about on an underachieving team, not a good team. If you're on a good team, it's likely that you can learn just as much from any member of your team; therefore, you are not losing anything by sticking to working with just a few of them in specific areas. You really only run into an education problem if you're team has more learners than mentors - and, in that case, you're not ready to worry about going from good to great.

There's also opportunity cost of not sticking to certain areas. Focusing on a problem allows you to create better solutions. Specifically, it allows you to create a vision of what needs to be done, work towards that vision and constantly revise where necessary.

Mark Twain once wrote that his letters would be shorter if he had more time. The same is often true of software. The simplest solution is not always the most obvious. If you're jumping from problem to problem, you're more likely to create an inferior solution. You'll solve problems, but you'll be creating higher maintenance costs for the project in the long term.

Instead, I often find it very helpful to ponder a problem and create a more simple and, very often, a more concise solution. In my experience, the maintenance costs are also greatly reduced by the simplified, condensed solution.

I'd like to repeat for clarity: Collective code ownership has benefits. There's no doubt that it is better to have everyone work on everything than have everyone focused on individual parts of the codebase. However, it's worth considering the cost of complete sharing if you are trying to move from good to great.

Saturday, October 30, 2010

Experience Report: Feature Toggle over Feature Branch

We often use Feature Toggle on my current team (when gradual release isn't possible). My experience so far has been: gradual release is better than Feature Toggle, and Feature Toggle is better than Feature Branch.

I found Martin's bliki entry on Feature Toggle to be a great description, but the entry doesn't touch on the primary reasons why I prefer Feature Toggle to Feature Branch.

When using Feature Branch I have to constantly rebase to avoid massive merge issues, in general. That means rebasing during development and testing. Rebasing a branch while it's complete and being tested, has often lead to subtle, merge related bugs that go unnoticed because the feature is not under active development.

Additionally, (and likely more problematic) once I merge a feature branch I am committed (no pun intended). If a bug in the new feature, that requires rolling production back to the previous release, is found after the branch has been merged I find myself rolling back the feature commit or continuing with a un-releasable trunk. Rolling back the commit is painful because it is likely a large commit. If I continue on with an un-releasable trunk and another (unrelated to the new feature) bug is found in trunk I'm in trouble: I can't fix the new bug and release.

That's bad. I either lose significant time rolling back the release (terrible), or I roll the dice (terrifying). I was burned by this situation once already this year. It is not something I'm looking to suffer again in the near future.

Feature Toggle avoids this entirely by allowing me to run with the toggle available and turn it back on if things go wrong. When I feel comfortable that everything is okay (usually, a week in prod is good enough), I clean up the unnecessary toggles.

Of course, nothing is black & white. Sometimes a Feature Toggle increases the scope of the work to an unacceptable level, and Feature Branch is the correct decision. However, I always weigh the terrible/terrifying situation when I'm choosing which direction is optimal.

It's also worth noting, I roll the current day's changes into production every night. It's possible that your experience will be vastly different if your release schedules are multi-day or multi-week.

Thursday, September 30, 2010

Clojure: Flatten Keys

I recently needed to take a nested map and flatten the keys. After a bit of trial and error I came up with the following code. (note: the example expects Expectations)

(ns example
  (:use expectations))

(defn flatten-keys* [a ks m]
  (if (map? m)
    (reduce into (map (fn [[k v]] (flatten-keys* a (conj ks k) v)) (seq m)))
    (assoc a ks m)))

(defn flatten-keys [m] (flatten-keys* {} [] m))

(expect
  {[:z] 1, [:a] 9, [:b :c] Double/NaN, [:b :d] 1, [:b :e] 2, [:b :f :g] 10, [:b :f :i] 22} 
  (flatten-keys {:z 1 :a 9 :b {:c Double/NaN :d 1 :e 2 :f {:g 10 :i 22}}}))

As the test shows, the code converts

{:z 1 :a 9 :b {:c Double/NaN :d 1 :e 2 :f {:g 10 :i 22}}}

into

{[:z] 1, [:a] 9, [:b :c] Double/NaN, [:b :d] 1, [:b :e] 2, [:b :f :g] 10, [:b :f :i] 22}

Improvement suggestions welcome.

Clojure: Another Testing Framework - Expectations

Once upon a time I wrote Expectations for Ruby. I wanted a simple testing framework that allowed me to specify my test with the least amount of code.

Now that I'm spending the majority of my time in Clojure, I decided to create a version of Expectations for Clojure.

At first it started as a learning project, but I kept adding productivity enhancements. Pretty soon, it became annoying when I wasn't using Expectations. Obviously, if you write your own framework you are going to prefer to use it. However, I think the productivity enhancements might be enough for other people to use it as well.

So why would you want to use it?

Tests run automatically. Clojure hates side effects, yeah, I hear you. But, I hate wasting time and repeating code. As a result, Expectations runs all the tests on JVM shutdown. This allows you to execute a single file to run all the tests in that file, without having to specify anything additional. There's also a hook you can call if you don't want the tests to automatically run. (If you are looking for an example, there's a JUnit runner that disables running tests on shutdown)

What to test is inferred from your "expected" value. An equality test is probably the most common test written. In Expectations, an equality test looks like the following example.

(expect 3 (+ 1 2))

That's simple enough, but what if you want to match a regex against a string? The following example does exactly that, and it uses the same syntax.

(expect #"foo" "afoobar")

Other common tests are verifying an exception is thrown or checking the type of an actual value. The following snippets test those two conditions.

(expect ArithmeticException (/ 12 0))

(expect String "foo")

Testing subsets of the actual value. Sometimes you want an exact match, but there are often times when you only care about a subset of the actual value. For example, you may want to test all the elements of a map except the time and id pairs (presumably because they are dynamic). The following tests show how you can verify that some key/value pairs are in a map, a element is in a set, or an element is in a list.

;; k/v pair in map. matches subset
(expect {:foo 1} (in {:foo 1 :cat 4}))

;; key in set
(expect :foo (in (conj #{:foo :bar} :cat)))

;; val in list
(expect :foo (in (conj [:bar] :foo)))

Double/NaN is annoying. (not= Double/NaN Double/NaN) ;=> true. I get it, conceptually. In practice, I don't want my tests failing because I can't compare two maps that happen to have Double/NaN as the value for a matching key. In fact, 100% of the time I want (= Double/NaN Double/NaN) ;=> true. And, yes, I can rewrite the test and use Double/isNaN. I can. But, I don't want to. Expectations allows me to pretend (= Double/NaN Double/NaN) ;=> true. It might hurt me in the future. I'll let you know. For now, I prefer to write concise tests that behave as "expected".

Try rewriting this and using Double/isNaN (it's not fun)

(expect
  {:x 1 :a Double/NaN :b {:c Double/NaN :d 2 :e 4 :f {:g 11 :h 12}}}
  {:x 1 :a Double/NaN :b {:c Double/NaN :d 2 :e 4 :f {:g 11 :h 12}}})

Concise Java Object testing. Inevitably, I seem to end up with a few Java objects. I could write a bunch of different expect statements, but I opted for a syntax that allows me to check everything at once.

(given (java.util.ArrayList.)
       (expect
        .size 0
        .isEmpty true))

Trimmed Stacktraces. I'm sure it's helpful to look through Clojure and Java's classes at times. However, I find the vast majority of the time the problem is in my code. Expectations trims many of the common classes that are from Clojure and Java, leaving much more signal than noise. Below is the stacktrace reported when running the failure examples from the Expectations codebase.

failure in (failure_examples.clj:8) : failure.failure-examples
      raw: (expect 1 (one))
  act-msg: exception in actual: (one)
    threw: class java.lang.ArithmeticException-Divide by zero
           failure.failure_examples$two__375 (failure_examples.clj:4)
           failure.failure_examples$one__378 (failure_examples.clj:5)
           failure.failure_examples$G__381__382$fn__387 (failure_examples.clj:8)
           failure.failure_examples$G__381__382 (failure_examples.clj:8)

Every stacktrace line is from my code, where the problem lives.

Descriptive Error Messages. Expectations does it's best to give you all the important information when a failure does occur. The following failure shows what keys are missing from actual and expected as well is which values do not match.

running

   (expect
     {:z 1 :a 9          :b {:c Double/NaN :d 1 :e 2 :f {:g 10 :i 22}}}
     {:x 1 :a Double/NaN :b {:c Double/NaN :d 2 :e 4 :f {:g 11 :h 12}}})

generates

   failure in (failure_examples.clj:110) : failure.failure-examples
         raw: (expect {:z 1, :a 9, :b {:c Double/NaN, :d 1, :e 2, :f {:g 10, :i 22}}} {:x 1, :a Double/NaN, :b {:c Double/NaN, :d 2, :e 4, :f {:g 11, :h 12}}})
      result: {:z 1, :a 9, :b {:c NaN, :d 1, :e 2, :f {:g 10, :i 22}}} are not in {:x 1, :a NaN, :b {:c NaN, :d 2, :e 4, :f {:g 11, :h 12}}}
     exp-msg: :x is in actual, but not in expected
              :b {:f {:h is in actual, but not in expected
     act-msg: :z is in expected, but not in actual
              :b {:f {:i is in expected, but not in actual
     message: :b {:e expected 2 but was 4
              :b {:d expected 1 but was 2
              :b {:f {:g expected 10 but was 11
              :a expected 9 but was NaN

note: I know it's a bit hard to read, but I wanted to cover all the possible errors with one example. In practice you'll get a few messages that will tell you exactly what is wrong.

For example, the error tells you

:b {:f {:g expected 10 but was 11

With that data it's pretty easy to see the problem in

(expect
  {:z 1 :a 9          :b {:c Double/NaN :d 1 :e 2 :f {:g 10 :i 22}}}
  {:x 1 :a Double/NaN :b {:c Double/NaN :d 2 :e 4 :f {:g 11 :h 12}}})

Expectations also tells you, when comparing two lists:

if the lists are the same, but differ only in order

if the lists are the same, but one list has duplicates

if the lists are not the same, which list is larger

JUnit integration. My project uses both Java and Clojure. I like running my tests in IntelliJ and I like TeamCity running my tests as part of the build. To accomplish this using Expectations all you need to do is create a java class similar to the example below.

import expectations.junit.ExpectationsTestRunner;
import org.junit.runner.RunWith;

@RunWith(expectations.junit.ExpectationsTestRunner.class)
public class FailureTest implements ExpectationsTestRunner.TestSource{

    public String testPath() {
        return "/path/to/the/root/folder/holding/your/tests";
    }
}

The Expectations Test Runner runs your Clojure tests in the same way that the Java tests run, including the green/red status icons and clickable links when things fail.

Why wouldn't you use Expectations?

Support. I'm using it to test my production code, but if I find errors I have to go fix them. You'll be in the same situation. I'll be happy to fix any bugs you find, but I might not have the time to get to it as soon as you send me email.

If you're willing to live on the bleeding edge, feel free to give it a shot.

Wednesday, September 01, 2010

Clojure: Mocking

An introduction to clojure.test is easy, but it doesn't take long before you feel like you need a mocking framework. As far as I know, you have 3 options.

Take a look at Midje. I haven't gone down this path, but it looks like the most mature option if you're looking for a sophisticated solution.

Go simple. Let's take an example where you want to call a function that computes a value and sends a response to a gateway. Your first implementation looks like the code below. (destructuring explained)
```
(defn withdraw [& {:keys [balance withdrawal account-number]}]
 (gateway/process {:balance (- balance withdrawal)
                   :withdrawal withdrawal
                   :account-number account-number}))
```
No, it's not pure. That's not the point. Let's pretend that this impure function is the right design and focus on how we would test it.

You can change the code a bit and pass in the gateway/process function as an argument. Once you've changed how the code works you can test it by passing identity as the function argument in your tests. The full example is below.
```
(ns gateway)

(defn process [m] (println m))

(ns controller
 (:use clojure.test))

(defn withdraw [f & {:keys [balance withdrawal account-number]}]
 (f {:balance (- balance withdrawal)
     :withdrawal withdrawal
     :account-number account-number}))

(withdraw gateway/process :balance 100 :withdrawal 22 :account-number 4)
;; => {:balance 78, :withdrawal 22, :account-number 4}

(deftest withdraw-test
 (is (= {:balance 78, :withdrawal 22, :account-number 4}
  (withdraw identity :balance 100 :withdrawal 22 :account-number 4))))

(run-all-tests #"controller")
```
If you run the previous example you will see the println output and the clojure.test output, verifying that our code is working as we expected. This simple solution of passing in your side effect function and using identity in your tests can often obviate any need for a mock.

Solution 2 works well, but has the limitations that only one side-effecty function can be passed in and it's result must be used as the return value.

Let's extend our example and say that we want to log a message if the withdrawal would cause insufficient funds. (Our gateway/process and log/write functions will simply println since this is only an example, but in production code their behavior would differ and both would be required)

(ns gateway)

(defn process [m] (println "gateway: " m))

(ns log)

(defn write [m] (println "log: " m))

(ns controller
  (:use clojure.test))

(defn withdraw [& {:keys [balance withdrawal account-number]}]
  (let [new-balance (- balance withdrawal)]
    (if (> 0 new-balance)
      (log/write "insufficient funds")
      (gateway/process {:balance new-balance
                        :withdrawal withdrawal
                        :account-number account-number}))))

(withdraw :balance 100 :withdrawal 22 :account-number 4)
;; => gateway:  {:balance 78, :withdrawal 22, :account-number 4}

(withdraw :balance 100 :withdrawal 220 :account-number 4)
;; => log:  insufficient funds

Our new withdraw implementation calls two functions that have side effects. We could pass in both functions, but that solution doesn't seem to scale very well as the number of passed functions grows. Also, passing in multiple functions tends to clutter the signature and make it hard to remember what is the valid order for the arguments. Finally, if we need withdraw to always return a map showing the balance and withdrawal amount, there would be no easy solution for verifying the string sent to log/write.

Given our implementation of withdraw, writing a test that verifies that gateway/process and log/write are called correctly looks like a job for a mock. However, thanks to Clojure's binding function, it's very easy to redefine both of those functions to capture values that can later be tested.

The following code rebinds both gateway/process and log/write to partial functions that capture whatever is passed to them in an atom that can easily be verified directly in the test.

(ns gateway)

(defn process [m] (println "gateway: " m))

(ns log)

(defn write [m] (println "log: " m))

(ns controller
  (:use clojure.test))

(defn withdraw [& {:keys [balance withdrawal account-number]}]
  (let [new-balance (- balance withdrawal)]
    (if (> 0 new-balance)
      (log/write "insufficient funds")
      (gateway/process {:balance new-balance
                        :withdrawal withdrawal
                        :account-number account-number}))))

(deftest withdraw-test1
  (let [result (atom nil)]
    (binding [gateway/process (partial reset! result)]
      (withdraw :balance 100 :withdrawal 22 :account-number 4)
      (is (= {:balance 78, :withdrawal 22, :account-number 4} @result)))))

(deftest withdraw-test2
  (let [result (atom nil)]
    (binding [log/write (partial reset! result)]
      (withdraw :balance 100 :withdrawal 220 :account-number 4)
      (is (= "insufficient funds" @result)))))

(run-all-tests #"controller")

In general I use option 2 when I can get away with it, and option 3 where necessary. Option 3 adds enough additional code that I'd probably look into Midje quickly if I found myself writing a more than a few tests that way. However, I generally go out of my way to design pure functions, and I don't find myself needing either of these techniques very often.

Monday, August 30, 2010

Clojure: Using Sets and Maps as Functions

Clojure sets and maps are functions.

Since they are functions, you don't need functions to get values out of them. You can use the map or set as the example below shows.

(#{1 2} 1)
> 1

({:a 2 :b 3} :a)
> 2

That's nice, but it's not exactly game changing. However, when you use sets or maps with high order functions you can get a lot of power with a little code.

For example, the following code removes all of the elements of a vector if the element is also in the set.

(def banned #{"Steve" "Michael"})
(def guest-list ["Brian" "Josh" "Steve"])

(remove banned guest-list)
> ("Brian" "Josh")

I'm a big fan of using sets in the way described above, but I don't often find myself using maps in the same way. The following code works, but I rarely use maps as predicates.

(def banned {"Steve" [] "Michael" []})
(def guest-list ["Brian" "Josh" "Steve"])

(remove banned guest-list)
> ("Brian" "Josh")

However, yesterday I needed to compare two maps and get the list of ids in the second map where the quantities didn't match the quantities in the first map. I started by using filter and defining a function that checks if the quantities are not equal. The following code shows solving the problem with that approach.

; key/value pairs representing order-id and order-quantity
(def map1 {1 44 2 33})
(def map2 {1 55 2 33})

(defn not=quantities [[id qty]] (not= (map1 id) qty))
(keys (filter not=quantities map2))
> (1)

However, since you can use maps as filter functions you can also solve the problem by merging the maps with not= and filtering by the result. The following code shows an example of merging and using the result as the predicate.

; key/value pairs representing order-id and order-quantity
(def map1 {1 44 2 33})
(def map2 {1 55 2 33})

(filter (merge-with not= map1 map2) (keys map2))
> (1)

I don't often find myself using maps as predicates, but in certain cases it's exactly what I need.

Wednesday, August 11, 2010

clojure.test Introduction

I'll admit it, the first thing I like to do when learning a new language is fire up a REPL. However, I'm usually ready for the next step after typing in a few numbers, strings and defining a function or two.

What feels like centuries ago, Mike Clark wrote an article about using unit testing to learn a new language. Mike was ahead of his time. This blog entry should help you if you want to follow Mike's advice.

Luckily, Clojure has built in support for simple testing. (I'm currently using Clojure 1.2, you can download it from clojure.org)

Before we get started, let's make sure everything is working. Save a file with the following clojure in it and run* it with clojure.

(ns clojure.test.example
  (:use clojure.test))

(run-all-tests)

If everything is okay, you should see something similar the following output.

Testing clojure.walk

Testing clojure.core

(a bunch of other namespaces tested)

Testing clojure.zip

Ran 0 tests containing 0 assertions.
0 failures, 0 errors.

If you've gotten this far, you are all set to start writing your own tests. If you are having any trouble, I suggest logging into the #clojure IRC chat room on Freenode.net

The syntax for defining tests is very simple. The following test verifies that 1 + 1 = 2. You'll want to add the test after the ns definition and before the (run-all-tests) in the file you just created.

(deftest add-1-to-1
  (is (= 2 (+ 1 1))))

Running the test should produce something similar to the following output.

Testing clojure.walk

Testing clojure.test.example

(a bunch of other namespaces tested)

Ran 1 tests containing 1 assertions.
0 failures, 0 errors.

We see all of the standard clojure namespaces; however, we see our namespace (clojure.test.example) in the results as well. The output at the bottom also tells us that 1 test with 1 assertion was executed.

The following example shows testing a custom add function. (we will add additional tests from here, without ever deleting the old tests)

(defn add [x y] (+ x y))

(deftest add-x-to-y
  (is (= 5 (add 2 3))))

If everything goes to plan, running your tests should now produce the following text towards the bottom of the output.

Ran 2 tests containing 2 assertions.
0 failures, 0 errors.

At this point you might want to pass in a few different numbers to verify that add works as expected.

(deftest add-x-to-y-a-few-times
  (is (= 5 (add 2 3)))
  (is (= 5 (add 1 4)))
  (is (= 5 (add 3 2))))

Running the tests shows us our status

Ran 3 tests containing 5 assertions.
0 failures, 0 errors.

This works perfectly fine; however, clojure.test also provides are for verifying several values.

The following example tests the same conditions using are.

(deftest add-x-to-y-a-using-are
  (are [x y] (= 5 (add x y))
       2 3
       1 4
       3 2))

And, the unsurprising results.

Ran 4 tests containing 8 assertions.

That's a simple are; however, you can do whatever you need in the form. Let's grab the value out of a map as an additional example.

(deftest grab-map-values-using-are
  (are [y z] (= y (:x z))
       2 {:x 2}
       1 {:x 1}
       3 {:x 3 :y 4}))

Leaving us with

Ran 5 tests containing 11 assertions.

The is and are macros will be all that you need for 90% of all the tests you'll ever want to write. For additional assertions and more details you can check out the clojure.test documentation.

Advanced Topics (very unnecessary to get started)

I get annoyed with noise in my test results. Our results have been very noisy due to the namespace reporting. The run-all-tests function takes a regular expression (documented here). We can change our test running call to include a regular expression, as the following example shows.

(run-all-tests #"clojure.test.example")

Once we switch to providing a regular expression the results should be limited to the following output.


Testing clojure.test.example

Ran 5 tests containing 11 assertions.
0 failures, 0 errors.

This approach works fine for our current sample file; however, it seems like a better solution would be to stop reporting namespaces that do not contain any tests. The following snippet changes the report multimethod to ignore namespaces that don't contain any tests.

(defmethod report :begin-test-ns [m]
    (with-test-out
      (when (some #(:test (meta %)) (vals (ns-interns (:ns m))))
        (println "\nTesting" (ns-name (:ns m))))))

If you're just getting started, don't worry you don't need to understand what's going on in that snippet. I've copied the original report method and made it conditional by adding the code in bold. As a result, the namespace is only printed if it contains any tests.

Now that our results are clean, let's talk about ways of getting those results.

Adding calls to the run-all-tests function isn't a big deal when working with one namespace; however, you'll need to get clever when you want to run a suite of tests. I've been told that leiningen and Maven have tasks that allow you to run all the tests. You might want to start there. I don't currently use either one, and I'm lazy. I don't want to set up either, especially since all I want to do is run all my tests.

It turns out it's very easy to add a shutdown hook in Java. So, as a simple solution, I run all my tests from the Java shutdown hook.

(.addShutdownHook
 (Runtime/getRuntime)
 (proxy [Thread] []
   (run []
 (run-all-tests))))

In general, I create a test_helper.clj with the following code.

(ns test-helper
  (:use clojure.test))

(defmethod report :begin-test-ns [m]
    (with-test-out
      (if (some #(:test (meta %)) (vals (ns-interns (:ns m))))
        (println "\nTesting" (ns-name (:ns m))))))

(.addShutdownHook
 (Runtime/getRuntime)
 (proxy [Thread] []
   (run []
 (run-all-tests))))

Once you've created a test_helper.clj you can use test-helper (just like you used clojure.test) (example below) and your tests will automatically be run on exit, and only namespaces with tests will be included in the output.

It's worth noting that some clojure.contrib namespaces seem to include tests, so in practice I end up using a regular expression that ignores all namespaces beginning with "clojure"** when running all tests. With all of those ideas combined, I find I can execute all my tests or only the tests in the current namespace very easily.

Below you can find all the code from this entry.

clojure.test.example.clj

(ns clojure.test.example
  (:use clojure.test test-helper))

(deftest add-1-to-1
  (is (= 2 (+ 1 1))))

(defn add [x y] (+ x y))

(deftest add-x-to-y
  (is (= 5 (add 2 3))))

(deftest add-x-to-y-a-few-times
  (is (= 5 (add 2 3)))
  (is (= 5 (add 1 4)))
  (is (= 5 (add 3 2))))

(deftest add-x-to-y-a-using-are
  (are [x y] (= 5 (add x y))
       2 3
       1 4
       3 2))

(deftest grab-map-values-using-are
  (are [y z] (= y (:x z))
       2 {:x 2}
       1 {:x 1}
       3 {:x 3 :y 4}))

test_helper.clj

(ns test-helper
  (:use clojure.test))

(defmethod report :begin-test-ns [m]
    (with-test-out
      (if (some #(:test (meta %)) (vals (ns-interns (:ns m))))
        (println "\nTesting" (ns-name (:ns m))))))

(.addShutdownHook
 (Runtime/getRuntime)
 (proxy [Thread] []
   (run []
 (run-all-tests))))

* Running a clojure file should be as easy as: java -cp /path/to/clojure.jar clojure.main -i file.to.run.clj

** (run-all-tests #"[^(clojure)].*") ; careful though, now your clojure.test.example tests will be ignored. Don't let that confuse you.

Tuesday, July 27, 2010

Clojure: Destructuring

In The Joy of Clojure (TJoC) destructuring is described as a mini-language within Clojure. It's not essential to learn this mini-language; however, as the authors of TJoC point out, destructuring facilitates concise, elegant code.

What is destructuring?

Clojure supports abstract structural binding, often called destructuring, in let binding lists, fn parameter lists, and any macro that expands into a let or fn. -- http://clojure.org/special_forms

The simplest example of destructuring is assigning the values of a vector.

user=> (def point [5 7])
#'user/point

user=> (let [[x y] point]
         (println "x:" x "y:" y))
x: 5 y: 7

note: I'm using let for my examples of destructuring; however, in practice I tend to use destructuring in function parameter lists at least as often, if not more often.

I'll admit that I can't remember ever using destructuring like the first example, but it's a good starting point. A more realistic example is splitting a vector into a head and a tail. When defining a function with an arglist** you use an ampersand. The same is true in destructuring.

user=> (def indexes [1 2 3])
#'user/indexes

user=> (let [[x & more] indexes]
         (println "x:" x "more:" more))
x: 1 more: (2 3)

It's also worth noting that you can bind the entire vector to a local using the :as directive.

user=> (def indexes [1 2 3])
#'user/indexes

user=> (let [[x & more :as full-list] indexes]
         (println "x:" x "more:" more "full list:" full-list))
x: 1 more: (2 3) full list: [1 2 3]

Vector examples are the easiest; however, in practice I find myself using destructuring with maps far more often.

Simple destructuring on a map is as easy as choosing a local name and providing the key.

user=> (def point {:x 5 :y 7})
#'user/point

user=> (let [{the-x :x the-y :y} point]
         (println "x:" the-x "y:" the-y))
x: 5 y: 7

As the example shows, the values of :x and :y are bound to locals with the names the-x and the-y. In practice we would never prepend "the-" to our local names; however, using different names provides a bit of clarity for our first example. In production code you would be much more likely to want locals with the same name as the key. This works perfectly well, as the next example shows.

user=> (def point {:x 5 :y 7})
#'user/point

user=> (let [{x :x y :y} point]
         (println "x:" x "y:" y))
x: 5 y: 7

While this works perfectly well, creating locals with the same name as the keys becomes tedious and annoying (especially when your keys are longer than one letter). Clojure anticipates this frustration and provides :keys directive that allows you to specify keys that you would like as locals with the same name.

user=> (def point {:x 5 :y 7})
#'user/point

user=> (let [{:keys [x y]} point]
         (println "x:" x "y:" y))
x: 5 y: 7

There are a few directives that work while destructuring maps. The above example shows the use of :keys. In practice I end up using :keys the most; however, I've also used the :as directive while working with maps.

The following example illustrates the use of an :as directive to bind a local with the entire map.

user=> (def point {:x 5 :y 7})
#'user/point

user=> (let [{:keys [x y] :as the-point} point]
         (println "x:" x "y:" y "point:" the-point))
x: 5 y: 7 point: {:x 5, :y 7}

We've now seen the :as directive used for both vectors and maps. In both cases the local is always assigned to the entire expression that is being destructured.

For completeness I'll document the :or directive; however, I must admit that I've never used it in practice. The :or directive is used to assign default values when the map being destructured doesn't contain a specified key.

user=> (def point {:y 7})
#'user/point
 
user=> (let [{:keys [x y] :or {x 0 y 0}} point]
         (println "x:" x "y:" y))
x: 0 y: 7

Lastly, it's also worth noting that you can destructure nested maps, vectors and a combination of both.

The following example destructures a nested map

user=> (def book {:name "SICP" :details {:pages 657 :isbn-10 "0262011530"}})
#'user/book

user=> (let [{name :name {pages :pages isbn-10 :isbn-10} :details} book]
         (println "name:" name "pages:" pages "isbn-10:" isbn-10))
name: SICP pages: 657 isbn-10: 0262011530

As you would expect, you can also use directives while destructuring nested maps.

user=> (def book {:name "SICP" :details {:pages 657 :isbn-10 "0262011530"}})
#'user/book
user=> 
user=> (let [{name :name {:keys [pages isbn-10]} :details} book]
         (println "name:" name "pages:" pages "isbn-10:" isbn-10))
name: SICP pages: 657 isbn-10: 0262011530

Destructuring nested vectors is also very straight-forward, as the following example illustrates

user=> (def numbers [[1 2][3 4]])
#'user/numbers

user=> (let [[[a b][c d]] numbers]
  (println "a:" a "b:" b "c:" c "d:" d))
a: 1 b: 2 c: 3 d: 4

Since binding forms can be nested within one another arbitrarily, you can pull apart just about anything -- http://clojure.org/special_forms

The following example destructures a map and a vector at the same time.

user=> (def golfer {:name "Jim" :scores [3 5 4 5]})
#'user/golfer

user=> (let [{name :name [hole1 hole2] :scores} golfer] 
         (println "name:" name "hole1:" hole1 "hole2:" hole2))
name: Jim hole1: 3 hole2: 5

The same example can be rewritten using a function definition to show the simplicity of using destructuring in parameter lists.

user=> (defn print-status [{name :name [hole1 hole2] :scores}] 
  (println "name:" name "hole1:" hole1 "hole2:" hole2))
#'user/print-status

user=> (print-status {:name "Jim" :scores [3 5 4 5]})
name: Jim hole1: 3 hole2: 5

There are other (less used) directives and deeper explanations available on http://clojure.org/special_forms and in The Joy of Clojure. I recommend both.

**(defn do-something [x y & more] ... )

Tuesday, July 20, 2010

Clojure: Composing Functions

Before Clojure, I had never used a Functional Programming language. My experience has primarily been in C#, Ruby, & Java. So, learning how to use Clojure effectively has been a fun and eye-opening experience.

I've noticed a few things:

The Clojure language has a lot of built in support for maps and hashes (literal syntax, destructuring, etc)
Rich Hickey is quoted as saying "It is better to have 100 functions operate on one data abstraction than 10 functions on 10 data structures." Clojure reflects this opinion and has loads of functions that work with Sequences.
I strongly dislike working with people who think Map<String, Map<String, String>> is generally a good idea in Java. However, all the support for maps and vectors in Clojure makes working with them easy and efficient.
Once you embrace functions, maps, vectors, etc, composing functions together to transform data becomes effortless.

Like I said, I'm still learning Clojure. Recently I refactored some code in a way that I thought was worth documenting for other people who are also learning Clojure.

Let's start with a simple example. Let's assume we want to track the score in the British Open (Golf). The data structure we will be working with will be a map of maps, keyed by country and then player.

{:England {"Paul Casey" -1}}

The above example shows that Paul Casey plays for England and currently has the score of -1. We need a function that allows you to update a players score. Let's also assume that scores for individual holes come in the following map form.

{:player "Paul Casey" :country :England :score -1}

Given the above code, you would expect the following test to pass.

(def current-score {:England {"Paul Casey" -1}})

(deftest update-score-test
  (is (= 
    {:England {"Paul Casey" -2}} 
    (update-score current-score {:player "Paul Casey" :country :England :score -1}))))

(Let's ignore the fact that I'm def'ing current-score. It's only for simplicity's sake.)

There are a lot of ways to make that test pass. It took me a few tries to come up with something I liked. Below is the path I took.

Step one, get the test passing using the functions I know.

(defn update-score [current update]
  (let [player (:player update)
        country (:country update)
        hole-score (:score update)]
    (merge current {country {player (+ ((current country) player) hole-score)}})))

The above example code is a victory when you are learning Clojure. The tests pass, you've used a few features of the language. It's a great first pass. However, I'm sure the Clojure veterans are drooling over the refactoring possibilities.

Step two, learn the update-in function. Like I previously mentioned, Clojure has a lot of functions for working with sequences (maps, vectors, lists, etc). It's not always obvious which function you should be using. However, I've had loads of success learning from the mailing list, and getting immediate answers in the #clojure chat room on freenode. One way or another, I eventually found the update-in function, which is designed up update a nested key.

(defn update-score [current update]
  (let [player (:player update)
        country (:country update)
        hole-score (:score update)]
    (update-in current [country player] (fn [_] (+ ((current country) player) hole-score)))))

The switch to update-in is relatively easy and cleans up a bit of code, but now I've introduced an anonymous function. Anonymous functions are great, but if I can get away with not creating them, I always take that route.

However, before we eliminate the anonymous function, we're actually better off making sure we understand the new function (update-in). While the current version of the code works, it's much cleaner if you use the current value that is passed into the anonymous function instead of looking up the value yourself.

(defn update-score [current update]
  (update-in current [(:country update) (:player update)] (fn [overall-score] (+ overall-score (:score update)))))

Since we no longer use country and player in multiple places, it seems like removing the let statement is a logical choice. As a result, the amount of code necessary to solve this problem seems to get smaller and smaller.

However, we can actually take this a step farther and use destructuring to provide an even cleaner implementation.

(defn update-score [current {:keys [country player score]}]
  (update-in current [country player] (fn [overall-score] (+ overall-score score))))

If you aren't familiar with destructuring the syntax might be a bit confusing. I created a blog entry for destructuring that should help if you aren't familiar with it. There's also a good write up with destructuring examples available on clojure.org. I would also recommend writing a few simple functions and try out the examples for yourself.

There's one more step to take, but for me it was a big one. The documentation for update-in states:

Usage: (update-in m [k & ks] f & args)

'Updates' a value in a nested associative structure, where ks is a sequence of keys and f is a function that will take the old value and any supplied args and return the new value, and returns a new nested structure.

What I currently have works fine, but the additional args that update-in takes allows me to provide a more concise solution that I greatly prefer.

(defn update-score [current {:keys [country player score]}]
  (update-in current [country player] + score))

This final version doesn't require any anonymous functions. Instead it relies on on update-in to apply the current value and any additional args to the function that you passed in. The final version is so concise you begin to wonder if you really need to define an update-score function.

The final snippet is an example of why I'm very intrigued by Clojure these days. The ability to compose functions and mix them in with the powerful functions of the language results in very concise code.

I consider what I would need to write to accomplish the same thing in Java or Ruby, and I have to wonder if the Functional Programming supporters might just be on to something.

Wednesday, July 07, 2010

High Level Testing with a High Level Language

In the early days of my project we made the decision to high-level test our Java application with Clojure*. One and a half years later, we're still following that path. It seemed worthwhile to document the progress so far.

My current preferred style of testing is rigorous unit testing and less than a dozen high level tests.

This style of testing doesn't catch everything; however, context is king. In my context, we constantly have to balance the number of bugs against the speed at which we deliver. We could test more, but it would slow down delivery. Since we don't want to drastically impact delivery, we try to get the most we can out of the tests that we do write.

A few more notes on context. Our high level tests are written by programmers for programmers. The application is fairly large and complex. We use Java, Clojure, Ruby, C#, & others. We take advantage of open-source frameworks as well as vendor and in-house frameworks. The result is a user-facing application used exclusively in-house. It's not a service or a 'for-sale' product.

The Bad
Context: The team knows Java fairly well and writes the majority of the domain code in Java. The team didn't have any previous experience with Clojure.
Result: Happiness with using a high level language was often impacted by no knowledge of that high level language.

Context: The vast majority of the code is written in Java and is able to be manipulated with IntelliJ.
Result: Some members of the team felt that using an additional language hampered their ability to use automated refactoring tools, and thus the rewards were not worth the cost. Other members believe the powerful language features provide benefits that out-weigh the costs.

Context: The tests need to run on developer boxes and build machines. One and a half years ago, there was no easy way to run Clojure tests in JUnit.
Result: The team hacked together something custom. It didn't take long to write, but the integration with IntelliJ and JUnit is not nearly as nice as using pure Java.

The Interesting
Context: The team has plenty of experience with Object Oriented (OO) based, C-style languages. The team didn't have any previous experience with Functional Programming (FP). Tests are procedural. However, the team was much more experienced writing procedural code in OO than FP.
Result: The paradigm shift impacted the speed at which the team was able to learn Clojure, but the team was able to peek into the world of FP without writing an entire application using an FP language.

The Good
Context: The team needed to build several utilities that allowed them to high level test the application.
Result: It was trivial to create a REPL that allowed us to communicate at a high level with our application. We practically got a command line interface to our application for free.

Context: The team was curious if Clojure would be the right language for solving other problems on the project. The team knew that with less than a dozen high level tests, rewriting the tests in another language would not be a large time investment.
Result: The team was able to determine that Clojure is a viable language choice for several tasks without having to take the leap on a mission critical component whose size may be variable.

Context: High level tests often require a significant amount of setup code. Clojure provides language features (duck-typing, macros, etc) that allow you to reduce the noise often generated by setup code.
Result: The tests are easier to read and maintain (assuming you understand what and how things are being hidden) and the amount of utility code required to run high level tests was significantly reduced.

*We still use Java to unit test our Java, for now.

Tuesday, July 06, 2010

Sacrificing some IDE capabilities

When discussing adopting Clojure or JRuby there are often heated discussions concerning the pros and cons of working with a language that has less IDE support. This blog entry will focus on the common concerns I hear.

I've never met anyone who has mastered Ruby or Clojure and had the opinion that they are more productive in Java. I think that's the best "proof" that the pros outweigh the cons.*

Other than that, things get complicated. The problem is, I understand where reluctant adopters are coming from. But, I just don't see any way to convince them to make the leap.

Many Java programmers do a bit of Ruby or Clojure and make a determination. I think their concerns are inflated by 3 factors: api/library fear, shallow understanding of powerful language features, and inability to optimize problem resolution.

A common concern is that learning a new language requires learning new libraries. Even though Java's libraries are available in Clojure or JRuby, both languages still have their own apis/libs to learn. In Java, you often use auto-completion. In Ruby and Clojure some auto-completion is available; however, it's often suspect. Few people use a Java REPL, so they don't understand how helpful/powerful a REPL is. In Clojure and Ruby, the REPL is often the most effective way to learn new apis/tools. Since most Java developers haven't experienced the value of the REPL, they exaggerate the overhead of learning apis/libs.

Writing idiomatic Java using Ruby or Clojure will produce painful and likely unmaintainable code. Writing idiomatic Ruby and Clojure often require understanding the powerful aspects of a language (metaprogramming, method_missing, macros, duck-typing). Understanding those aspects of a language can greatly change the design of a system. The result of mastering language features is often concise and significantly more maintainable code. However, if you aren't able to see the concise/maintainable version, you will picture a design that may likely be hard to manipulate without IDE support. Enter unnecessary concern and doubt.

Lastly, when things go wrong you need to know how to fix them. Fixing issues is often a mix of tweaking the language and mixing in a few tools. In Java it's often Ignore the noise in the stacktrace, Click a few links and open a few files, Drop in some break points or print lines or change a few tests. Getting to the source of the problem efficiently requires an IDE. In Clojure and Ruby I have those tools (since I do my Clojure and Ruby in IntelliJ), but I also have the REPLs. I know what noise to ignore. Even without those tools (when I used TextMate for Ruby), I know what patterns generally represent what issues. I can see the signal in the noise, and I have an additional tool (the REPL) to help me determine what is signal and what is noise.

However, if you don't know what is noise and what is signal, and (once again) you don't understand the value of a REPL then you exaggerate the cost of problem resolution.

Cosmin Stejerean recently mentioned "To me good IDE support for Clojure is very much like using training wheels on a bicycle."

That's almost true. I'll freely admit that I'd love some free refactoring. It would make me more productive. However, I'm already more productive so it's icing on the cake, not a reason to stop me from using the language.

Unfortunately, those training wheels are very much a deal-breaker to conservative adopters.

* I have recently heard of a few projects starting as Python and switching to Java, but I haven't ever done any Python, so I wont speculate as to why that happened.

Monday, April 12, 2010

Clojure: Converting a Custom Collection to a Sequence

I've been doing a bit of clojure lately and I've often found myself looking to convert a custom collection to a sequence. Unfortunately, several custom collections from libraries that we use don't implement Iterable; therefore, I can't simply use the seq function to create a seq.

In Java it's common to simply use a for loop and a get method to iterate through custom collections that don't implement Iterable.

for (int i=0; i<fooCollection.size(); i++) {
  Foo foo = fooCollection.get(i);
}

There are patterns for doing something similar in Clojure; however, I much prefer to work with Sequences and functions that operate on Sequences. Shane Harvie and I added the following snippet to our code to quickly convert a custom collection to a seq.

(defmacro obj->seq [obj count-method]
  `(loop [result# []
          i# 0]
    (if (< i# (. ~obj ~count-method))
      (recur (conj result# (.get ~obj i#)) (inc i#))
      result#)))

Our first version only worked with a custom collection that had a .size() method; however, the second collection we ran into used a .length() method, so we created the above macro that allows you to specify the (size|length|count|whatever) method.

The obj->seq function can be used similar to the following example.

(let [positions (obj->seq position-collection size)]
  ; do stuff with positions
  )

I'm fairly new to Clojure, so please feel free to provide recommendations and feedback.

Thursday, March 11, 2010

Pairing isn't the Solution

In 2007 & 2008 I wrote several blog entires on Pair Programming (tagged with pair programming). Pair programming solved a lot of problems for me: knowledge transfer, mentoring, code review, etc. It also solved another problem at the same time, even though I wasn't aware of it. Pairing helps reduce the number of cooks in the kitchen.

These days I'm working on a project with some really talented people. The pace at which we deliver features is far faster than any project I worked on before I joined DRW. However, I've seen two effects of having so many talented developers on the same team: wiki coding and spork sharing. (both metaphors coined by my tech lead, badams)

Wiki coding occurs when the churn on a class or component is so great that whoever commits last ends up deciding what it should do. Wiki coding leads to inconsistent design and lack of convention. Spork sharing occurs when a fork is designed and a spoon is also needed. Instead of creating a spoon, you want to share the handle, so you create a spork instead. Now, you have no spoon or fork, you have a spork, and sporks suck if you really want a spoon or a fork. Both problems seem to stem from differing vision for the classes or components.

It appears that the more talented programmers you put on a team, the more fractured the vision for the project becomes. Of course, complexity (as well as many other factors) can also increase the fracture.

You won't catch me advocating for 'hard-working average programmers'. What I do believe is: you should stock your team with only rockstars, between 2 and 4 of them. I've worked on teams that only had 3 people. I've worked on teams with about 16. I was a consultant, I've worked on a lot of teams - big and small. My experience was that the smaller teams were much more effective, every time.

Reflecting on the past few years, I came up with the following reasons for keeping your teams small:

New Technology: Small teams can more easily come to a decision to try a new technology. Also, if a small team selects a new technology, the entire team is likely to learn the new technology. However, an individual on a larger team may find a way to never learn the new technology and never contribute to that area of the code. Worse, a larger team may shy away from trying new technology because they cannot come to a consensus. New technology can offer productivity boosts, and missing those boosts can hurt a teams efficiency.
Smaller Problems: Smaller teams generally implies solving smaller problems. Most problems can be broken down into smaller, less complex problems. Once the problem is broken down, it should be easier for the small teams to craft elegant solutions that can integrate to solve the larger business need.
Improved Maintainability: Smaller teams generate less code, which allows all team members to have depth and breadth concerning a solution. Having depth and breadth allows you to easily share vision and fix broken windows.
Ownership: Ownership isn't bad. Single points of failure are bad, but having a small team that feels ownership over an application will generally lead to higher quality and more emotional attachment to it's success.
Context Switching: Smaller codebases maintained by small teams solving a smaller problem will context switch less, because there's nothing else to context switch to.
Responsibility: The Bystander Effect applies to code (e.g. production bugs) as well.
Unified Vision: In a large team it's easy to have an opposing vision and never be proven wrong. In a small team it's likely that you will agree on a vision for the project (process, technology, etc) as an entire team.
Adding: Adding one more person to a 2 person team will likely result in a new productive team member relatively quickly. Smaller team, smaller problem, smaller codebase = easier team to join.

Sometimes I wonder if consultancies sell large teams, and then use pairing to make the large teams more effective. It's definitely my experience that large teams pairing are much more effective than large teams that aren't pairing. However, I wonder if anyone ever stopped to ask: would a smaller team have been sufficient (or, better)?

I still believe pairing is an answer to many problems. However, the best solution to making a 8 person team more effective isn't pairing. I believe that a superior solution is finding a way to solve the problem with a smaller, more effective team (or teams).

Wednesday, February 24, 2010

Shorter Syntax for Creating Stubs with Mockito

As part of my efforts to create more concise Unit Tests in Java, Steve McLarnon and I recently added the ability to create a stub and stub one method with only one line.

The implementation relies entirely on Mockito, so my example assumes you were already using Mockito to create the stub.

The Mockito documentation has the following example for stubbing:
LinkedList mockedList = mock(LinkedList.class);
when(mockedList.get(0)).thenReturn("first");
Using the code Steve and I put together you can simplify this to the following single line.
LinkedList mockedList = create(aStub(LinkedList.class).returning("first").from().get(0));

The implementation is simple, but does rely on static state. There are ways to improve the implementation; however, this works for us. If you have more specific needs, feel free to change it to suit you.

The implementation follows:

public class MockitoExtensions {
    public static  T create(Object methodCall) {
        when(methodCall).thenReturn(StubBuilder.current.returnValue);
        return (T) StubBuilder.current.mockInstance;
    }

    public static  StubBuilder aStub(Class klass) {
        return new StubBuilder(mock(klass));
    }

    public static class StubBuilder {
        public static StubBuilder current;
        public final T mockInstance;
        private Object returnValue;

        public StubBuilder(T mockInstance) {
            current = this;
            this.mockInstance = mockInstance;
        }

        public T from() {
            return mockInstance;
        }

        public StubBuilder returning(Object returnValue) {
            this.returnValue = returnValue;
            return this;
        }
    }
}

The Maintainability of Unit Tests

At speakerconf 2010 discussion repeatedly arose around the idea that unit tests hinder your ability to refactor and add new features. It's true that tests are invaluable when refactoring the internals of a class as long as the interface doesn't change. However, when the interface does change, updating the associated tests is often the vast majority of the effort. Additionally, if a refactoring changes the interaction between two or more classes, the vast majority of the time is spent fixing tests, for several classes.

In my experience, making the interface or interaction change often takes 15-20% of the time, while changing the associated tests take the other 80-85%. When the effort is split that drastically, people begin to ask questions.

Should I write Unit Tests? The answer at speakerconf was: Probably, but I'm interested in hearing other options.

Ayende proposed that scenario based testing was a better solution. His examples drove home the point that he was able to make large architectural refactorings without changing any tests. Unfortunately, his tests suffered from the same problems that Integration Test advocates have been dealing with for years: Long Running Tests (20 mins to run a suite!) and Poor Defect Localization (where did things go wrong?). However, despite these limitations, he's reporting success with this strategy.

In my opinion, Martin Fowler actually answered this question correctly in the original Refactoring book.

The key is to test the areas that you are most worried about going wrong. That way you get the most benefit for your testing effort.

It's a bit of a shame that sentence lives in Refactoring and not in every book written for developers beginning to test their applications. After years of trying to test everything, I stumbled upon that sentence while creating Refactoring: Ruby Edition. That one sentence changed my entire attitude on Unit Testing.

I still write Unit Tests, but I only focus on testing the parts that provide the most business value.

An example

you find yourself working on an insurance application for a company that stores it's policies by customer SSN. Your application is likely to have several validations for customer information.

The validation that ensures a SSN is 9 numeric digits is obviously very important.

The validation that the customer name is alpha-only is probably closer to the category of "nice to have". If the alpha-only name validation is broken or removed, the application will continue to function almost entirely normally. And, the most likely problem is a typo - probably not the end of the world.

It's usually easy enough to add validations, but you don't need to test every single validation. The value of each validation should be used to determine if a test is warranted.

How do I improve the maintainability of my tests? Make them more concise.

Once you've determined you should write a test, take the time to create a concise test that can be maintained. The longer the test, the more likely it is to be ignored or misunderstood by future readers.

There are several methods for creating more concise tests. My recent work is largely in Java, so my examples are Java related. I've previously written about my preferred method for creating objects in Java Unit Tests. You can also use frameworks that focus on simplicity, such as Mockito. But, the most important aspect of creating concise tests is taking a hard look at object modeling. Removing constructor and method arguments is often the easiest way to reduce the amount of noise within a test.

If you're not using Java, the advice is the same: Remove noise from your tests by improving object modeling and using frameworks that promote descriptive, concise syntax. Removing noise from tests always increases maintainability.

That's it? Yes. I find when I only test the important aspects of an application and I focus on removing noise from the tests that I do write, the maintainability issue is largely addressed. As a result the pendulum swings back towards a more even effort split between features & refactoring vs updating tests.