Wednesday, December 28, 2011

Convert java.util.Properties to a Clojure Map

As I previously mentioned, a lot of the work I do involves Clojure & Java interop. This work includes the occasional case of working with a java.util.Properties object from within Clojure. Working with a Properties object isn't a huge deal, but while in Clojure I prefer to use destructuring and the various functions designed to work with Clojure maps (e.g. update-in, assoc, and dissoc).

The following example shows how easy it is to convert a Properties object to a Clojure map.
user=> (def prop-obj (doto (java.util.Properties.) (.putAll {"a" 1 "b" 2})))
user=> prop-obj
#<Properties {b=2, a=1}>

user=> (reduce (fn [x [y z]] (assoc x y z)) {} prop-obj)
{"a" 1, "b" 2}
That's fairly easy, but you'll quickly want your keys to be keywords if you plan on destructuring. You can drop in a quick call to keyword to convert the keys; however, you'll probably also want to dasherize the keys to allow for easy destructuring using keys and idiomatic names.
user=> (defn dash-match [[_ g1 g2]]
          (str g1 "-" g2))

user=> (defn dasherize [k]
          (-> k
            (clojure.string/replace #"([A-Z]+)([A-Z][a-z])" dash-match)
            (clojure.string/replace #"([a-z\d])([A-Z])" dash-match)
            (clojure.string/lower-case)))

user=> (def prop-obj (doto (java.util.Properties.) (.putAll {"FirstName" "Mike" "LastName" "Green"})))
user=> (reduce (fn [x [y z]] (assoc x y z)) {} prop-obj)                                              
{"LastName" "Green", "FirstName" "Mike"}
user=> (reduce (fn [x [y z]] (assoc x (-> y dasherize keyword) z)) {} prop-obj)
{:last-name "Green", :first-name "Mike"}
That looks good, but you might also find yourself working with a properties file that uses dots or dashes to group similar data. For example, you might find the following entry in your properties file (and the resulting Properties object).
person.name=Mike Green
person.age=26
person.sex=male
Loading this into a map works fine; however, it would be nice if the resulting map was nested (to keep common data together, and for easier access using get-in, update-in, etc).

The following code splits on dots and nests the values appropriately.
user=> (def prop-obj (doto (java.util.Properties.) (.putAll {"person.name" "Mike Green" "person.age" "26" "person.sex" "male"})))        
user=> prop-obj                                                                                                                  
#<Properties {person.name=Mike Green, person.age=26, person.sex=male}>
user=> (reduce (fn [x [y z]] (assoc-in x (-> y dasherize (clojure.string/split #"\.")) z)) {} prop-obj)
{"person" {"sex" "male", "age" "26", "name" "Mike Green"}}

user=> (-> (reduce (fn [x [y z]] (assoc-in x (-> y dasherize (clojure.string/split #"\.")) z)) {} prop-obj) clojure.walk/keywordize-keys)
{:person {:sex "male", :age "26", :name "Mike Green"}}
So, not too complicated, but much more complicated than what we started with. It turns out you'll likely want to do other things like parse integers, parse booleans, require keys, and add defaults when a key=value pair isn't specified.

We could go through the effort here of providing all that functionality, but instead I've created a small library that provides all of the above-mentioned features: propertea

If you'd like to see the implementation of the above features, you'll just need to look in propertea.core. If you want examples of all of the features of propertea, check out the tests on GitHub.
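For a rough idea of what that kind of post-processing can look like, here's a minimal sketch that operates on an already-converted map. The helper name and option keys are invented for illustration - they are not propertea's actual API.

```clojure
;; Hypothetical sketch: apply defaults, then parse selected keys.
;; These names are made up for illustration; see propertea for the real API.
(defn coerce-props [m {:keys [ints booleans defaults]}]
  (let [with-defaults (merge defaults m)
        parse-keys (fn [acc ks f]
                     (reduce #(update-in %1 [%2] f) acc ks))]
    (-> with-defaults
        (parse-keys ints #(Integer/parseInt %))
        (parse-keys booleans #(Boolean/parseBoolean %)))))

(coerce-props {:age "26" :active "true"}
              {:ints [:age] :booleans [:active] :defaults {:name "unknown"}})
;; => {:name "unknown", :age 26, :active true}
```

propertea also handles things like required keys and error reporting, which this sketch ignores.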

Clojure & Java Interop

About a year ago I got a phone call asking if I wanted to join another team at DRW. The team supports a (primarily) Java application, but the performance requirements would also allow it to be written in a higher level language. I'd been writing Clojure (basically) full-time at that point - so my response was simple: I'd love to join, but I'm going to want to do future development using Clojure.

A year later we still have plenty of Java, but the vast majority of the new code I add is Clojure. One of the big reasons I'm able to use Clojure so freely is the seamless interop with Java.

Execute Clojure from Java
Calling Clojure from Java is as simple as loading the .clj file and invoking a function from that file. I used the same example years ago, but I'll inline it here for simplicity.
; interop/core.clj
(ns interop.core)

(defn print-string [arg]
  (println arg))

// Java calling code
RT.loadResourceScript("interop/core.clj");
RT.var("interop.core", "print-string").invoke("hello world");
note: examples from this blog entry are available in this git repo. The commit with the code from the previous example is available here and I'm running the example from the command line with:
lein jar && java -cp "interop-1.0.0.jar:lib/*" interop.Example
Execute Java from Clojure
At this point we have Java executing some Clojure code, and we also have Clojure using an object that was created in Java. Even though we're in Clojure we can easily call methods on any Java object.
(ns interop.core)

(defn print-string [arg]
  (println arg "is" (.length arg) "characters long"))
commit

The above code (using the length method of a String instance) produces the following output.
hello world is 11 characters long
Calling a Java method and passing in additional arguments is also easy in Clojure.
(ns interop.core)

(defn print-string [arg]
  (println (.replace arg "hello" "goodbye")))
commit

The above code produces the following output.
goodbye world
There are a few other things to know about calling Java from Clojure. The following examples show how to call static methods, use enums, and use inner classes.
(ns interop.core)

(defn print-string [arg]
  ;;; calling a static method
  (println (String/valueOf true))

  ;;; using an enum
  (println (java.util.concurrent.TimeUnit/SECONDS))

  ;;; using a Java nested (inner) class. Note, in Clojure you
  ;;; use a $ instead of a .
  (println (java.util.AbstractMap$SimpleEntry. "key" "val")))
commit

And, the output:
true
#< SECONDS>
#<SimpleEntry key=val>
Create Java objects in Clojure
When working with Clojure you'll likely want to interact with existing Java objects, but you'll probably also want to create new instances of Java objects. You might have noticed the dot at the end of AbstractMap$SimpleEntry. in the previous example - that's how you instruct Clojure to create an instance of a Java object. The following example shows the dot notation for calling a constructor of the String class.
(ns interop.core)

(defn print-string [arg]
  (println (String. arg)))
commit

At this point our output is back to the original output.
hello world
When creating Java objects it's often beneficial to know which Java interfaces the Clojure data structures implement. The following examples demonstrate how you can create Java objects while passing Clojure data structures (and functions) as constructor arguments.
(ns interop.core)

(defn print-string [arg]
  ;;; pass a Clojure vector where Java expects a java.util.Collection
  (println (java.util.HashSet. ["1" "2"]))

  ;;; pass a Clojure map where Java expects a java.util.Map
  (println (java.util.LinkedHashMap. {1 "1" 2 "2"}))

  ;;; pass a Clojure function where Java expects a Runnable
  (println (Thread. (fn [] (println "clojure fns are runnables (and callables)")))))
commit

The output shows the constructed Java objects.
#<HashSet [2, 1]>
#<LinkedHashMap {1=1, 2=2}>
#<Thread Thread[Thread-1,5,main]>
Calling constructors in Clojure is very easy, but calling a constructor isn't always an option. At times you will likely need to implement a Java interface. Clojure provides both proxy and reify for creating implementations of Java interfaces. The following example demonstrates the syntax for using either proxy or reify.
(ns interop.core)

(defn proxy-coll []
  (proxy [java.util.Collection] []
    (add [o]
         (println o)
         true)))

(defn reify-coll []
  (reify java.util.Collection
    (add [this o]
         (println o)
         (println this)
         true)))

(defn main []
  (.add (proxy-coll) "this string is printed on proxied.add")
  (.add (reify-coll) "this string is printed on reified.add"))
commit

note, I also changed Example.java (the details are available in the above linked commit). The syntax for proxy and reify is fairly similar, and both offer additional options that are worth looking into. The primary differences between these two simple examples are:
  • The proxy implementation requires an empty vector where we could specify constructor arguments (if this were an abstract class instead of an interface).
  • The arg list for all methods of reify will specify the reified instance as the first argument. In our example the Collection.add method only takes one argument, but in our reify we also get the instance of the collection.
You might have also noticed that both implementations of add have "true" at the end - in our example we're hard-coding the return value of add to always return true. The following output is the result of running the current example code.
this string is printed on proxied.add
this string is printed on reified.add
#<core$reify_coll$reify__11 interop.core$reify_coll$reify__11@556917ee>
It's worth reading the docs to determine whether you want proxy or reify; however, if you don't see a clear choice I would opt for reify.

Returning objects from Clojure to Java
Our current Example.java returns something from the call to invoke on the clojure.lang.Var that is returned from RT.var("interop.core", "main"), but we're ignoring it so we have no idea what's returned.* Let's change the code and return something on purpose.
// interop/Example.java
package interop;

import clojure.lang.RT;

public class Example {
    public static void main(String[] args) throws Exception {
        RT.loadResourceScript("interop/core.clj");
        System.out.println(RT.var("interop.core", "main").invoke());
    }
}

; interop/core.clj
(ns interop.core)

(defn main []
  {:a "1" :b "2"})
Running our changes produces the following output.
{:a "1", :b "2"}
commit

At this point we are back in Java land after making a quick trip to Clojure to get a value. Returning most objects will be pretty straightforward; however, at some point you may want to return a Clojure function. This turns out to be fairly easy as well, since Clojure functions are instances of the IFn interface. The following code demonstrates how to return a Clojure function and call it from within Java.
// interop/Example.java
package interop;

import clojure.lang.RT;

public class Example {
    public static void main(String[] args) throws Exception {
        RT.loadResourceScript("interop/core.clj");
        clojure.lang.IFn f = (clojure.lang.IFn) RT.var("interop.core", "main").invoke();
        f.invoke("hello world");
    }
}

; interop/core.clj
(ns interop.core)

(defn main [] println)
commit

The above example returns the println function from interop.core/main and then invokes the println function from within Java. I only chose to pass one argument to invoke; however, the IFn.invoke method has various overloads to allow you to pass several arguments. The above code works, but it can be simplified to the following example.
package interop;

import clojure.lang.RT;

public class Example {
    public static void main(String[] args) throws Exception {
        clojure.lang.IFn f = (clojure.lang.IFn) RT.var("clojure.core", "println");
        f.invoke("hello world");
    }
}
commit

It seems like a fitting end that our final output is the same as our original output.
hello world
*actually, the last expression evaluated is what's returned - "true" in this specific case.

Saturday, November 19, 2011

Clojure: expectations - scenarios

expectations.scenarios are now deprecated: http://blog.jayfields.com/2012/11/clojure-deprecating-expectationsscenari.html


When I set out to write expectations I wanted to create a simple unit testing framework. I'm happy with what expectations provides for unit testing; however, I also need to write the occasional test that changes values or causes a side effect. There's no way I could go back to clojure.test after enjoying better failure messages, trimmed stack traces, automatic test running, etc. Thus, expectations.scenarios was born.

Using expectations.scenarios should be fairly natural if you already use expectations. The following example is a simple scenario (which could be a unit test, but we'll start here for simplicity).
(ns example.scenarios
  (:use expectations.scenarios))

(scenario
 (expect nil? nil))
A quick trip to the command line shows us that everything is working as expected.
Ran 1 tests containing 1 assertions in 4 msecs
0 failures, 0 errors.
As I said above, you could write this test as a unit test. However, expectations.scenarios was created for the cases in which you want to verify a value, make a change, and verify a value again. The following example shows multiple expectations verifying changing values in the same scenario.
(scenario
  (let [a (atom 0)]
    (swap! a inc)
    (expect 1 @a)
    (swap! a inc)
    (expect 2 @a)))
In expectations (unit tests) each test contains only one expect (or given), so failures are captured but they never stop other tests from executing. However, due to the procedural nature of scenarios, the first failing expect stops execution of the scenario.
(scenario
  (let [a (atom 0)]
    (swap! a inc)
    (expect 2 @a)
    (println "you'll never see this")))

failure in (scenarios.clj:4) : example.scenarios
           (expect 2 (clojure.core/deref a))
           expected: 2 
                was: 1
           on (scenarios.clj:7)
Ran 1 tests containing 1 assertions in 81 msecs
1 failures, 0 errors.
expectations.scenarios also allows you to easily verify calls to any function. I generally use interaction expects when I need to verify some type of side effect (e.g. logging or message publishing).
(scenario
 (println "1")
 (expect (interaction (println "1"))))
It's important to note the ordering of this scenario. You don't 'set up' an expectation and then call the function. Exactly the opposite is true - you call the function the same way you would in production, then you expect the interaction to have occurred. You may find this jarring if you're used to setting up your mocks ahead of time; however, I think this syntax is the least intrusive - and I think you'll prefer it in the long term.

The above example calls println directly, but your tests are much more likely to look something like this.
(defn foo [x] (println x))

(scenario
 (foo "1")
 (expect (interaction (println "1"))))
Similar to all other mocking frameworks (that I know of), the expect uses an implicit 'once' argument. You can also specify :twice and :never if you find yourself needing those interaction tests.
(defn foo [x] (println x))

(scenario
 (foo "1")
 (foo "1")
 (expect (interaction (println "1")) :twice)
 (expect (interaction (identity 1)) :never))
On occasion you may find yourself interested in verifying 2 out of 3 arguments - expectations.scenarios provides the 'anything' var that can be used for arguments you don't care about.
(defn foo [x y z] (println x y z))

(scenario
 (foo "1" 2 :a)
 (expect (interaction (println "1" anything :a))))
That's about all there is to expectations.scenarios, hopefully it fills the gap for tests you want to write that simply can't be done as unit tests.

Tuesday, November 01, 2011

Clojure: expectations unit testing wrap-up

Clojure Unit Testing with Expectations Part One
Clojure Unit Testing with Expectations Part Two
Clojure Unit Testing with Expectations Part Three
Clojure Unit Testing with Expectations Part Four
Clojure Unit Testing with Expectations Part Five
Clojure Unit Testing with Expectations Part Six (this entry)

The previous blog posts on expectations unit testing syntax cover all of the various ways that expectations can be used to write tests and what you can expect when your tests fail. However, there are a few other things worth knowing about expectations.

Stacktraces
expectations aggressively removes lines from the stacktraces. Just like many other aspects of expectations, the focus is on more signal and less noise. Any line in the stacktrace from clojure.core, clojure.lang, clojure.main, and java.lang will be removed. As a result any line appearing in your stacktrace should be relevant to your application or a third-party lib you're using. expectations also removes any duplicates that can occasionally appear when anonymous functions are part of the stacktrace. Again, it's all about improving signal by removing noise. Speaking of noise...
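The pruning idea itself can be sketched in a few lines (a rough approximation for illustration, not expectations' actual implementation):

```clojure
;; Rough approximation of the pruning idea: drop stack frames from the
;; noisy packages and collapse duplicates - not expectations' actual code.
(def noisy-prefixes ["clojure.core" "clojure.lang" "clojure.main" "java.lang"])

(defn prune-frames [frames]
  (->> frames
       (remove (fn [f] (some #(.startsWith ^String f %) noisy-prefixes)))
       distinct))

(prune-frames ["myapp.core$foo.invoke"
               "clojure.lang.AFn.run"
               "java.lang.Thread.run"
               "myapp.core$foo.invoke"])
;; => ("myapp.core$foo.invoke")
```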

Test Names
You might have noticed that expectations does not require you to create a test name. This is a reflection of my personal opinion that test names are nothing more than comments and shouldn't be required. If you desire test names, feel free to drop a comment above each test. Truthfully, this is probably a better solution anyway, since you can use spaces (instead of dashes) to separate words in a comment. Comments are good when used properly, but they can become noise when they are required. The decision to simply use comments for test names is another example of improving signal by removing noise.

Running Focused Expectations
Sometimes you'll have a file full of expectations, but you only want to run a specific expectation - expectations solves this problem by giving you 'expect-focused'. If you use expect-focused only expectations that are defined using expect-focused will be run.

For example, if you have the following expectations in a file you should see the following results from 'lein expectations'.
(ns sample.test.core
  (:use [expectations]))

(expect zero? 0)
(expect zero? 1)
(expect-focused nil? nil)

jfields$ lein expectations
Ran 1 tests containing 1 assertions in 2 msecs
IGNORED 2 EXPECTATIONS
0 failures, 0 errors.
As you can see, expectations only ran one test - the expect-focused on line 6. If the other tests had been run the test on line 5 would have created a failure. It can be easy to accidentally leave a few expect-focused calls in, so expectations prints the number of ignored expectations in capital letters as a reminder. Focused expectation running is yet another way to remove noise while working through a problem.

Tests Running
If you always use 'lein expectations' to run your tests you'll never even care; however, if you ever want to run individual test files it's important to know that your tests run by default on JVM shutdown. When I'm working with Clojure and Java I usually end up using IntelliJ, and therefore have the ability to easily run individual files. When I switched from clojure.test to expectations I wanted to make test running as simple as possible - so I removed the need to specify (run-all-tests). Of course, if you don't want expectations to run for some reason you can disable this feature by calling (expectations/disable-run-on-shutdown).

JUnit Integration
Lack of JUnit integration was a deal breaker for my team in the early days, so expectations comes with an easy way to run all tests as part of JUnit. If you want all of your tests to run in JUnit all you need to do is implement ExpectationsTestRunner.TestSource. The following example is what I use to run all the tests in expectations with JUnit.
import expectations.junit.ExpectationsTestRunner;
import org.junit.runner.RunWith;

@RunWith(expectations.junit.ExpectationsTestRunner.class)
public class SuccessTest implements ExpectationsTestRunner.TestSource {

    public String testPath() {
        return "test/clojure/success";
    }
}
As you can see from the example above, all you need to do is tell the test runner where to find your Clojure files.

That should be everything you need to know about expectations for unit testing use. If anything is unclear, please drop me a line in the comments.

Clojure: expectations - removing duplication with given

Clojure Unit Testing with Expectations Part One
Clojure Unit Testing with Expectations Part Two
Clojure Unit Testing with Expectations Part Three
Clojure Unit Testing with Expectations Part Four
Clojure Unit Testing with Expectations Part Five (this entry)
Clojure Unit Testing with Expectations Part Six

expectations obviously has a bias towards one assertion per test; however, there are times that verifying several things at the same time does make sense. For example, if you want to verify a few different properties of the same Java object it probably makes sense to make multiple assertions on the same instance.

One of the biggest problems with multiple assertions per test is when your test follows this pattern:
  1. create some state
  2. verify a bit about the state
  3. alter the state
  4. verify more about the state
Part of the problem is that the assertions that occurred before the altering of the state may or may not be relevant after the alteration. Additionally, if any of the assertions fail, execution of the entire test stops - thus some of your assertions will not be run (and you'll be lacking some information).

expectations takes an alternate route - embracing the idea of multiple assertions by providing a specific syntax that allows multiple verifications and the least amount of duplication.

The following example shows how you can test multiple properties of a Java object using the 'given' syntax.
(given (java.util.ArrayList.)
  (expect
    .size 0
    .isEmpty true))

jfields$ lein expectations
Ran 2 tests containing 2 assertions in 4 msecs
0 failures, 0 errors.
The syntax is simple enough: (given an-object (expect method return-value [method return-value]))
note: [method return-value] may be repeated any number of times.

This syntax allows us to expect return-values from as many methods as we care to verify, but encourages us not to change any state between our various assertions. This syntax also allows us to run each assertion regardless of the outcome of any previous assertion.

Obviously you could call methods that change the internal state of the object and at that point you're on your own. I definitely wouldn't recommend testing that way. However, as long as you call methods that don't change any state 'given' can help you write succinct tests that verify as many aspects of an object as you need to test.

As usual, I'll show the output for tests that fail using this syntax.
(given (java.util.ArrayList.)
  (expect
    .size 1
    .isEmpty false))

jfields$ lein expectations
failure in (core.clj:4) : sample.test.core
           (expect 1 (.size (java.util.ArrayList.)))
           expected: 1
                was: 0
failure in (core.clj:4) : sample.test.core
           (expect false (.isEmpty (java.util.ArrayList.)))
           expected: false
                was: true
This specific syntax was created for testing Java objects, but an interesting side effect is that it actually works on any value and you can substitute method calls with any function. For example, you can test a vector or a map using the examples below as a template.
(given [1 2 3]
  (expect
    first 1
    last 3))

(given {:a 2 :b 4}
  (expect
    :a 2
    :b 4))

jfields$ lein expectations
Ran 4 tests containing 4 assertions in 8 msecs
0 failures, 0 errors.
And, of course, the failures.
(given [1 2 3]
  (expect
    first 2
    last 1))

(given {:a 2 :b 4}
  (expect
    :a 1
    :b 1))

jfields$ lein expectations
failure in (core.clj:4) : sample.test.core
           (expect 2 (first [1 2 3]))
           expected: 2
                was: 1
failure in (core.clj:4) : sample.test.core
           (expect 1 (last [1 2 3]))
           expected: 1
                was: 3
failure in (core.clj:9) : sample.test.core
           (expect 1 (:a {:a 2, :b 4}))
           expected: 1
                was: 2
failure in (core.clj:9) : sample.test.core
           (expect 1 (:b {:a 2, :b 4}))
           expected: 1
                was: 4
Ran 4 tests containing 4 assertions in 14 msecs
4 failures, 0 errors.
When you want to call methods on a Java object or call functions with the same instance over and over the previous given syntax is really the simplest solution. However, there are times where you want something more flexible.

expectations also has a 'given' syntax that allows you to specify a template - thus reducing code duplication. The following example shows a test that verifies + with various arguments.
(given [x y] (expect 10 (+ x y))
  4 6
  6 4
  12 -2)

jfields$ lein expectations
Ran 3 tests containing 3 assertions in 5 msecs
0 failures, 0 errors.
The syntax for this flavor of given is: (given bindings template-form values-to-be-bound). The template form can be anything you need - just remember to put the expect in there.

Here's another example where we combine given with in to test a few different things. This example shows both the successful and failing versions.
;; successful
(given [x y] (expect x (in y))
  :a #{:a :b}
  {:a :b} {:a :b :c :d})

;; failure
(given [x y] (expect x (in y))
  :c #{:a :b}
  {:a :d} {:a :b :c :d})

jfields$ lein expectations
failure in (core.clj:8) : sample.test.core
           (expect :c (in #{:a :b}))
           key :c not found in #{:a :b}
failure in (core.clj:8) : sample.test.core
           (expect {:a :d} (in {:a :b, :c :d}))
           expected: {:a :d}
                 in: {:a :b, :c :d}
           :a expected: :d
                   was: :b
Ran 4 tests containing 4 assertions in 13 msecs
2 failures, 0 errors.
That's basically it for 'given' syntax within expectations. There are times that I use all of the various versions of given; however, there seems to be a connection between using given and interacting with Java objects. If you don't find yourself using Java objects very often then you probably won't have a strong need for given.

Clojure: expectations and Double/NaN

Clojure Unit Testing with Expectations Part One
Clojure Unit Testing with Expectations Part Two
Clojure Unit Testing with Expectations Part Three
Clojure Unit Testing with Expectations Part Four (this entry)
Clojure Unit Testing with Expectations Part Five
Clojure Unit Testing with Expectations Part Six

I'm not really a fan of Double/NaN in general, but sometimes it seems like the least evil choice. When I find myself in one of those cases I always hate having to write tests in a way that differs from all the other tests in the codebase. A goal I've always had with expectations is to keep the syntax consistent, and as a result I've chosen to treat Double/NaN as equal to Double/NaN when in the various Clojure data structures.

The following examples demonstrate Double/NaN being treated as equal.

;; allow Double/NaN equality in a map
(expect {:a Double/NaN :b {:c Double/NaN}} {:a Double/NaN :b {:c Double/NaN}})

;; allow Double/NaN equality in a set
(expect #{1 Double/NaN} #{1 Double/NaN})

;; allow Double/NaN equality in a list
(expect [1 Double/NaN] [1 Double/NaN])

jfields$ lein expectations
Ran 3 tests containing 3 assertions in 32 msecs
0 failures, 0 errors.
As you would expect, you can also count on Double/NaN being considered equal even if you are using the 'in' function.
;; allow Double/NaN equality when verifying values are in a map
(expect {:a Double/NaN :b {:c Double/NaN}} (in {:a Double/NaN :b {:c Double/NaN} :d "other stuff"}))

;; allow Double/NaN equality when verifying it is in a set
(expect Double/NaN (in #{1 Double/NaN}))

;; allow Double/NaN equality when verifying its existence in a list
(expect Double/NaN (in [1 Double/NaN]))

jfields$ lein expectations
Ran 3 tests containing 3 assertions in 32 msecs
0 failures, 0 errors.
For completeness I'll also show each of these examples failing.
;; allow Double/NaN equality in a map
(expect {:a Double/NaN :b {:c Double/NaN}} {:a nil :b {:c Double/NaN}})

;; allow Double/NaN equality with in fn and map
(expect {:a Double/NaN :b {:c nil}} (in {:a Double/NaN :b {:c Double/NaN} :d "other stuff"}))

;; allow Double/NaN equality in a set
(expect #{1 Double/NaN} #{1 nil})

;; allow Double/NaN equality with in fn and set
(expect Double/NaN (in #{1 nil}))

;; allow Double/NaN equality in a list
(expect [1 Double/NaN] [1 nil])

;; allow Double/NaN equality with in fn and list
(expect Double/NaN (in [1 nil]))


jfields$ lein expectations
failure in (core.clj:5) : sample.test.core
           (expect {:a Double/NaN, :b {:c Double/NaN}} {:a nil, :b {:c Double/NaN}})
           expected: {:a NaN, :b {:c NaN}}
                was: {:a nil, :b {:c NaN}}
           :a expected: NaN
                   was: nil
failure in (core.clj:8) : sample.test.core
           (expect {:a Double/NaN, :b {:c nil}} (in {:a Double/NaN, :b {:c Double/NaN}, :d "other stuff"}))
           expected: {:a NaN, :b {:c nil}}
                 in: {:a NaN, :b {:c NaN}, :d "other stuff"}
           :b {:c expected: nil
                      was: NaN
failure in (core.clj:11) : sample.test.core
           (expect #{1 Double/NaN} #{nil 1})
           expected: #{NaN 1}
                was: #{nil 1}
           nil are in actual, but not in expected
           NaN are in expected, but not in actual
failure in (core.clj:14) : sample.test.core
           (expect Double/NaN (in #{nil 1}))
           key NaN not found in #{nil 1}
failure in (core.clj:17) : sample.test.core
           (expect [1 Double/NaN] [1 nil])
           expected: [1 NaN]
                was: [1 nil]
           nil are in actual, but not in expected
           NaN are in expected, but not in actual
failure in (core.clj:20) : sample.test.core
           (expect Double/NaN (in [1 nil]))
           value NaN not found in [1 nil]
Ran 6 tests containing 6 assertions in 66 msecs
6 failures, 0 errors.
There always seems to be downsides to using NaN, so I tend to look for the least painful path. Hopefully expectations provides the most pain-free path when your tests end up needing to include NaN.

Clojure: expectations with values in vectors, sets, and maps

Clojure Unit Testing with Expectations Part One
Clojure Unit Testing with Expectations Part Two
Clojure Unit Testing with Expectations Part Three (this entry)
Clojure Unit Testing with Expectations Part Four
Clojure Unit Testing with Expectations Part Five
Clojure Unit Testing with Expectations Part Six

I've previously written about verifying equality and the various non-equality expectations that are available. This entry will focus on another type of comparison that is allowed in expectations - verifying that an 'expected' value is in an 'actual' value.

A quick recap - expectations generally look like this: (expect expected actual)

Verifying an expected value is in an actual value is straightforward, and hopefully the syntax is not surprising: (expect expected (in actual))

If that's not clear, these examples should make the concept completely clear.
;; expect a k/v pair in a map.
(expect {:foo 1} (in {:foo 1 :cat 4}))

;; expect a key in a set
(expect :foo (in #{:foo :bar}))

;; expect a val in a list
(expect :foo (in [:foo :bar]))
As you would expect, running these expectations results in 3 passing tests.
jfields$ lein expectations
Ran 3 tests containing 3 assertions in 8 msecs
0 failures, 0 errors.
As usual, I'll show the failures as well.
;; expect a k/v pair in a map.
(expect {:foo 2} (in {:foo 1 :cat 4}))

;; expect a key in a set
(expect :baz (in #{:foo :bar}))

;; expect a val in a list
(expect :baz (in [:foo :bar]))

jfields$ lein expectations
failure in (core.clj:18) : sample.test.core
           (expect {:foo 2} (in {:foo 1, :cat 4}))
           expected: {:foo 2}
                 in: {:foo 1, :cat 4}
           :foo expected: 2
                     was: 1
failure in (core.clj:21) : sample.test.core
           (expect :baz (in #{:foo :bar}))
           key :baz not found in #{:foo :bar}
failure in (core.clj:24) : sample.test.core
           (expect :baz (in [:foo :bar]))
           value :baz not found in [:foo :bar]
expectations does its best to provide you with any additional info that might be helpful. In the case of the vector and the set there's not much else that can be said; however, the map failure gives you additional information that can be used to track down the issue.

There's nothing magical going on with 'in' expectations and you could easily do the equivalent with select-keys, contains?, or some, but expectations allows you to get that behavior while keeping your tests succinct.
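For reference, roughly equivalent checks written with plain clojure.core functions might look like this (a sketch of the idea, not expectations' actual implementation):

```clojure
;; roughly what each 'in' expectation verifies, as plain predicates
(let [m {:foo 1 :cat 4}]
  (assert (= {:foo 1} (select-keys m [:foo]))))  ; k/v pair present in a map

(assert (contains? #{:foo :bar} :foo))           ; key present in a set

(assert (some #{:foo} [:foo :bar]))              ; value present in a vector
```

The 'in' versions buy you the uniform (expect expected actual) shape plus the richer failure messages shown above.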

Clojure: Non-equality expectations

Clojure Unit Testing with Expectations Part One
Clojure Unit Testing with Expectations Part Two (this entry)
Clojure Unit Testing with Expectations Part Three
Clojure Unit Testing with Expectations Part Four
Clojure Unit Testing with Expectations Part Five
Clojure Unit Testing with Expectations Part Six

In my last blog post I gave examples of how to use expectations to test for equality. This entry will focus on non-equality expectations that are also available.

Regex
expectations allows you to specify that you expect a regex, and if the string matches that regex the expectation passes. The following example shows both the successful and failing expectations that use regexes.

(expect #"in 14" "in 1400 and 92")

jfields$ lein expectations
Ran 1 tests containing 1 assertions in 4 msecs
0 failures, 0 errors.

(expect #"in 14" "in 1300 and 92")

jfields$ lein expectations
failure in (core.clj:17) : sample.test.core
(expect in 14 in 1300 and 92)
regex #"in 14" not found in "in 1300 and 92"
Ran 1 tests containing 1 assertions in 5 msecs
1 failures, 0 errors.
As you can see from the previous example, writing an expectation using a regex is syntactically the same as writing an equality expectation - and this is true for all of the non-equality expectations. In expectations there is only one syntax for expect - it's always (expect expected actual).
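For these examples, a plain re-find check behaves the same way as the regex expectation (a sketch, not the library's actual implementation):

```clojure
;; a regex expectation passes when the pattern is found in the string
(boolean (re-find #"in 14" "in 1400 and 92"))   ; => true
(boolean (re-find #"in 14" "in 1300 and 92"))   ; => false
```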

Testing for a certain type
I basically never write tests that verify the result of a function is a certain type. However, for the once in a blue moon case where that's what I need, expectations allows me to verify that the result of a function call is a certain type simply by using that type as the expected value. The example below shows the successful and failing examples of testing that the actual is an instance of the expected type.
(expect String "in 1300 and 92")

jfields$ lein expectations
Ran 1 tests containing 1 assertions in 6 msecs
0 failures, 0 errors.

(expect Integer "in 1300 and 92")

jfields$ lein expectations
failure in (core.clj:17) : sample.test.core
(expect Integer in 1300 and 92)
in 1300 and 92 is not an instance of class java.lang.Integer
Ran 1 tests containing 1 assertions in 5 msecs
1 failures, 0 errors.
Expected Exceptions
Expected exceptions are another test that I rarely write; however, when I find myself in need - expectations has me covered.
(expect ArithmeticException (/ 12 0))

jfields$ lein expectations
Ran 1 tests containing 1 assertions in 6 msecs
0 failures, 0 errors.

(expect ClassCastException (/ 12 0))

jfields$ lein expectations
failure in (core.clj:19) : sample.test.core
(expect ClassCastException (/ 12 0))
(/ 12 0) did not throw ClassCastException
Ran 1 tests containing 1 assertions in 4 msecs
1 failures, 0 errors.
There's another non-equality expectation that I do use fairly often - an expectation where the 'expected' value is a function. The following simple examples demonstrate that if you pass a function as the first argument to expect it will be called with the 'actual' value and it will pass or fail based on what the function returns. (truthy results pass, falsey results fail).
(expect nil? nil)
(expect true? true)
(expect false? true)

jfields$ lein expectations
failure in (core.clj:19) : sample.test.core
(expect false? true)
true is not false?
Ran 3 tests containing 3 assertions in 4 msecs
1 failures, 0 errors.
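The pass/fail rule can be sketched in plain Clojure - the function is called with the actual value, and a truthy result passes (hypothetical examples, not from the test run above):

```clojure
;; truthy result -> the expectation would pass; falsey -> it would fail
(nil? nil)                       ; => true    (passes)
(false? true)                    ; => false   (fails)
(seq (filter even? [2 4 6]))     ; => (2 4 6) (truthy, so a custom
                                 ;             predicate like seq passes too)
```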
These are the majority of the non-equality expectations; however, there is one remaining non-equality expectation - in. Using 'in' is fairly straightforward, but since it has examples for vectors, sets, and maps I felt it deserved its own blog post - coming soon.

Clojure: expectations Introduction

Clojure Unit Testing with Expectations Part One (this entry)
Clojure Unit Testing with Expectations Part Two
Clojure Unit Testing with Expectations Part Three
Clojure Unit Testing with Expectations Part Four
Clojure Unit Testing with Expectations Part Five
Clojure Unit Testing with Expectations Part Six

A bit of history
Over a year ago I blogged that I'd written a testing framework for Clojure - expectations. I wrote expectations to test my production code, but made it open source in case anyone else wanted to give it a shot. I've put zero effort into advertising expectations; however, I've been quietly adding features and expanding its use on my own projects. At this point it's been stable for quite a while, and I think it's worth looking at if you're currently using clojure.test.

Getting expectations
Setting up expectations is easy if you use lein. In your project you'll want to add:
:dev-dependencies [[lein-expectations "0.0.1"]
                   [expectations "1.1.0"]]
After adding both dependencies you can do a "lein deps" and then do a "lein expectations" and you should see the following output.
Ran 0 tests containing 0 assertions in 0 msecs
0 failures, 0 errors.
At this point, you're ready to start writing your tests using expectations.

Unit Testing using expectations
expectations is built with the idea that unit tests should contain one assertion per test. A result of this design choice is that expectations has very minimal syntax.

For example, if you want to verify the result of a function call, all you need to do is specify what return value you expect from the function call.
(expect 2 (+ 1 1))
note: you'll want to (:use expectations); however, no other setup is required for using expectations. I created a sample project for this blog post and the entire test file looks like this (at this point):
(ns sample.test.core
  (:use [expectations]))

(expect 2 (+ 1 1))
Again, we use lein to run our expectations.
jfields$ lein expectations
Ran 1 tests containing 1 assertions in 2 msecs
0 failures, 0 errors.
That's the simplest, and most often used expectation - an equality comparison. The equality comparison works across all Clojure types - vectors, sets, maps, etc and any Java instances that return true when given to Clojure's = function.
(ns sample.test.core
  (:use [expectations]))

(expect 2 (+ 1 1))
(expect [1 2] (conj [] 1 2))
(expect #{1 2} (conj #{} 1 2))
(expect {1 2} (assoc {} 1 2))
Running the previous expectations produces similar output as before.
jfields$ lein expectations
Ran 4 tests containing 4 assertions in 26 msecs
0 failures, 0 errors.
Successful equality comparisons aren't very exciting; however, expectations really begins to prove its worth with its failure messages. When comparing two numbers there's not much additional information that expectations can provide. Therefore, the following output is what you would expect when your expectation fails.
(expect 2 (+ 1 3))

jfields$ lein expectations
failure in (core.clj:4) : sample.test.core
(expect 2 (+ 1 3))
expected: 2
was: 4
expectations gives you the namespace, file name, and line number along with the expectation you specified, the expected value, and the actual value. Again, nothing surprising. However, when you compare vectors, sets, and maps expectations does a bit of additional work to give you clues on what the problem might be.

The following 3 expectations using vectors will all fail, and expectations provides detailed information on what exactly failed.
(expect [1 2] (conj [] 1))
(expect [1 2] (conj [] 2 1))
(expect [1 2] (conj [] 1 3))

jfields$ lein expectations
failure in (core.clj:5) : sample.test.core
(expect [1 2] (conj [] 1))
expected: [1 2]
was: [1]
2 are in expected, but not in actual
expected is larger than actual
failure in (core.clj:6) : sample.test.core
(expect [1 2] (conj [] 2 1))
expected: [1 2]
was: [2 1]
lists appears to contain the same items with different ordering
failure in (core.clj:7) : sample.test.core
(expect [1 2] (conj [] 1 3))
expected: [1 2]
was: [1 3]
3 are in actual, but not in expected
2 are in expected, but not in actual
Ran 3 tests containing 3 assertions in 22 msecs
3 failures, 0 errors.
In these simple examples it's easy to see what the issue is; however, when working with larger lists expectations can save you a lot of time by telling you which specific elements in the list are causing the equality to fail.

Failure reporting on sets looks very similar:
(expect #{1 2} (conj #{} 1))
(expect #{1 2} (conj #{} 1 3))

jfields$ lein expectations
failure in (core.clj:9) : sample.test.core
(expect #{1 2} (conj #{} 1))
expected: #{1 2}
was: #{1}
2 are in expected, but not in actual
failure in (core.clj:10) : sample.test.core
(expect #{1 2} (conj #{} 1 3))
expected: #{1 2}
was: #{1 3}
3 are in actual, but not in expected
2 are in expected, but not in actual
Ran 2 tests containing 2 assertions in 15 msecs
2 failures, 0 errors.
expectations does this type of detailed failure reporting for maps as well, and this might be one of the biggest advantages expectations has over clojure.test - especially when dealing with nested maps.
(expect {:one 1 :many {:two 2}}
        (assoc {} :one 2 :many {:three 3}))

jfields$ lein expectations
failure in (core.clj:13) : sample.test.core
(expect {:one 1, :many {:two 2}} (assoc {} :one 2 :many {:three 3}))
expected: {:one 1, :many {:two 2}}
was: {:many {:three 3}, :one 2}
:many {:three with val 3 is in actual, but not in expected
:many {:two with val 2 is in expected, but not in actual
:one expected: 1
was: 2
Ran 1 tests containing 1 assertions in 19 msecs
1 failures, 0 errors.
expectations also provides a bit of additional help when comparing the equality of strings.
(expect "in 1400 and 92" "in 14OO and 92")

jfields$ lein expectations
failure in (core.clj:17) : sample.test.core
(expect in 1400 and 92 in 14OO and 92)
expected: "in 1400 and 92"
was: "in 14OO and 92"
matches: "in 14"
diverges: "00 and 92"
&: "OO and 92"
Ran 1 tests containing 1 assertions in 8 msecs
1 failures, 0 errors.
That's basically all you'll need to know to test equality with expectations. I'll be following up this blog post with more examples of using expectations with regexes, expected exceptions, and type checking; however, if you don't want to wait you can take a quick look at the success tests that are found within the framework.

Monday, September 19, 2011

Recent Thoughts On Hiring and Being Hired

The job market is insane right now. It got to the point that I was receiving job-related email so often that I changed my LinkedIn profile to say I lived in Jacksonville, Florida (I don't - I'm still happily in NYC). However, I do read every job related email that makes it through Google's spam filter, and a few things did catch my eye recently.

Recruiters doing it right:
One recruiter emailed me and didn't ask for a resume, but did ask to see my github account. Setting aside the fact that plenty of code lives outside of github, this request impressed me. If they actually have people doing a bit of research on applicants via their open-source contributions, I imagine they're much better at hiring than their competition.

A different recruiter asked if I was available or if I knew anyone I'd be interested in referring, but the conversation was unique because the recruiter offered a $5,000 referral fee if a friend of mine was hired. Occasionally I will pass a recruiter's contact information along if the job sounds interesting, but I never spend more than 2 minutes thinking about who I know and if they are a match. Obviously, when 5K is the reward, I spent significantly more time considering who I knew with appropriate skills and desires.

Recruiters doing it wrong:
Don't bother telling me that you're offering a $500 referral fee. It's not that $500 is insignificant, but referring a friend is such an unlikely event that the payoff needs to be much higher due to how infrequently things come together. Also, if your competition is paying 5K and you're paying $500, that can give a false impression of what you'll provide as salary. I'm guessing some cool companies only offer $500, and that's fine - but I think you'll be better off focusing on where you are superior to your competition and omitting where you're definitely behind.

Programmers doing it wrong:
If you're out of work right now there should be a significant reason why. Perhaps you are tied to a small town or a specific domain - those are valid reasons. However, if you find yourself without the skills you need to get the jobs that are available in your area, you've likely been neglecting your craft. This summer, a family member asked me if I could tell his son my secret to success. The best advice I could come up with was: Hit the job boards and see what the largest technology need is and get to work learning it.

Once upon a time the technology I knew the best was Pegasystems. I was laid off and I ended up spending plenty of time on monster.com and the various IT specific clones. There were jobs available, but everyone wanted someone with a Microsoft background. I ended up turning down some well paying jobs working with dead-end technologies and took a decent paying job that allowed me to use .Net. Twelve months later I accepted a job making 125% of the decent salary - which was the most I'd ever made. If you are passionate and have high demand skills you should always be paid well.

Another thing that surprises me is programmers who are "too busy to read about new technology or attend conferences". Perhaps I value innovation more than other programmers, but I simply can't understand this perspective. I see my job as more than whatever feature I'm currently working on. To me, my job is to provide the most features in the shortest amount of time possible. On occasion that means needing to get something out as soon as possible. However, my general working mode is annual production, and in that timeframe inefficiency adds up. On occasion innovation can offer productivity boosts that I couldn't match if I worked 24 hours a day using more dated solutions.

As an employee, I feel it's my responsibility to the company to ensure I'm not ignoring any innovation that could drastically impact my delivery speed. As such, I need to be on top of as many relevant innovations as possible. Interestingly, a side-effect of this attitude is that I should also have the necessary skills to find another job should I find myself looking around. This is an everybody wins situation, but only if you choose to do the right thing.

That's all I have off the top of my head. If you have any interesting experiences with hiring or being hired, please leave me a comment.

Tuesday, August 30, 2011

Life After Pair Programming

When I first joined DRW I noticed that the vast majority of developers were not pair-programming. I was surprised that a company that employed so many smart people would just simply disregard a practice that I considered to be so obviously beneficial.
note: it's so easy to judge people who don't pair-program, isn't it? You can write off their behavior for so many reasons:
  • They simply haven't done it enough to see how much value it provides.
  • They've used the wrong setup, and turned a positive ROI into a negative ROI situation in the past.
  • They must be anti-social or not team players.
  • They must guard their code because they are too proud or believe it's job security.
Never once did I consider that their behavior might be correct based on their context. There was no context in which pair-programming wasn't the right choice, right?
So, I did what any responsible employee who wants to help would do: I gave a presentation. I took pictures of a traditional (read: broken) pairing station setup - the setup where one employee looked over the shoulder of the person doing all the work. Then I took a picture of a dual monitor, dual keyboard, and dual mouse setup. I made the distinction between pairing and watching someone work. I explained how ping-pong-pair-programming worked. I think I said the right things, and the presentation was well received. Management was on board and offered to set up pairing stations for any developers in the company that wanted to give it a try.

A few people embraced it, but most went back to their traditional habits. I did manage to convert the guys on my team, and we paired the vast majority of the time.

Almost 3 years later, and on a different team, I basically never pair. In fact, when my lone teammate (and boss) asks if we should pair on something my initial reaction is always 'no'. The 2 of us are the only people working on the applications we support, and when we pair I often find myself very distracted and not really participating. Sometimes we ping-pong-pair and it keeps us both involved, but I don't feel like we're going faster than if I'd been working solo. Don't get me wrong, we sit directly next to each other, talk constantly, and even look at code together at times when we're just starting to talk about a new problem. That level of collaboration feels beneficial, but actually coding the implementation together feels wasteful.

I know how I felt 3 years ago (when I joined DRW), and I know how I feel now. The fact that I've changed my tune so drastically made me think it was worth putting a few ideas on the web. I was also motivated a bit by the recent publication of Pair Programming Benefits. The Pair Programming Benefits article actually gives me a perfect starting place, by examining the benefits of pairing and how they apply to my context.

I feel like I can safely summarize the Team and System Benefits as: transfer of knowledge, superior quality, and collaboration. The Programmer Benefits can be summarized as: expansion of knowledge, truck factor reduction, superior quality, and team support. The Management/Project Management Benefits can be summarized as: aiding addition and removal of employees, knowledge transfer, and the ability to hire any skilled generalist.

So, our full list of benefits is: expansion and transfer of knowledge, superior quality, collaboration, truck factor reduction, team support, eased hiring and firing. I would agree with these benefits of pair-programming, which explains why I was such an avid believer in pairing in the past. However, if those pros don't apply to my situation then pair-programming becomes nothing more than using twice as many man hours to get the same amount of work done.

I'll start by addressing the easiest "benefits" that are effectively irrelevant for my team: eased hiring and firing and truck factor. Like I said before, my team size is 2. Mike and I are both very happy and there's almost no chance that either of us leave anytime soon. Even if we did choose to leave, we would give enough notice that someone could easily come in and replace us - and either one of us could support the applications on our own if necessary. The only real risk is that we both are killed at the same time - at which point it wouldn't matter if we paired or not. Statistically it's much more likely that only one of us would suffer something causing us to not be able to return to work, and as Chris Zuehlke (manager at DRW) once pointed out to me: if you build small, well written processes the time to understand or rewrite any software you know nothing about is small - often smaller than the cost of keeping 2 people informed on the details of the process. If I asked the stakeholders if we should use 2x as many man hours to ensure that we're covered in case one of us is suddenly killed they'd laugh at me - and I'd deserve it.

We're also not looking to add to our team, and if we decided we did want to grow I expect we wouldn't have any trouble finding someone who was looking for a well-paying job in Manhattan using very cool technology.

The previous discussion was meant to address team size, but it also touches on "transfer of knowledge". Transfer of knowledge is valuable for teaching new techniques and enabling people to do maintenance. However, if little teaching is going on (often the case as Mike and I add features to an app we know fairly well, using technologies we know fairly well) and we both tend to do the maintenance on the code we previously wrote then "transfer of knowledge" is either not occurring anyway or is unnecessary. Also, on more than one occasion I've had to take a first look at some code Mike had recently written. It's always been straightforward and taken no more than an hour to dig-in, make, and test a change. Had we paired on everything he worked on I would have unquestionably spent far more than an hour here or there. Again, the ROI on knowledge transfer doesn't look good for our situation.

The "superior quality" benefit is an interesting one. I consider myself to be a good engineer who on occasion writes something beautiful. I like to share what's beautiful with everyone. This is something that's going to happen whether or not I pair (evidenced by my blogging). I know Mike does the same. I expect we're all proud of our most elegant solutions. However, more often than not I write fairly good software that isn't excitement provoking, but solves the problem at hand. I'm not sure anyone is benefiting from working with me at those moments. And, while it's possible that we'd get a slightly better solution if Mike & I paired, spending twice as many man hours for a 10% better solution doesn't seem like a great choice in the short term. In the long term those 10%s could add up, but what if I see the superior solution on my 2nd pass through and end up there while expanding a feature anyway? If I have a massive solution that might be a big cost. However, with small, focused processes I can tend to make a massive change to a process in ~4 hours. Delete is still my favorite refactoring and I do it as often as possible on my quests for superior solutions. Therefore, I don't believe those 10%s add up, I believe the 90% superior solution is often good enough or deleted.

I believe a few additional factors allow us to write code that is "good enough" on our own.
  • The application is fairly mature and many of the core design decisions have already been made.
  • Mike and I have Compatible Opinions on Software.
  • Mike and I are at a similar skill level.
In my current situation I find that many of the tough decisions have already been made and I can follow the existing path. And, any decisions that I do need to make I usually end up choosing what Mike would choose and solving in a way that we both think is appropriate.

While I believe "expansion of knowledge" is often undervalued and under-practiced in our profession, I don't think pair-programming is the most effective way to expand your skills. Blogs, books, and presentations allow you to work at your own pace and dig in to your desired level. My experiences pairing with people using tools or languages I don't know have been painful. I did learn things, but I couldn't help but feel like there were much more effective ways for me to get up to speed.

The final 2 benefits that I need to consider are "collaboration" and "team support". As I previously stated, Mike and I collaborate quite closely. We're always both happy to stop what we're doing to provide an opinion or direction. I honestly can't imagine a more collaborative environment. Basically the same thing can be said of "team support" - we constantly watch each other's back. We're both looking to succeed and we know we need each other to get to that goal, so it only makes sense that we look out for each other. We don't need pairing to force us to support each other, the desire to succeed is more than enough.

When considering my context and the "benefits" of (or, in my situation, lack of benefits of) pair-programming, I'm no longer surprised that I basically never want to pair these days. Also, looking back at the other teams I was previously judging - their context back then looks a lot like my current context. With a bit more experience under my belt I have to say I'm honestly a bit embarrassed for never even considering that they already "got it" and were already making the right choice. Oh well, live and learn.

While it's easy for me to examine my situation and make a determination I think there is a more general context where these same ideas apply. The aspects of an environment where pairing may not be beneficial seem to look something like this:
  • small team (< 4 people)
  • small software (components/processes can be rewritten in < 2 days)
  • motivated teammates
  • talented and well-seasoned teammates (similar skill level, and skilled overall)
  • the design of the application is fairly mature and stable
  • all team members have compatible opinions on software
Thanks to Michael Feathers, Alexey Verkhovsky, Pat Kua, Jeremy Lightsmith, Lance Walton, Dan Worthington-Bodart, James Richardson, and John Hume for providing feedback on the initial draft of this entry.

Tuesday, August 23, 2011

Clojure: Check For nil In a List

The every? function in Clojure is very helpful for determining if every element of a list passes a predicate. From the docs:
Usage: (every? pred coll)

Returns true if (pred x) is logical true for every x in coll, else false.
The usage of every? is very straightforward, but a quick REPL session is always nice to verify our assumptions.
Clojure 1.2.0

user=> (every? nil? [nil nil nil])
true
user=> (every? nil? [nil 1])
false
As expected, every? works well when you know exactly what predicate you need to use.

Yesterday, I was working on some code that included checking for nil - similar to the example below.
user=> (def front 1)
#'user/front
user=> (def back 2)
#'user/back
user=> (when (and front back) [front back])
[1 2]
This code works perfectly if you have the individual elements "front" and "back", but as the code evolved I ended up representing "front" and "back" simply as a list of elements. Changing to a list required a way to verify that each entry in the list was not nil.

I was 99% sure that "and" was a macro; therefore, combining it with apply wasn't an option. A quick REPL reference verified my suspicion.
user=> (def legs [front back])
#'user/legs
user=> (when (apply and legs) legs)
java.lang.Exception: Can't take value of a macro: #'clojure.core/and (NO_SOURCE_FILE:8)
Several other options came to mind: an anonymous function that checks for (not (nil? %)), or mapping the values to (not (nil? %)) and using every? with true?. However, because of Clojure's truthiness, the identity function is really all you need. The following REPL session shows how identity works perfectly as our predicate for this example.
user=> (when (every? identity legs) legs)
[1 2]
For a few more looks at behavior, here's a few examples that include nil and false.
user=> (every? identity [1 2 3 4])
true
user=> (every? identity [1 2 nil 4])
false
user=> (every? identity [1 false 4])
false
As you can see, using identity will cause every? to fail if any element is falsey (nil or false). In my case the elements are integers or nil, so this works perfectly; however, it's worth noting so you don't see unexpected results if booleans ever end up in your list.
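If booleans might show up in the list and only nil should be rejected, a predicate built with complement keeps the check explicit (a sketch; newer Clojure versions also provide some? for exactly this):

```clojure
;; reject only nil, letting false through
(every? (complement nil?) [1 false 4])   ; => true
(every? (complement nil?) [1 nil 4])     ; => false
```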

Clojure: partition-by, split-with, group-by, and juxt

Today I ran into a common situation: I needed to split a list into 2 sublists - elements that passed a predicate and elements that failed a predicate. I'm sure I've run into this problem several times, but it's been awhile and I'd forgotten what options were available to me. A quick look at http://clojure.github.com/clojure/ reveals several potential functions: partition-by, split-with, and group-by.

partition-by
From the docs:
Usage: (partition-by f coll)

Applies f to each value in coll, splitting it each time f returns
a new value. Returns a lazy seq of partitions.
Let's assume we have a collection of ints and we want to split them into a list of evens and a list of odds. The following REPL session shows the result of calling partition-by with our list of ints.
user=> (partition-by even? [1 2 4 3 5 6])
((1) (2 4) (3 5) (6))
The partition-by function works as described; unfortunately, it's not exactly what I'm looking for. I need a function that returns ((1 3 5) (2 4 6)).

split-with
From the docs:
Usage: (split-with pred coll)

Returns a vector of [(take-while pred coll) (drop-while pred coll)]
The split-with function sounds promising, but a quick REPL session shows it's not what we're looking for.
user=> (split-with even? [1 2 4 3 5 6])
[() (1 2 4 3 5 6)]
As the docs state, the collection is split on the first item that fails the predicate - (even? 1).

group-by
From the docs:
Usage: (group-by f coll)

Returns a map of the elements of coll keyed by the result of f on each element. The value at each key will be a vector of the corresponding elements, in the order they appeared in coll.
The group-by function works, but it gives us a bit more than we're looking for.
user=> (group-by even? [1 2 4 3 5 6])
{false [1 3 5], true [2 4 6]}
The result as a map isn't exactly what we desire, but using a bit of destructuring allows us to grab the values we're looking for.
user=> (let [{evens true odds false} (group-by even? [1 2 4 3 5 6])]
         [evens odds])
[[2 4 6] [1 3 5]]
The group-by results mixed with destructuring do the trick, but there's another option.

juxt
From the docs:
Usage: (juxt f)
              (juxt f g)
              (juxt f g h)
              (juxt f g h & fs)

Alpha - name subject to change.
Takes a set of functions and returns a fn that is the juxtaposition
of those fns. The returned fn takes a variable number of args, and
returns a vector containing the result of applying each fn to the
args (left-to-right).
((juxt a b c) x) => [(a x) (b x) (c x)]
The first time I ran into juxt I found it a bit intimidating. I couldn't tell you why, but if you feel the same way - don't feel bad. It turns out, juxt is exactly what we're looking for. The following REPL session shows how to combine juxt with filter and remove to produce the desired results.
user=> ((juxt filter remove) even? [1 2 4 3 5 6])
[(2 4 6) (1 3 5)]
There's one catch to using juxt in this way: the entire list is traversed twice - once by filter and once by remove. In general this is acceptable; however, it's something worth considering when writing performance sensitive code.
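If that double traversal ever matters, a single-pass version is easy to sketch with reduce (split-by is a hypothetical name, not a core function):

```clojure
;; single pass: build [passing failing] in one reduce over coll
(defn split-by [pred coll]
  (reduce (fn [[yes no] x]
            (if (pred x)
              [(conj yes x) no]
              [yes (conj no x)]))
          [[] []]
          coll))

(split-by even? [1 2 4 3 5 6])   ; => [[2 4 6] [1 3 5]]
```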

Saturday, August 20, 2011

Clojure: Apply a Function To Each Value of a Map

I recently needed to update all of the values of a map, and couldn't find an individual function in clojure.core that did the trick. Luckily, implementing an update-values function is actually very straightforward. The following function reduces the original map to a new map with the same keys, except the values are all the result of applying the specified function and any additional args.
(defn update-values [m f & args]
  (reduce (fn [r [k v]] (assoc r k (apply f v args))) {} m))
The code is concise, but perhaps a bit terse. Still, it does the trick, as the REPL session below demonstrates.
Clojure 1.2.0

user=> (defn update-values [m f & args]
         (reduce (fn [r [k v]] (assoc r k (apply f v args))) {} m))
#'user/update-values
user=> (update-values {:a 1 :b 2 :c 3} inc)
{:c 4, :b 3, :a 2}
user=> (update-values {:a 1 :b 2 :c 3} + 10)
{:c 13, :b 12, :a 11}
user=> (update-values {:a {:z 1} :b {:z 1} :c {:z 1}} dissoc :z)
{:c {}, :b {}, :a {}}
The last example is the specific problem I was looking to solve: remove a key-value pair from all the values of a map. However, the map I was working with actually had a bit of nesting. Let's define an example map that is similar to what I was actually working with.
user=> (def data {:boxes {"jay" {:linux 2 :win 1} "mike" {:linux 2 :win 2}}})
#'user/data
Easy enough: I have 2 linux boxes and 1 windows box, and Mike has 2 linux and 2 windows. But now the company decides to discard all of our windows boxes, and we'll need to update each user. A quick combo of update-in with update-values does the trick.
user=> (update-in data [:boxes] update-values dissoc :win)
{:boxes {"mike" {:linux 2}, "jay" {:linux 2}}}
As you can see, the update-values function plays nice with existing Clojure functions as well.
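For what it's worth, on Clojure versions newer than the 1.2 used here, reduce-kv (added in 1.4) can express the same function without destructuring each map entry - a sketch with the same assumed behavior:

```clojure
;; same behavior as update-values, via reduce-kv (Clojure 1.4+)
(defn update-values-kv [m f & args]
  (reduce-kv (fn [r k v] (assoc r k (apply f v args))) {} m))

(update-values-kv {:a 1 :b 2 :c 3} inc)   ; => {:a 2, :b 3, :c 4}
                                          ;    (printed key order may vary)
```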

Disclosure: I am long AAPL and own every device Apple puts out. Don't take my example numbers too literally. Also, much to my dismay, DRW has yet to discard all of our windows boxes.

Monday, August 01, 2011

Clojure: memfn

The other day I stumbled upon Clojure's memfn macro.
The memfn macro expands into code that creates a fn that expects to be passed an object and any args and calls the named instance method on the object passing the args. Use when you want to treat a Java method as a first-class fn.
(map (memfn charAt i) ["fred" "ethel" "lucy"] [1 2 3])
-> (\r \h \y)
-- clojure.org
At first glance it appeared to be something nice, but even the documentation states that "...it is almost always preferable to do this directly now..." - with an anonymous function.
(map #(.charAt %1 %2) ["fred" "ethel" "lucy"] [1 2 3])
-> (\r \h \y)
-- clojure.org, again
I pondered memfn. If it's almost always preferable to use an anonymous function, when is it preferable to use memfn? Nothing came to mind, so I moved on and never really gave memfn another thought.

Then the day came where I needed to test some Clojure code that called some very ugly and complex Java.

In production we have an object that is created in Java and passed directly to Clojure. Interacting with this object is easy (in production); however, creating an instance of that class (while testing) is an entirely different task. My interaction with the instance is minimal, only one method call, but it's an important method call. It needs to work perfectly today and every day forward.

I tried to construct the object myself. I wanted to test my interaction with this object from Clojure, but creating an instance turned out to be a significant task. After 15 minutes of failing to create an instance, I decided to see if memfn could provide a solution. I'd never actually used memfn, but the documentation seemed promising.

In order to verify the behavior I was looking for, all I needed was a function that I could rebind to return an expected value. The memfn macro provided exactly what I needed.

As a (contrived) example, let's assume you want to create a new order with a sequence id generated by incrementAndGet on AtomicLong. In production you'll use an actual AtomicLong and you might see something like the example below.
(import 'java.util.concurrent.atomic.AtomicLong)

(def sequence-generator (AtomicLong.))
(defn new-order []
  (hash-map :id (.incrementAndGet sequence-generator)))

(println (new-order)) ; => {:id 1}
(println (new-order)) ; => {:id 2}
While that might be exactly what you need in production, it's generally preferable to use something more explicit while testing. I haven't found an easy way to rebind a Java method (.incrementAndGet in our example); however, if I use memfn I can create a first-class function that is easily rebound.
(def sequence-generator (AtomicLong.))
(def inc&get (memfn incrementAndGet))
(defn new-order []
  (hash-map :id (inc&get sequence-generator)))

(println (new-order)) ; => {:id 1}
(println (new-order)) ; => {:id 2}
At this point we can see that memfn is calling our AtomicLong and our results haven't been altered in any way. The final example shows a version that uses binding to ensure that inc&get always returns 10.
(def sequence-generator (AtomicLong.))
(def inc&get (memfn incrementAndGet))
(defn new-order []
  (hash-map :id (inc&get sequence-generator)))

(println (new-order)) ; => {:id 1}
(println (new-order)) ; => {:id 2}
(binding [inc&get (fn [_] 10)]
  (println (new-order)) ; => {:id 10}
  (println (new-order))) ; => {:id 10}
With inc&get being constant, we can now easily test our new-order function.
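One caveat worth noting if you're on Clojure 1.3 or later: binding only works on vars declared ^:dynamic. Here's a sketch of the same idea, also showing with-redefs (added in 1.3), which works on plain vars and is a common choice in tests:

```clojure
(import 'java.util.concurrent.atomic.AtomicLong)

;; ^:dynamic is required for binding in Clojure 1.3+
(def ^:dynamic inc&get (memfn incrementAndGet))
(def sequence-generator (AtomicLong.))

(defn new-order []
  (hash-map :id (inc&get sequence-generator)))

(binding [inc&get (fn [_] 10)]
  (new-order)) ; => {:id 10}

;; with-redefs temporarily replaces the root binding, dynamic or not
(with-redefs [inc&get (fn [_] 10)]
  (new-order)) ; => {:id 10}
```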

Tuesday, July 19, 2011

The High-Level Test Whisperer

Most teams have high-level tests, whether they call them Functional Tests, Integration Tests, End-to-End Tests, Smoke Tests, User Tests, or something similar. These tests are designed to exercise as much of the application as possible.

I'm a fan of high-level tests; however, back in 2009 I settled on what I considered to be a sweet spot for high-level testing: a dozen or fewer. The thing about high-level tests is that they are complicated and highly fragile. It's not uncommon for an unrelated change to break an entire suite of high-level tests. Truthfully, anything related to high-level testing comes with an implicit "here be dragons".

I've been responsible for my fair share of authoring high-level tests. Despite my best efforts, I've never found a way to write high-level tests that aren't filled with subtle and complicated tweaks. That level of complication only equals heartbreak for teammates that are less familiar with the high-level tests. The issues often arise from concurrency, stubbing external resources, configuration properties, internal state exposure and manipulation, 3rd party components, and everything else that is required to test your application under production-like circumstances.

To make things worse, these tests are your last line of defense. Most issues that these tests would catch are also caught by a lower-level test that is better designed to pinpoint where the issue originates. The last straw: the vast majority of the time, a broken test is due to the test infrastructure (not an actual flaw in the application), and it takes a significant amount of time to figure out how to fix the infrastructure.

I've thrown away my fair share of high-level tests. Entire suites. Unmaintainable and constantly broken tests full of false negatives simply don't carry their weight. On the other hand, I've found plenty of success using high-level tests I've written. For a while I thought my success with high-level tests came from a combination of my dedication to making them as easy as possible to work with and the fact that I never allowed more than a dozen of them.

I joined a new team back in February. My new team has a bunch of high-level tests - about 50 of them. Consider me concerned. Not long after I started adding new features, I needed to dig into the high-level test infrastructure. It's very complicated. Consider me skeptical. Over the next few months I kept reevaluating whether or not they were worth the amount of effort I was putting into them. I polled a few teammates to gauge their happiness level. After 5 months of working with them I began my attack.

Each time my functional tests broke I spent no more than 5 minutes on my own looking for an obvious issue. If I couldn't find the problem I interrupted Mike, the guy who wrote the majority of the tests and the infrastructure. More often than not Mike was able to tweak the tests quickly and we both moved on. I anticipated that Mike would be able to fix all high-level test related issues relatively quickly; however, I expected he would grow tired of the effort and we would seek a smaller and more manageable high-level test suite.

A few more weeks passed with Mike happily fielding all my high-level test issues. This result started to feel familiar; I had played the exact same role on previous projects. I realized that the reason I had been successful with high-level tests I had written was likely due solely to the fact that I had written them. The complexity of high-level test infrastructure almost ensures that an individual will be the expert, and making changes to that infrastructure is as simple as moving a few variables in their world. On my new team, Mike was that expert. I began calling Mike the High-Level Test Whisperer.

At previous points in my career I might have been revolted by the idea of having an area of the code that required an expert. However, having played that role several times I'm pretty comfortable with the associated risks. Instead of fighting the current state of affairs I decided to embrace the situation. Not only do I grab Mike when any issues arise with the high-level tests, I also ask him to write me failing tests when new ones are appropriate for features I'm working on. It takes him a few minutes to whip up a few scenarios and check them in (commented out). Then I have failing tests that I can work with while implementing a few new features. We get the benefits of high-level tests without the pain.

Obviously this set-up only works if both parties are okay with the division of responsibility. Luckily, Mike and I are both happy with our roles given the existing high-level test suite. In addition, we've added 2 more processes (a 50% increase) since I joined. For both of those processes I've created the high-level tests and the associated infrastructure - and I also deal with any associated maintenance tasks.

This is a specific case of a more general pattern I've been observing recently: if the cost of keeping 2 people educated on a piece of technology is higher than the benefit, don't do it - bus risk be damned.

Individuals Over People

I've been pondering a few different ideas lately that all center around a common theme: to be maximally effective you need to identify and allow people to focus on their strengths.

I hear you: thanks Captain Obvious.

If you are reading this it's likely that you're familiar with the phrase "Individuals and interactions over processes and tools" from the Agile Manifesto. I'm sure we agree in principle, but I'm not sure we're talking about the same thing. In fact, it's more common to hear "people over process" when discussing Agile, which I believe is more appropriate for describing the value that Agile brings.

Agile emphasizes people (as a group) over processes and tools. However, there's little room for "individuals" on the Agile teams I've been a part of. I can provide several anecdotes -
  • When pair-programming with someone who prefers a Dvorak layout a compromise must be made.
  • When pair-programming with someone who prefers a different IDE a compromise must be made.
  • Collective code ownership implies anyone can work on anything, which often leads to inefficient story selection. (e.g. the business accidentally gives a card to someone who isn't ideally skilled for the task. Or, a developer decides to work on a card they aren't ideally skilled for despite other better suited and equally important outstanding cards)
  • Collective code ownership requires a lowest common denominator technology selection. (e.g. If 3 out of 5 people know Ruby and 5 out of 5 know Clojure, selecting Ruby for any application, even when it's the appropriate choice, is likely to be met by resistance)
  • Collective code ownership requires a lowest common denominator coding style selection. Let's be honest, it's easy to code in a language such as Ruby without a deep understanding of metaprogramming and evaluation. Both metaprogramming and evaluation are powerful; however, you can only take advantage of that power if you are sure everyone on the team is comfortable with both techniques.
I could ramble on a bit more, but hopefully you get my point.

I'm still a believer in Agile. It's the best way I know how to take an average performing team and put them on a path to becoming a well performing team. However, I think the Agile practices also put a ceiling on how effective a team can be. Perhaps my favorite anecdote: Ola Bini believes he is 10x faster when using Emacs as compared to IntelliJ - when writing Java! 10x is huge, so what does he use when he's pairing: IntelliJ. If there's knowledge transfer occurring then perhaps the 10x reduction in delivery speed is a good decision; however, if he's pairing with someone of a similar skill level who isn't statistically likely to maintain the code in the future it's a terrible choice. Ola is programming at 1/10 of his possible efficiency for no reason other than it's the Agile way.

Clearly, it's gray area - the knowledge transfer level will vary drastically based on who he's pairing with and what they are pairing on. That's the important point - people are more important than processes, but individuals are the most important. If you want to achieve maximum productivity you'll need to constantly reevaluate what the most effective path is based on the individuals that make up your team.

If you already agree with the ideas above then you're probably familiar with the idea that you need to learn all the rules to know when to break them. That's an old idea as well. Unfortunately, I don't see much written on this topic in the software development world. The last 3 years of my life have been lesson after lesson in how smart people can break the Agile rules and get much larger gains as a result. I've decided to tag posts that cover this subject as "Individuals over People". There are even a few historical entries available at http://blog.jayfields.com/search/label/individuals%20over%20people.

Hopefully these ideas will spark a few discussions and inspire other post-Agile developers to post their experiences as well.

Tuesday, July 12, 2011

Undervalued Start and Restart Related Questions

How long does it take to start or restart your application?

Start-up time is a concern that's often overlooked by programmers who write unit tests. It will (likely) always be faster to run a few unit tests than to start an application; however, having unit tests shouldn't take the place of actually firing up the application and spot checking with a bit of clicking around. Both efforts are good, but I believe the combination is a case where the whole is greater than the sum of its parts.

My current team made start-up time a priority. Currently we are able to launch our entire stack (currently 6 processes) and start using the software within 10 seconds. Ten seconds is fast, but I have been annoyed with it at times. I'll probably try to cut it down to 5 seconds at some point in the near future, depending on the level of effort needed to achieve a sub-5-second start-up.

That effort is really the largest blocker for most teams. The problem is that it's often not clear what's causing start-up to take so long. Performance tuning start-up isn't exactly sexy work. However, if you start your app often, the investment can quickly pay dividends. For my team, we found the largest wins by caching remote data on our local boxes and deferring the creation of complex models while running on development machines. Those two simple tweaks turned a 1.5 minute start-up time into 10 seconds.

If your long start-up isn't bothering you because you don't do it very often, I'll have to re-emphasize that you are probably missing out on some valuable feedback.

Not time related, but start related: Does your application encounter data-loss if it's restarted?

In the past I've worked on teams where frequent daily roll-outs were common. There are two types of these teams I've encountered. Some teams do several same day roll-outs to get new features into production as fast as possible. Other teams end up doing multiple intraday rollouts to fix newly found bugs in production. Regardless of the driving force, I've found that those teams can stop and start their servers quickly and without any information loss.

My current team has software stable enough that we almost never roll out intraday due to a bug. We also have uptime demands that mean new features are almost never more valuable than not stopping the software intraday. I can only remember doing 2 intraday restarts across 30 processes since February.

There's nothing wrong with our situation; however, we don't optimize for intraday restarts. As part of not prioritizing intraday restart related tasks, we've never addressed a bit of data-loss that occurs on a restart. It's traditionally been believed that the data wasn't very important (nice-to-have, if you will). However, the other day I wanted to rollout a new feature in the morning - before our "day" began. One of our customers stopped me from rolling out the software because he didn't want to lose the (previously believed nice-to-have) overnight data.

That was the moment that drove home the fact that even in our circumstances we needed to be able to roll out new software as seamlessly as possible. Even if mid-day rollouts are rare, any problems that a mid-day rollout creates will make it less likely that you can do a mid-day rollout when that rare moment occurs.

Tests and daily rollouts are nice, but if your team is looking to move from good to great I would recommend a non-zero amount of actual application usage from the user's point of view and fixing any issues that are road-blocks to multiple intraday rollouts.

Wednesday, May 04, 2011

Clojure: Get All Nested Map Values (N levels deep)

I recently needed to pull all the values from a nested map. The map I was working with was designed to give easy access to data like so (formatted):
user=> (def my-data
         {"Jay" {"clojure" {:name "Jay", :language "clojure", :enjoys true}
                 "ruby"    {:name "Jay", :language "ruby", :enjoys true}}
          "Jon" {"java"    {:name "Jon", :language "java", :enjoys true}}})
#'user/my-data
user=> (get-in my-data ["Jon" "java"])
{:name "Jon", :language "java", :enjoys true}
user=> (get-in my-data ["Jay" "ruby"])
{:name "Jay", :language "ruby", :enjoys true}
This worked for all of the ways the data was accessed, until someone asked for a list of all people and all languages.

I needed a function that grabbed all the values nested to a point in the map (grabbing only the values instead of the deepest map wouldn't have provided valuable information). I created the following function that should grab all the values up to the nesting level specified.
(defn nth-vals* [a i m]
  (if (and (map? m) (> i 0))
    (reduce into a (map (fn [v] (nth-vals* a (dec i) v)) (vals m)))
    (conj a m)))

(defn nth-vals [i m]
  (if (nil? m)
    {}
    (nth-vals* [] i m)))
The nth-vals function can be used as the following example shows (assuming the same my-data map as above, formatted).
user=> (nth-vals 2 my-data)
[{:name "Jay", :language "clojure", :enjoys true}
{:name "Jay", :language "ruby", :enjoys true}
{:name "Jon", :language "java", :enjoys true}]
For reference, here's what's returned if we reduce the map all the way down to its values.
user=> (nth-vals 3 my-data)
["Jay" "clojure" true "Jay" "ruby" true "Jon" "java" true]
That list of values may be helpful for someone else, but it wouldn't have solved our current problem.

And, if you're interested, here's what's returned if you ask for 1 level deep of values. (formatted, again)
user=> (nth-vals 1 my-data)
[{"clojure" {:name "Jay", :language "clojure", :enjoys true},
"ruby" {:name "Jay", :language "ruby", :enjoys true}}
{"java" {:name "Jon", :language "java", :enjoys true}}]
I wouldn't at all be surprised if something like this already exists in Clojure core, but I haven't found it yet.
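If you'd rather skip the accumulator-and-helper pair, here's a mapcat-based sketch of the same idea. It should return the same results for this shape of data, though note it yields [nil] rather than {} for nil input, so it isn't a drop-in replacement.

```clojure
;; Sketch: same traversal as nth-vals*, but mapcat threads the results
;; instead of an explicit accumulator. Returns a vector of everything
;; found i levels deep.
(defn nth-vals [i m]
  (if (and (map? m) (pos? i))
    (vec (mapcat #(nth-vals (dec i) %) (vals m)))
    [m]))

(nth-vals 2 {"Jay" {"ruby" {:name "Jay"}}
             "Jon" {"java" {:name "Jon"}}})
; => [{:name "Jay"} {:name "Jon"}]
```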

Monday, May 02, 2011

Clojure: Converting a Java Object to a Clojure Map

Clojure has a great library that helps facilitate most things you'd want to do, but it's also easy to roll your own solution if you need something more specific. This blog entry is about converting an object to a map, something Clojure already has a function for: bean. From the docs:
Takes a Java object and returns a read-only implementation of the
map abstraction based upon its JavaBean properties.
The bean function is easy enough to use, as the following example shows (formatted for readability).
user=> (import 'java.util.Calendar)
java.util.Calendar
user=> (-> (Calendar/getInstance) .getTime bean)
{:seconds 16,
:date 2,
:class java.util.Date,
:minutes 35,
:hours 12,
:year 111,
:timezoneOffset 240,
:month 4,
:day 1,
:time 1304354116038}
The bean function is simple and solves 99% of the problems I encounter. Generally I'm destructuring a value that I pull from a map that was just created from an object; bean works perfectly for this.

However, I have run across two cases where I needed something more specialized.

The first case involved processing a large amount of data in a very timely fashion. We found that garbage collection pauses were slowing us down too much, and beaning the objects that were coming in was causing a large portion of the garbage. Since we were only dealing with one type of object we ended up writing something specific that only grabbed the fields we cared about.

The second case involved working with less friendly Java objects that needed to be converted to maps that were going to be used many times throughout the system. In general it isn't a big deal working with ugly Java objects. You can destructure the map into nice var names, or you can use the rename-keys function if you plan on using the map more than once. This probably would have worked for me, but I had an additional requirement. The particular Java objects I was working with had many methods/fields, and the resulting maps contained about 80% more information than I needed. So, I could have used bean, then rename-keys, then dissoc. But this felt similar enough to the earlier situation, where I was working with specific method names, that I thought I'd look for a more general solution.
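For reference, the bean-then-trim approach I decided against might look something like the sketch below. The Date example and the chosen keys are just for illustration, not from the actual system I was working on.

```clojure
(require '[clojure.set :as set])

;; Hypothetical sketch: bean the object, keep only the keys we care
;; about, then rename them to more idiomatic names.
;; (java.util.Date. 111 4 2) is May 2, 2011 via the deprecated ctor.
(-> (bean (java.util.Date. 111 4 2))
    (select-keys [:date :month])
    (set/rename-keys {:date :day-of-month}))
; => {:month 4, :day-of-month 2}
```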

The following example shows the definition of obj->map and the REPL output of using it.
user=> (import 'java.util.Date)
java.util.Date
user=> (defmacro obj->map [o & bindings]
         (let [s (gensym "local")]
           `(let [~s ~o]
              ~(->> (partition 2 bindings)
                    (map (fn [[k v]]
                           (if (vector? v)
                             [k (list (last v) (list (first v) s))]
                             [k (list v s)])))
                    (into {})))))
#'user/obj->map
user=> (obj->map (Date. 2012 1 31)
         :month .getMonth
         :year [.getYear #(* 2 %)]
         :day .getDate)
{:month 2, :year 4024, :day 2}
As you can see from the output, the macro grabs the values of the methods you specify and stores them in a map. At one point I also needed to do a bit of massaging of a value, so I included the ability to use a vector to specify the method you care about and a function to apply to the value (the result of the function will become the value). In the example, you can see that the :year is double the value of what .getYear returns.
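If the macro's shape isn't obvious, it may help to see roughly what the example above expands to, written out by hand. The real local name is gensym'd; `obj` here is just for readability.

```clojure
(import 'java.util.Date)

;; Roughly the expansion of (obj->map (Date. 2012 1 31) ...): the object
;; is evaluated once into a local, and each binding becomes a map entry.
;; A vector binding wraps the method call in the supplied function.
(let [obj (Date. 2012 1 31)]
  {:month (.getMonth obj)
   :year  (#(* 2 %) (.getYear obj))
   :day   (.getDate obj)})
; => {:month 2, :year 4024, :day 2}
```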

In general you should stick with bean, but if you need something more discriminating, obj->map might work for you.