Tuesday, July 27, 2010

Clojure: Destructuring

In The Joy of Clojure (TJoC) destructuring is described as a mini-language within Clojure. It's not essential to learn this mini-language; however, as the authors of TJoC point out, destructuring facilitates concise, elegant code.

What is destructuring?
Clojure supports abstract structural binding, often called destructuring, in let binding lists, fn parameter lists, and any macro that expands into a let or fn. -- http://clojure.org/special_forms
The simplest example of destructuring is assigning the values of a vector.
user=> (def point [5 7])
#'user/point

user=> (let [[x y] point]
(println "x:" x "y:" y))
x: 5 y: 7
note: I'm using let for my examples of destructuring; however, in practice I tend to use destructuring in function parameter lists at least as often, if not more often.

I'll admit that I can't remember ever using destructuring like the first example, but it's a good starting point. A more realistic example is splitting a vector into a head and a tail. When defining a function with an arglist** you use an ampersand. The same is true in destructuring.
user=> (def indexes [1 2 3])
#'user/indexes

user=> (let [[x & more] indexes]
(println "x:" x "more:" more))
x: 1 more: (2 3)
It's also worth noting that you can bind the entire vector to a local using the :as directive.
user=> (def indexes [1 2 3])
#'user/indexes

user=> (let [[x & more :as full-list] indexes]
(println "x:" x "more:" more "full list:" full-list))
x: 1 more: (2 3) full list: [1 2 3]
Vector examples are the easiest; however, in practice I find myself using destructuring with maps far more often.

Simple destructuring on a map is as easy as choosing a local name and providing the key.
user=> (def point {:x 5 :y 7})
#'user/point

user=> (let [{the-x :x the-y :y} point]
(println "x:" the-x "y:" the-y))
x: 5 y: 7
As the example shows, the values of :x and :y are bound to locals with the names the-x and the-y. In practice we would never prepend "the-" to our local names; however, using different names provides a bit of clarity for our first example. In production code you would be much more likely to want locals with the same name as the key. This works perfectly well, as the next example shows.
user=> (def point {:x 5 :y 7})
#'user/point

user=> (let [{x :x y :y} point]
(println "x:" x "y:" y))
x: 5 y: 7
While this works perfectly well, creating locals with the same name as the keys becomes tedious and annoying (especially when your keys are longer than one letter). Clojure anticipates this frustration and provides :keys directive that allows you to specify keys that you would like as locals with the same name.
user=> (def point {:x 5 :y 7})
#'user/point

user=> (let [{:keys [x y]} point]
(println "x:" x "y:" y))
x: 5 y: 7
There are a few directives that work while destructuring maps. The above example shows the use of :keys. In practice I end up using :keys the most; however, I've also used the :as directive while working with maps.

The following example illustrates the use of an :as directive to bind a local with the entire map.
user=> (def point {:x 5 :y 7})
#'user/point

user=> (let [{:keys [x y] :as the-point} point]
(println "x:" x "y:" y "point:" the-point))
x: 5 y: 7 point: {:x 5, :y 7}
We've now seen the :as directive used for both vectors and maps. In both cases the local is always assigned to the entire expression that is being destructured.

For completeness I'll document the :or directive; however, I must admit that I've never used it in practice. The :or directive is used to assign default values when the map being destructured doesn't contain a specified key.
user=> (def point {:y 7})
#'user/point

user=> (let [{:keys [x y] :or {x 0 y 0}} point]
(println "x:" x "y:" y))
x: 0 y: 7
Lastly, it's also worth noting that you can destructure nested maps, vectors and a combination of both.

The following example destructures a nested map
user=> (def book {:name "SICP" :details {:pages 657 :isbn-10 "0262011530"}})
#'user/book

user=> (let [{name :name {pages :pages isbn-10 :isbn-10} :details} book]
(println "name:" name "pages:" pages "isbn-10:" isbn-10))
name: SICP pages: 657 isbn-10: 0262011530
As you would expect, you can also use directives while destructuring nested maps.
user=> (def book {:name "SICP" :details {:pages 657 :isbn-10 "0262011530"}})
#'user/book
user=>
user=> (let [{name :name {:keys [pages isbn-10]} :details} book]
(println "name:" name "pages:" pages "isbn-10:" isbn-10))
name: SICP pages: 657 isbn-10: 0262011530
Destructuring nested vectors is also very straight-forward, as the following example illustrates
user=> (def numbers [[1 2][3 4]])
#'user/numbers

user=> (let [[[a b][c d]] numbers]
(println "a:" a "b:" b "c:" c "d:" d))
a: 1 b: 2 c: 3 d: 4
Since binding forms can be nested within one another arbitrarily, you can pull apart just about anything -- http://clojure.org/special_forms
The following example destructures a map and a vector at the same time.
user=> (def golfer {:name "Jim" :scores [3 5 4 5]})
#'user/golfer

user=> (let [{name :name [hole1 hole2] :scores} golfer]
(println "name:" name "hole1:" hole1 "hole2:" hole2))
name: Jim hole1: 3 hole2: 5
The same example can be rewritten using a function definition to show the simplicity of using destructuring in parameter lists.
user=> (defn print-status [{name :name [hole1 hole2] :scores}] 
(println "name:" name "hole1:" hole1 "hole2:" hole2))
#'user/print-status

user=> (print-status {:name "Jim" :scores [3 5 4 5]})
name: Jim hole1: 3 hole2: 5
There are other (less used) directives and deeper explanations available on http://clojure.org/special_forms and in The Joy of Clojure. I recommend both.

**(defn do-something [x y & more] ... )

Tuesday, July 20, 2010

Clojure: Composing Functions

Before Clojure, I had never used a Functional Programming language. My experience has primarily been in C#, Ruby, & Java. So, learning how to use Clojure effectively has been a fun and eye-opening experience.

I've noticed a few things:
  • The Clojure language has a lot of built in support for maps and hashes (literal syntax, destructuring, etc)
  • Rich Hickey is quoted as saying "It is better to have 100 functions operate on one data abstraction than 10 functions on 10 data structures." Clojure reflects this opinion and has loads of functions that work with Sequences.
  • I strongly dislike working with people who think Map<String, Map<String, String>> is generally a good idea in Java. However, all the support for maps and vectors in Clojure makes working with them easy and efficient.
  • Once you embrace functions, maps, vectors, etc, composing functions together to transform data becomes effortless.
Like I said, I'm still learning Clojure. Recently I refactored some code in a way that I thought was worth documenting for other people who are also learning Clojure.

Let's start with a simple example. Let's assume we want to track the score in the British Open (Golf). The data structure we will be working with will be a map of maps, keyed by country and then player.
{:England {"Paul Casey" -1}}
The above example shows that Paul Casey plays for England and currently has the score of -1. We need a function that allows you to update a players score. Let's also assume that scores for individual holes come in the following map form.
{:player "Paul Casey" :country :England :score -1}
Given the above code, you would expect the following test to pass.
(def current-score {:England {"Paul Casey" -1}})

(deftest update-score-test
(is (=
{:England {"Paul Casey" -2}}
(update-score current-score {:player "Paul Casey" :country :England :score -1}))))
(Let's ignore the fact that I'm def'ing current-score. It's only for simplicity's sake.)

There are a lot of ways to make that test pass. It took me a few tries to come up with something I liked. Below is the path I took.

Step one, get the test passing using the functions I know.
(defn update-score [current update]
(let [player (:player update)
country (:country update)
hole-score (:score update)]
(merge current {country {player (+ ((current country) player) hole-score)}})))
The above example code is a victory when you are learning Clojure. The tests pass, you've used a few features of the language. It's a great first pass. However, I'm sure the Clojure veterans are drooling over the refactoring possibilities.

Step two, learn the update-in function. Like I previously mentioned, Clojure has a lot of functions for working with sequences (maps, vectors, lists, etc). It's not always obvious which function you should be using. However, I've had loads of success learning from the mailing list, and getting immediate answers in the #clojure chat room on freenode. One way or another, I eventually found the update-in function, which is designed up update a nested key.
(defn update-score [current update]
(let [player (:player update)
country (:country update)
hole-score (:score update)]
(update-in current [country player] (fn [_] (+ ((current country) player) hole-score)))))
The switch to update-in is relatively easy and cleans up a bit of code, but now I've introduced an anonymous function. Anonymous functions are great, but if I can get away with not creating them, I always take that route.

However, before we eliminate the anonymous function, we're actually better off making sure we understand the new function (update-in). While the current version of the code works, it's much cleaner if you use the current value that is passed into the anonymous function instead of looking up the value yourself.
(defn update-score [current update]
(update-in current [(:country update) (:player update)] (fn [overall-score] (+ overall-score (:score update)))))
Since we no longer use country and player in multiple places, it seems like removing the let statement is a logical choice. As a result, the amount of code necessary to solve this problem seems to get smaller and smaller.

However, we can actually take this a step farther and use destructuring to provide an even cleaner implementation.
(defn update-score [current {:keys [country player score]}]
(update-in current [country player] (fn [overall-score] (+ overall-score score))))
If you aren't familiar with destructuring the syntax might be a bit confusing. I created a blog entry for destructuring that should help if you aren't familiar with it. There's also a good write up with destructuring examples available on clojure.org. I would also recommend writing a few simple functions and try out the examples for yourself.

There's one more step to take, but for me it was a big one. The documentation for update-in states:
Usage: (update-in m [k & ks] f & args)

'Updates' a value in a nested associative structure, where ks is a sequence of keys and f is a function that will take the old value and any supplied args and return the new value, and returns a new nested structure.
What I currently have works fine, but the additional args that update-in takes allows me to provide a more concise solution that I greatly prefer.
(defn update-score [current {:keys [country player score]}]
(update-in current [country player] + score))
This final version doesn't require any anonymous functions. Instead it relies on on update-in to apply the current value and any additional args to the function that you passed in. The final version is so concise you begin to wonder if you really need to define an update-score function.

The final snippet is an example of why I'm very intrigued by Clojure these days. The ability to compose functions and mix them in with the powerful functions of the language results in very concise code.

I consider what I would need to write to accomplish the same thing in Java or Ruby, and I have to wonder if the Functional Programming supporters might just be on to something.

Wednesday, July 07, 2010

High Level Testing with a High Level Language

In the early days of my project we made the decision to high-level test our Java application with Clojure*. One and a half years later, we're still following that path. It seemed worthwhile to document the progress so far.

My current preferred style of testing is rigorous unit testing and less than a dozen high level tests.

This style of testing doesn't catch everything; however, context is king. In my context, we constantly have to balance the number of bugs against the speed at which we deliver. We could test more, but it would slow down delivery. Since we don't want to drastically impact delivery, we try to get the most we can out of the tests that we do write.

A few more notes on context. Our high level tests are written by programmers for programmers. The application is fairly large and complex. We use Java, Clojure, Ruby, C#, & others. We take advantage of open-source frameworks as well as vendor and in-house frameworks. The result is a user-facing application used exclusively in-house. It's not a service or a 'for-sale' product.

The Bad
Context: The team knows Java fairly well and writes the majority of the domain code in Java. The team didn't have any previous experience with Clojure.
Result: Happiness with using a high level language was often impacted by no knowledge of that high level language.

Context: The vast majority of the code is written in Java and is able to be manipulated with IntelliJ.
Result: Some members of the team felt that using an additional language hampered their ability to use automated refactoring tools, and thus the rewards were not worth the cost. Other members believe the powerful language features provide benefits that out-weigh the costs.

Context: The tests need to run on developer boxes and build machines. One and a half years ago, there was no easy way to run Clojure tests in JUnit.
Result: The team hacked together something custom. It didn't take long to write, but the integration with IntelliJ and JUnit is not nearly as nice as using pure Java.

The Interesting
Context: The team has plenty of experience with Object Oriented (OO) based, C-style languages. The team didn't have any previous experience with Functional Programming (FP). Tests are procedural. However, the team was much more experienced writing procedural code in OO than FP.
Result: The paradigm shift impacted the speed at which the team was able to learn Clojure, but the team was able to peek into the world of FP without writing an entire application using an FP language.

The Good
Context: The team needed to build several utilities that allowed them to high level test the application.
Result: It was trivial to create a REPL that allowed us to communicate at a high level with our application. We practically got a command line interface to our application for free.

Context: The team was curious if Clojure would be the right language for solving other problems on the project. The team knew that with less than a dozen high level tests, rewriting the tests in another language would not be a large time investment.
Result: The team was able to determine that Clojure is a viable language choice for several tasks without having to take the leap on a mission critical component whose size may be variable.

Context: High level tests often require a significant amount of setup code. Clojure provides language features (duck-typing, macros, etc) that allow you to reduce the noise often generated by setup code.
Result: The tests are easier to read and maintain (assuming you understand what and how things are being hidden) and the amount of utility code required to run high level tests was significantly reduced.

*We still use Java to unit test our Java, for now.

Tuesday, July 06, 2010

Sacrificing some IDE capabilities

When discussing adopting Clojure or JRuby there are often heated discussions concerning the pros and cons of working with a language that has less IDE support. This blog entry will focus on the common concerns I hear.

I've never met anyone who has mastered Ruby or Clojure and had the opinion that they are more productive in Java. I think that's the best "proof" that the pros outweigh the cons.*

Other than that, things get complicated. The problem is, I understand where reluctant adopters are coming from. But, I just don't see any way to convince them to make the leap.

Many Java programmers do a bit of Ruby or Clojure and make a determination. I think their concerns are inflated by 3 factors: api/library fear, shallow understanding of powerful language features, and inability to optimize problem resolution.

A common concern is that learning a new language requires learning new libraries. Even though Java's libraries are available in Clojure or JRuby, both languages still have their own apis/libs to learn. In Java, you often use auto-completion. In Ruby and Clojure some auto-completion is available; however, it's often suspect. Few people use a Java REPL, so they don't understand how helpful/powerful a REPL is. In Clojure and Ruby, the REPL is often the most effective way to learn new apis/tools. Since most Java developers haven't experienced the value of the REPL, they exaggerate the overhead of learning apis/libs.

Writing idiomatic Java using Ruby or Clojure will produce painful and likely unmaintainable code. Writing idiomatic Ruby and Clojure often require understanding the powerful aspects of a language (metaprogramming, method_missing, macros, duck-typing). Understanding those aspects of a language can greatly change the design of a system. The result of mastering language features is often concise and significantly more maintainable code. However, if you aren't able to see the concise/maintainable version, you will picture a design that may likely be hard to manipulate without IDE support. Enter unnecessary concern and doubt.

Lastly, when things go wrong you need to know how to fix them. Fixing issues is often a mix of tweaking the language and mixing in a few tools. In Java it's often Ignore the noise in the stacktrace, Click a few links and open a few files, Drop in some break points or print lines or change a few tests. Getting to the source of the problem efficiently requires an IDE. In Clojure and Ruby I have those tools (since I do my Clojure and Ruby in IntelliJ), but I also have the REPLs. I know what noise to ignore. Even without those tools (when I used TextMate for Ruby), I know what patterns generally represent what issues. I can see the signal in the noise, and I have an additional tool (the REPL) to help me determine what is signal and what is noise.

However, if you don't know what is noise and what is signal, and (once again) you don't understand the value of a REPL then you exaggerate the cost of problem resolution.

Cosmin Stejerean recently mentioned "To me good IDE support for Clojure is very much like using training wheels on a bicycle."

That's almost true. I'll freely admit that I'd love some free refactoring. It would make me more productive. However, I'm already more productive so it's icing on the cake, not a reason to stop me from using the language.

Unfortunately, those training wheels are very much a deal-breaker to conservative adopters.

* I have recently heard of a few projects starting as Python and switching to Java, but I haven't ever done any Python, so I wont speculate as to why that happened.