Monday, December 06, 2010

Clojure: get, get-in, contains?, and some

Clojure provides a get function that returns the value mapped to a key in a set or map. The documentation shows the example: (get map key). While that's completely valid, I tend to use sets and maps as functions when the get is that simple.

For example, I'd use ({"FSU" 31 "UF" 7} "FSU") if I wanted the value of the key "FSU". It's much less likely that I'd use (get {"FSU" 31 "UF" 7} "FSU"), largely because the former example is less typing.

However, if I'm doing something more complicated I've found the get function to be helpful. Often, I like to use the combination of get and -> or ->>.

The following example takes some json-data, converts it to a clojure map, and pulls the value from the "FSU" key.

(-> json-data read-json (get "FSU"))

It's also worth noting, in our example we have to use get, since strings are not Clojure functions. If instead we chose to make our keys keywords, we could choose either of the following solutions. I don't believe there is a right or wrong solution; which you use will likely be a personal preference.

(-> json-data (read-json true) (get :FSU))
(-> json-data (read-json true) :FSU)

We can modify the example and assume nested json that results in the following clojure map: {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}}

Building off of a previous example, we could use a slightly modified version to get the score for FSU.

(-> json-data read-json (get "scores") (get "FSU"))

However, getting nested values is common enough that Clojure provides a function designed specifically to address that need: get-in

The get-in function returns the value in a nested associative structure when given a sequence of keys. Using get-in you can replace the last example with the following code.

(-> json-data read-json (get-in ["scores" "FSU"]))

The get-in function is very helpful when dealing with nested structures; however, there is one gotcha that I've run into. The following shows a REPL session and what get-in returns with various keys.
user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} ["scores" "FSU"])
31

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} ["scores"])
{"FSU" 31, "UF" 7}

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} [])
{"timestamp" 1291578985220, "scores" {"FSU" 31, "UF" 7}}

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} nil)
{"timestamp" 1291578985220, "scores" {"FSU" 31, "UF" 7}}
Everything looks logical enough; however, if you are pulling your key sequence from somewhere else you could end up with unexpected results. The following example shows how a simple mistake could result in a bug.
user=> (def score-key-seqs {"FSU" ["scores" "FSU"]})                             
#'user/score-key-seqs

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} (score-key-seqs "FSU"))
31

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} (score-key-seqs "UF"))
{"timestamp" 1291578985220, "scores" {"FSU" 31, "UF" 7}}
If you're always expecting a number and you get a map instead, things might not work out well.

It's also worth noting that both get and get-in allow you to specify default values. You can check the documentation on clojure.org for more information on default values.

You don't always need to get a value, sometimes it's good enough to know that a key is in a map or set. In general I use the value returned from a map or set to determine if a key exists - the following snippet uses that pattern.
(if (a-map :key) 
(do-true-behaviors)
(do-false-behaviors))
However, that pattern fails if the value of :key is nil. If it's possible that the value might be nil you might want to use Clojure's contains? function. The contains? function returns true if key is present in the given collection, otherwise returns false. The following code pasted from a REPL session demonstrates that contains? works perfectly well with nil.
user=> (contains? {:foo nil} :foo)
true
The contains? function works well with sets and maps; however, if you try to use it on a vector you might get surprising results.
user=> (contains? [1 3 4] 2)
true
For numerically indexed collections like vectors and Java arrays, the contains? function tests if the numeric key is within the range of indexes. The Clojure documentation recommends looking at the some function if you're looking for an item in a list.

The some function returns the first logical true value of a predicate for any item in the list, else nil. The following REPL session shows how you can use a set as the predicate with some to determine if a value is found in a list.
user=> (some #{2} [1 3 4])  
nil
user=> (some #{1} [1 3 4])
1
Clojure provides various functions for operating on maps and sets. At first glance some of them may look superfluous; however, as you spend more time working with sets and maps you'll start to appreciate the subtle differences and the value they provide.

Tuesday, November 30, 2010

Taking a Second Look at Collective Code Ownership

It's common to hear proponents of Agile discussing the benefits of collective code ownership. The benefits can be undeniable. Sharing knowledge ensures at least one other perspective and drastically reduces Bus Risk. However, sharing comes at a cost: time. The ROI of sharing with a few people can be greatly different than the ROI of sharing with 5 or more people.

I do believe in the benefits of collective code ownership. Collective code ownership is a step towards taking an underachieving team and turning them into a good team. However, I'm becoming more and more convinced that it's not the way to take good team and make them great.

(context: I believe these ideas apply to teams larger than 3. Teams of 2-3 should likely be working with everyone on everything.)

If you pair program, you will incur a context switch every time you rotate pairs. Context switches always have a non-zero cost, and the more you work on, the larger the context switch is likely to be. Teams where everyone works on everything are very likely paying for expensive context switches on a regular basis. In fact, the words Context Switch are often used specifically to point out the cost.

Working on everything also ensures a non-trivial amount of time ramping up on whatever code you are about to start working on. Sometimes you work on something you know fairly well. Other times you're working on something you've never seen before. Working on something you've never seen before creates two choices (assuming pair-programming): go along for the ride, understanding little - or - slow your pair down significantly while they explain what's going on.

Let's say you choose to slow your pair down for the full explanation: was it worth it? If you're the only other person that knows the component, it's very likely that it was worth it. What if everyone else on the team already knows that component deeply? Well, if they all die, you can maintain the app, but I don't think that's going to be the largest issue on the team.

(the same ideas apply if you don't pair-program, except you don't have the "go along for the ride" option)

I can hear some of you right now: We rotate enough that the context switch is virtually free and because we rotate so much there's little ramp up time. You might be right. Your problem domain might be so simple that jumping on and off of parts of your system is virtually free. However, if you're domain is complex in anyway, I think you're underestimating the cost of context switches and ramp up time. Also, the devil is traditionally in the details, so you're "simple domain" probably isn't as simple as you think.

Let's assume your domain is that simple: it might be cheaper to rewrite the software than take your Bus Number from 3 to 4.

Another benefit of pair programming combined with collective code ownership is bringing up everyone's skill level to that of the most skilled team member. In my opinion, that's something you need to worry about on an underachieving team, not a good team. If you're on a good team, it's likely that you can learn just as much from any member of your team; therefore, you are not losing anything by sticking to working with just a few of them in specific areas. You really only run into an education problem if you're team has more learners than mentors - and, in that case, you're not ready to worry about going from good to great.

There's also opportunity cost of not sticking to certain areas. Focusing on a problem allows you to create better solutions. Specifically, it allows you to create a vision of what needs to be done, work towards that vision and constantly revise where necessary.

Mark Twain once wrote that his letters would be shorter if he had more time. The same is often true of software. The simplest solution is not always the most obvious. If you're jumping from problem to problem, you're more likely to create an inferior solution. You'll solve problems, but you'll be creating higher maintenance costs for the project in the long term.

Instead, I often find it very helpful to ponder a problem and create a more simple and, very often, a more concise solution. In my experience, the maintenance costs are also greatly reduced by the simplified, condensed solution.

I'd like to repeat for clarity: Collective code ownership has benefits. There's no doubt that it is better to have everyone work on everything than have everyone focused on individual parts of the codebase. However, it's worth considering the cost of complete sharing if you are trying to move from good to great.

Saturday, October 30, 2010

Experience Report: Feature Toggle over Feature Branch

We often use Feature Toggle on my current team (when gradual release isn't possible). My experience so far has been: gradual release is better than Feature Toggle, and Feature Toggle is better than Feature Branch.

I found Martin's bliki entry on Feature Toggle to be a great description, but the entry doesn't touch on the primary reasons why I prefer Feature Toggle to Feature Branch.

When using Feature Branch I have to constantly rebase to avoid massive merge issues, in general. That means rebasing during development and testing. Rebasing a branch while it's complete and being tested, has often lead to subtle, merge related bugs that go unnoticed because the feature is not under active development.

Additionally, (and likely more problematic) once I merge a feature branch I am committed (no pun intended). If a bug in the new feature, that requires rolling production back to the previous release, is found after the branch has been merged I find myself rolling back the feature commit or continuing with a un-releasable trunk. Rolling back the commit is painful because it is likely a large commit. If I continue on with an un-releasable trunk and another (unrelated to the new feature) bug is found in trunk I'm in trouble: I can't fix the new bug and release.

That's bad. I either lose significant time rolling back the release (terrible), or I roll the dice (terrifying). I was burned by this situation once already this year. It is not something I'm looking to suffer again in the near future.

Feature Toggle avoids this entirely by allowing me to run with the toggle available and turn it back on if things go wrong. When I feel comfortable that everything is okay (usually, a week in prod is good enough), I clean up the unnecessary toggles.

Of course, nothing is black & white. Sometimes a Feature Toggle increases the scope of the work to an unacceptable level, and Feature Branch is the correct decision. However, I always weigh the terrible/terrifying situation when I'm choosing which direction is optimal.

It's also worth noting, I roll the current day's changes into production every night. It's possible that your experience will be vastly different if your release schedules are multi-day or multi-week.

Thursday, September 30, 2010

Clojure: Flatten Keys

I recently needed to take a nested map and flatten the keys. After a bit of trial and error I came up with the following code. (note: the example expects