Tuesday, October 02, 2012

Clojure: Avoiding Anonymous Functions

Clojure's standard library provides a lot of functionality, more than I can easily remember from a quick glance. When I first started learning Clojure I used to read the API docs, hoping that when I needed something I'd be able to recall it. For some functions that worked, but not for nearly enough of them.

Next, I went through several of the exercises on 4clojure.org, and it opened my eyes to the sheer number of functions that I should have known, but still didn't. 4clojure.org helped me learn how to use many of the functions from the standard lib, but it also taught me a greater lesson: any data transformation I want to do can likely be accomplished either with a single function from clojure.core or by combining a few functions from clojure.core.

The following code has an example input and shows the desired output.
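
As a minimal sketch, assuming hypothetical employee records (the names, employers, and cities are made up for illustration), the input and desired output might look like this:

```clojure
;; Hypothetical input: one map per person.
(def employees
  [{:name "Alice" :employer "Acme"    :current-city "new york"}
   {:name "Bob"   :employer "Acme"    :current-city "chicago"}
   {:name "Carol" :employer "Initech" :current-city "new york"}])

;; Desired output: each employer mapped to the names of its employees.
(def desired-output
  {"Acme"    ["Alice" "Bob"]
   "Initech" ["Carol"]})
```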


There are many ways to solve this problem, but when I began with Clojure I solved it with a reduce. In general, anytime I was transforming a seq to a map, I thought reduce was the right choice. The following example shows how to transform the data using a reduce.
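
A reduce-based version could be sketched as follows. The data and the helper name `add-employee` are assumptions made for illustration; the shape (a named function handed to reduce) is inferred from the surrounding text.

```clojure
;; Hypothetical input records.
(def employees
  [{:name "Alice" :employer "Acme"}
   {:name "Bob"   :employer "Acme"}
   {:name "Carol" :employer "Initech"}])

;; Hypothetical helper: append one record's name under its employer.
;; (fnil conj []) starts a vector the first time an employer is seen.
(defn add-employee [result {:keys [name employer]}]
  (update-in result [employer] (fnil conj []) name))

(def names-by-employer
  (reduce add-employee {} employees))
;; => {"Acme" ["Alice" "Bob"], "Initech" ["Carol"]}
```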


That works perfectly well and it's not a lot of code, but it's custom code. Knowing the input and looking at the reduce isn't enough to know what the output is. You have to jump into the source to see what the transformation actually is.

You can solve this problem with an anonymous function, as the example below shows.
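
A sketch of the anonymous-function version, again with made-up data; the destructuring and the `(fnil conj [])` detail are assumptions, not necessarily the original code:

```clojure
;; Hypothetical input records.
(def employees
  [{:name "Alice" :employer "Acme"}
   {:name "Bob"   :employer "Acme"}
   {:name "Carol" :employer "Initech"}])

;; The reducing function is written inline: it destructures each element,
;; updates the accumulated map, and relies on {} as the initial value.
(def names-by-employer
  (reduce (fn [result {:keys [name employer]}]
            (update-in result [employer] (fnil conj []) name))
          {}
          employees))
;; => {"Acme" ["Alice" "Bob"], "Initech" ["Carol"]}
```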


This solution isn't much code, but it's doing several things and requiring you to keep many things on your mental stack at the same time - what does the element look like, destructuring, the form of the result, the initial value, etc. It's not that tough to write, but it can be a bit tough to read when you come back to it 6 months later. Below is another solution, using only functions defined in clojure.core.
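
A sketch of a solution built only from clojure.core functions, using the juxt, (partial apply hash-map), and merge-with steps discussed in this post. The data is hypothetical and the exact pipeline is an assumption.

```clojure
;; Hypothetical input records.
(def employees
  [{:name "Alice" :employer "Acme"}
   {:name "Bob"   :employer "Acme"}
   {:name "Carol" :employer "Initech"}])

(def names-by-employer
  (->> employees
       ;; (["Acme" ["Alice"]] ["Acme" ["Bob"]] ["Initech" ["Carol"]])
       (map (juxt :employer (comp vector :name)))
       ;; ({"Acme" ["Alice"]} {"Acme" ["Bob"]} {"Initech" ["Carol"]})
       (map (partial apply hash-map))
       ;; merge the single-entry maps, concatenating names on key collisions
       (apply merge-with concat)))
;; => {"Acme" ("Alice" "Bob"), "Initech" ["Carol"]}
```

Note that concat yields a lazy seq rather than a vector for employers with more than one name; the seq still compares equal to the corresponding vector.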


The above solution is more characters, but I consider it to be superior for two reasons:
  • Only clojure.core functions are used, so I am able to read the code without having to look elsewhere for implementation or documentation (and maintainers should be able to do the same).
  • The transformation happens in distinct and easy to understand steps.
I'm sure plenty of people reading this blog entry will disagree, and I'll agree that the anonymous function in this case isn't necessarily complicated enough that you'll want to spend the characters to avoid it. However, there's another reason to avoid the (fn): I believe you should seize every opportunity you get to become more familiar with the standard library.

If the learning opportunity did not exist, I might feel differently; however, I currently feel much more comfortable with update-in than I do with using juxt, and to a lesser extent (partial apply hash-map) & (apply merge-with concat). If you found the solution I prefer harder to follow, then I suspect you may be in the same boat as me. If you were easily able to read and follow both solutions, it probably makes sense for you to simply do what you prefer. However, if you choose to define your own function I do believe you're leaving behind something that's harder to digest than a string of distinct steps that only use functions found in clojure.core.

Regardless of language, I believe that you should know the standard library inside and out. Time and time again (in Clojure) I've solved a problem with an anonymous function, only to later find that the standard library already defined exactly what I needed. A few examples from memory: find (roughly select-keys with 1 key, returning the entry instead of a map), keep (map followed by remove nil?), map-indexed (map f (range) coll), mapcat (apply concat (map f coll)). After making this mistake enough times, I devised a plan to avoid this situation in the future while also forcing myself to become more familiar with the standard library.
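
To make those examples concrete, here is a small sketch comparing each function with a longer spelling of the same idea. The data is made up for illustration.

```clojure
(def m {:a 1 :b 2})

;; find returns the map entry; select-keys returns a one-entry map.
(def found    (find m :a))          ; [:a 1]
(def selected (select-keys m [:a])) ; {:a 1}

;; keep is map followed by remove nil?.
(def kept      (keep :x [{:x 1} {} {:x 3}]))               ; (1 3)
(def kept-long (remove nil? (map :x [{:x 1} {} {:x 3}])))  ; (1 3)

;; map-indexed passes the index as the first argument.
(def indexed      (map-indexed vector [:a :b]))   ; ([0 :a] [1 :b])
(def indexed-long (map vector (range) [:a :b]))   ; ([0 :a] [1 :b])

;; mapcat concatenates the results of mapping.
(def catted      (mapcat reverse [[1 2] [3 4]]))              ; (2 1 4 3)
(def catted-long (apply concat (map reverse [[1 2] [3 4]])))  ; (2 1 4 3)
```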

The plan is simple: when transforming data, don't use (fn) or #(), and only define a function when the transformation cannot be expressed with -> or ->> and clojure.core.

My preferred solution (above) is a simple example of using threading and clojure.core to solve a problem without #() or (fn). This works for 90% of the transformation problems I encounter; however, there are times that I need to define a function. For example, I recently needed to take an initial value, pass it to reduce, then pass the result of the reduce as the initial value to another reduce. The initial value is the 2nd of reduce's 3 args, thus it cannot easily be threaded. In that situation, I find it appropriate to simply define my own function. Still, at least 90% of the time I can find a solution by combining existing clojure.core functions (often by using comp, juxt, or partial).
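
The chained-reduce case might be sketched like this. The function name `reduce-then-reduce` and its arguments are hypothetical, chosen only to illustrate why the initial value (reduce's middle argument) resists -> and ->>.

```clojure
;; Hypothetical helper: feed the result of the first reduce in as the
;; initial value of the second reduce.
(defn reduce-then-reduce [init f xs g ys]
  (reduce g (reduce f init xs) ys))

;; (reduce + 0 [1 2 3]) is 6, then (reduce * 6 [2 2]) is 24.
(def result (reduce-then-reduce 0 + [1 2 3] * [2 2]))
;; => 24
```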

Here's another simple example: given a list of maps, keep only the maps where :current-city is "new york".
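
One way to write that filter without (fn) or #() is to build the predicate from comp and partial. This is a sketch with hypothetical data, not necessarily the original snippet.

```clojure
;; Hypothetical input records.
(def people
  [{:name "Alice" :current-city "new york"}
   {:name "Bob"   :current-city "chicago"}
   {:name "Carol" :current-city "new york"}])

;; The predicate looks up :current-city, then compares it to "new york".
(def new-yorkers
  (filter (comp (partial = "new york") :current-city) people))
;; => ({:name "Alice", :current-city "new york"}
;;     {:name "Carol", :current-city "new york"})
```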


Once you've made this step, you may start asking yourself: am I doing something unique, or am I doing something that's common enough to be somewhere in the standard library? More often than I expected, the answer is yes, there's already a fn in the standard library. In this case, we can use clojure.set/join to join on the current city, thus removing our undesired data.
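
A sketch of the clojure.set/join approach, with hypothetical data. Joining against a one-element relation that contains only the wanted city keeps just the matching maps.

```clojure
(require '[clojure.set :as set])

;; Hypothetical input records.
(def people
  [{:name "Alice" :current-city "new york"}
   {:name "Bob"   :current-city "chicago"}
   {:name "Carol" :current-city "new york"}])

;; Natural join on the shared key :current-city.
(def new-yorkers
  (set/join people #{{:current-city "new york"}}))
;; => a set containing the Alice and Carol records
```

Unlike filter, join returns a set, so the original order is lost and duplicate maps collapse into one.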


Asking the question, "this doesn't seem unique - shouldn't there be a fn in the standard library that does this?", is what led me to clojure.set/project, find and so many other functions. Now, when I look through old code, I find myself shaking my head and wishing I'd started down this path even earlier. Clojure makes it easy to define your own functions that quickly solve problems, but using what's already in clojure.core makes your code significantly easier for others to follow - learning the standard library inside and out is worth the effort in the long term.

5 comments:

  1. On your first example, you seem to be doing two things: grouping records by :employer, then extracting a particular field from each record. For the first need, in the interest of using existing functionality, group-by is the obvious answer. On the second, you want to apply your record function to each item in the vals of a map.

    I'd write something like this to make those two steps visible and create a reusable function for maps that have multiple records in the vals:

    https://gist.github.com/3827978

  2. @Alex, We're in exactly the same boat. In jry (https://github.com/jaycfields/jry/blob/master/src/jry/core.clj#L79) I define an update-values fn, so I would solve this problem with

    (-> (group-by :employer coll) (update-values :name))

    but, I didn't want to reference jry in this blog post.

    Thanks for the comment. Cheers, Jay

  3. I think learning Clojure is difficult for two reasons. First, for people who come from C/C++ and have not used JavaScript extensively (closures, recursion, and so on), Clojure takes an adjustment.

    Second, some people learn from examples better than reading from a book. I am one of those people.

    I still find some Clojure code that I've written difficult to read a few months later, but then I do not get to write Clojure on a daily basis. If I did, it seems like looking at the code would get easier.

  4. I'm about a year into Clojure and really glad I read this post.

    That said, I've lately found myself writing fns like this:

    (defn when-pos? [x] (when (pos? x) x))

    (defn find-first-pos-number [coll]
      (some when-pos? coll))

    where my "when-pos?" could be any sort of predicate that returns the value when the pred is truthy. This is necessary because `some` retrieves the first truthy result of the fn instead of the value that passes the truthy function.

    Is there a more appropriate fn than `some`?

    Another similar example:

    (defn contains-foo? [s]
      (when (re-find #"foo" s) s))

    (def first-foo (some contains-foo? ["bar" "baz" "find this foo" "qux"]))

  5. What do you think about the idea of defining your own small functions with meaningful names as a kind of DSL?
