Jay Fields' Thoughts: clojure functions


Thursday, May 16, 2013

Clojure: Combining Calls To Doseq And Let

If you've ever looked at the docs for Clojure's for macro, then you probably know about the :let, :when, and :while modifiers. What you may not know is that those same modifiers are available in doseq.

I was recently working with some code that had the following form.

Upon seeing this code, John Hume asked if I preferred it to a single doseq with multiple bindings. He sent over an example that looked similar to the following example.
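Here is a minimal sketch of the two shapes being compared. It assumes a hypothetical map `ids->sub-ids` from keyword ids to vectors of sub-ids. The data and function names are illustrative, not the original code.

```clojure
;; hypothetical data: keyword ids mapped to their sub-ids
(def ids->sub-ids {:a [1 2] :b [3]})

;; two doseqs: (name id) is evaluated once per id
(defn print-nested []
  (doseq [id (keys ids->sub-ids)]
    (let [id-name (name id)]
      (doseq [sub-id (ids->sub-ids id)]
        (println id-name sub-id)))))

;; one doseq with two bindings: (name id) is now evaluated once per sub-id
(defn print-multi []
  (doseq [id (keys ids->sub-ids)
          sub-id (ids->sub-ids id)]
    (println (name id) sub-id)))
```

Both functions print the same lines. The only difference is how often `name` gets called.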

That was actually the first time that I'd seen multiple bindings in a doseq, and my immediate reaction was that I preferred the explicit simplicity of having multiple doseqs. However, I always have a preference for concise code, and I forced myself to start using multiple bindings instead of multiple doseqs - and, unsurprisingly, I now prefer multiple bindings to multiple doseqs.

You might have noticed that the second version of the code slightly changes what's actually being done. In the original version the 'name' function is called once per 'id', and in the second version the 'name' function is called once per 'sub-id'. Calling name significantly more often isn't likely to have much impact on your program; however, if you were calling a more expensive function, this change could have a negative impact. Luckily, as I mentioned previously, doseq also supports :let.

The second example can be evolved to the following code - which also demonstrates that the let is only evaluated once per iteration.
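This sketch shows a doseq with a :let placed between two bindings. It assumes a hypothetical `ids->sub-ids` map. It also adds an illustrative `counted-name` wrapper that counts how often the :let expression runs.

```clojure
;; hypothetical data: two ids, three sub-ids in total
(def ids->sub-ids {:a [1 2] :b [3]})

;; counts how many times the :let expression is evaluated
(def name-calls (atom 0))

(defn counted-name [id]
  (swap! name-calls inc)
  (name id))

(defn print-with-let []
  ;; the :let follows the id binding, so it runs once per id, not per sub-id
  (doseq [id (keys ids->sub-ids)
          :let [id-name (counted-name id)]
          sub-id (ids->sub-ids id)]
    (println id-name sub-id)))
```

After one call, `name-calls` should be 2 rather than 3, because the :let is evaluated once per id.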

That's really the final version of the original code, but you can alter it slightly for experimentation purposes if you'd like. Let's assume we call another, expensive function in an additional :let. It would be nice if that call only occurred when an iteration was actually going to happen, and it turns out that's exactly what happens.
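One way to see this is to put a :when in front of the :let. The sketch below uses a hypothetical `expensive` function that records each id it is called with.

```clojure
;; hypothetical data: :b has no sub-ids
(def ids->sub-ids {:a [1 2] :b [] :c [3]})

;; records every id it is called with, standing in for a costly call
(def expensive-calls (atom []))

(defn expensive [id]
  (swap! expensive-calls conj id)
  (str (name id) "!"))

(defn print-decorated []
  (doseq [id (keys ids->sub-ids)
          :when (seq (ids->sub-ids id))   ; skip ids with no sub-ids
          :let [decorated (expensive id)] ; only reached when :when passed
          sub-id (ids->sub-ids id)]
    (println decorated sub-id)))
```

Only :a and :c should end up in `expensive-calls`. The :b id fails the :when test, so its :let is never evaluated.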

Whether you prefer multiple bindings or multiple doseqs, it's probably a good idea to get comfortable reading both.

Tuesday, May 14, 2013

Clojure: Testing The Creation Of A Partial Function

I recently refactored some code that takes longs from two different sources to compute one value. The code originally stored the longs and called a function when all of the data arrived. The refactored version partials the data while it's incomplete and executes the partial'd function when all of the data is available. Below is a contrived example of what I'm talking about.

Let's pretend we need a function that will allow us to check whether or not another drink would make us legally drunk in New York City.

The code below stores the current bac and uses the value when legally-drunk? is called.
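Here is a minimal sketch of the stored-value version. It assumes a 0.08 limit. The names `legal-limit` and `drink-bac` are illustrative; `state`, `update-bac`, and `legally-drunk?` follow the post.

```clojure
;; state holds the most recently reported blood alcohol content
(def state (atom {}))

;; assumption: 0.08 is treated as the legal limit
(def legal-limit 0.08)

(defn update-bac [bac]
  (swap! state assoc :bac bac))

;; would one more drink (adding drink-bac) reach the limit?
(defn legally-drunk? [drink-bac]
  (>= (+ (:bac @state) drink-bac) legal-limit))
```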

The following (passing) tests demonstrate that everything works as expected.

This code works without issue, but can also be refactored to store a partial'd function instead of the bac value. Why you would want to do such a thing is outside of the scope of this post, so we'll just assume this is a good refactoring. The code below no longer stores the bac value, and instead stores the pure-legally-drunk? function partial'd with the bac value.
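This sketch of the refactored version also assumes a 0.08 limit. The :legally-drunk?* key matches the test output later in the post; the argument names are illustrative.

```clojure
(def state (atom {}))

;; pure function: no state, just arithmetic (0.08 limit is an assumption)
(defn pure-legally-drunk? [bac drink-bac]
  (>= (+ bac drink-bac) 0.08))

;; store a partially applied function instead of the raw bac value
(defn update-bac [bac]
  (swap! state assoc :legally-drunk?* (partial pure-legally-drunk? bac)))

;; supply the remaining argument to whatever was partial'd
(defn legally-drunk? [drink-bac]
  ((:legally-drunk?* @state) drink-bac))
```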

Two of the three of the tests don't change; however, the test that was verifying the state is now broken.

note: The test output has been trimmed and reformatted to avoid horizontal scrolling.

In the output you can see that the test is failing as you'd expect, due to the change in what we're storing. What's broken is obvious, but there's not an obvious solution. Assuming you still want this state based test, how do you verify that you've partial'd the right function with the right value?

The solution is simple, but a bit tricky. As long as you don't find the redef too magical, the following solution allows you to easily verify the function that's being partial'd as well as the arguments.
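Here is a minimal sketch of the redef trick. It assumes a refactored shape where update-bac stores `(partial pure-legally-drunk? bac)` under :legally-drunk?*. `stored-partial` is a hypothetical helper name.

Redefining `partial` as `vector` turns the stored function into plain data that can be compared with `=`. This relies on the call to `partial` going through its var, as it does in code compiled without direct linking.

```clojure
(def state (atom {}))

;; 0.08 limit is an assumption
(defn pure-legally-drunk? [bac drink-bac]
  (>= (+ bac drink-bac) 0.08))

(defn update-bac [bac]
  (swap! state assoc :legally-drunk?* (partial pure-legally-drunk? bac)))

;; hypothetical test helper: with partial redefined as vector, update-bac
;; stores [pure-legally-drunk? bac] instead of an opaque fn
(defn stored-partial [bac]
  (with-redefs [state   (atom {})
                partial vector]
    (update-bac bac)))
```

With this setup, `(stored-partial 0.04)` should equal `{:legally-drunk?* [pure-legally-drunk? 0.04]}`, and the real `state` atom is left untouched.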

Those tests all pass, and should provide confidence that the legally-drunk? and update-bac functions are sufficiently tested. The pure-legally-drunk? function still needs to be tested, but that should be easy since it's a pure function.

Would you want this kind of test? I think that becomes a matter of context and personal preference. Given the various paths through the code the following tests should provide complete coverage.

The above tests make no assumptions about the implementation - they actually pass whether you :use the 'original namespace or the 'refactored namespace. Conversely, the following tests verify each function in isolation and a few of them are very much tied to the implementation.

Both sets of tests would give me confidence that the code works as expected, so choosing which tests to use would become a matter of maintenance cost. I don't think there's anything special about these examples; I think they offer the traditional trade-offs between higher and lower level tests. A specific trade-off that stands out to me is defect localization versus having to update the tests whenever you update the code.

As I mentioned previously, the high-level-expectations work for both the 'original and the 'refactored namespaces. Being able to change the implementation without having to change the test is obviously an advantage of the high level tests. However, when things go wrong, the lower level tests provide better feedback for targeting the issue.

The following code is exactly the same as the code in refactored.clj, except it has a 1 character typo. (It's not necessary to spot the typo; the test output below will show you what it is.)

The high level tests give us the following feedback.

failure in (high_level_expectations.clj:14) : expectations.high-level-expectations
(expect
 true
 (with-redefs
  [state (atom {})]
  (update-bac 0.01)
  (legally-drunk? 0.07)))

           expected: true 
                was: false

There's not much in that failure report to point us in the right direction. The unit-level-expectations provide significantly more information, with details that should make it immediately obvious where the typo is.

failure in (unit_level_expectations.clj:8) : expectations.unit-level-expectations
(expect
 {:legally-drunk?* [pure-legally-drunk? 0.04]}
 (with-redefs [state (atom {}) partial vector] (update-bac 0.04)))

           expected: {:legally-drunk?* [# 0.04]} 
                was: {:legally-drunk?** [# 0.04]}
 
           :legally-drunk?** with val [# 0.04] 
                             is in actual, but not in expected
           :legally-drunk?* with val [# 0.04] 
                            is in expected, but not in actual

The above output points us directly to the extra asterisk in update-bac that caused the failure.

Still, I couldn't honestly tell you which set of tests I prefer. This specific example provides a situation where I think you could convincingly argue for either set of tests. However, as the code evolved I would likely choose one path or the other based on:

how much 'setup' is required for always using high-level tests?
how hard is it to guarantee integration using primarily unit-level tests?

In our examples the high level tests require redef'ing one bit of state. If that grew to a few pieces of state and/or a large increase in the complexity of the state, then I may be forced to move towards more unit-level tests. A rule of thumb I use: If a significant amount of the code within a test is setting up the test context, there's probably a smaller function and a set of associated tests waiting to be extracted.

By definition, the unit-level tests don't test the integration of the various functions. When I'm using unit-level tests, I'll often test the various code paths at the unit level and then have a happy-path high-level test that verifies integration of the various functions. My desire to have more high-level tests increases as the integration complexity increases, and at some point it makes sense to simply convert all of the tests to high-level tests.

If you constantly re-evaluate which tests will be more appropriate and switch when necessary, you'll definitely come out ahead in the long run.

Tuesday, October 02, 2012

Clojure: Avoiding Anonymous Functions

Clojure's standard library provides a lot of functionality, more functionality than I can easily remember by taking a quick glance at it. When I first started learning Clojure I used to read the api docs, hoping that when I needed something I'd easily be able to remember it. For some functions it worked, but not nearly enough.

Next, I went through several of the exercises on 4clojure.org and it opened my eyes to the sheer number of functions that I should have, but still didn't know. 4clojure.org helped me learn how to use many of the functions from the standard lib, but it also taught me a greater lesson: any data transformation I want to do can likely either be accomplished with a single function of clojure.core or by combining a few functions from clojure.core.

The following code has an example input and shows the desired output.

There are many ways to solve this problem, but when I began with Clojure I solved it with a reduce. In general, anytime I was transforming a seq to a map, I thought reduce was the right choice. The following example shows how to transform the data using a reduce.

That works perfectly well and it's not a lot of code, but it's custom code. You can't look at the input and the reduce and immediately know what the output is. You have to jump into the source to see what the transformation actually is.

You can solve this problem with an anonymous function, as the example below shows.

This solution isn't much code, but it's doing several things and requiring you to keep many things on your mental stack at the same time - what does the element look like, destructuring, the form of the result, the initial value, etc. It's not that tough to write, but it can be a bit tough to read when you come back to it 6 months later. Below is another solution, using only functions defined in clojure.core.

The above solution is more characters, but I consider it to be superior for two reasons:

Only clojure.core functions are used, so I am able to read the code without having to look elsewhere for implementation or documentation (and maintainers should be able to do the same).
The transformation happens in distinct and easy to understand steps.

I'm sure plenty of people reading this blog entry will disagree, and I'll agree that the anonymous function in this case isn't necessarily complicated enough that you'll want to spend the characters to avoid it. However, there's another reason to avoid the (fn): I believe you should seize every opportunity you get to become more familiar with the standard library.

If the learning opportunity did not exist, I may feel differently; however, I currently feel much more comfortable with update-in than I do with using juxt, and to a lesser extent (partial apply hash-map) & (apply merge concat). If you found the solution I prefer harder to follow, then I suspect you may be in the same boat as me. If you were easily able to read and follow both solutions, it probably makes sense for you to simply do what you prefer. However, if you choose to define your own function I do believe you're leaving behind something that's harder to digest than a string of distinct steps that only use functions found in clojure.core.

Regardless of language, I believe that you should know the standard library inside and out. Time and time again (in Clojure) I've solved a problem with an anonymous function, only to later find that the standard library already defined exactly what I needed. A few examples from memory: find (select-keys with 1 key), keep (map + remove nil?), map-indexed (map f (range) coll), mapcat (apply concat (map f coll)). After making this mistake enough times, I devised a plan to avoid this situation in the future while also forcing myself to become more familiar with the standard library.
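This REPL-style sketch compares each of those functions with a hand-rolled version, using small illustrative inputs. Note that find returns a map entry rather than a map. Also note that keep drops only nil results and retains false ones.

```clojure
(def m {:a 1 :b 2})
(def xs [{:a 1} {:b 2} {:a 3}])

;; find: the entry for one key, vs select-keys with a single key
(find m :a)                                    ; [:a 1], a map entry
(select-keys m [:a])                           ; {:a 1}

;; keep: map, then drop nil results
(= (keep :a xs) (remove nil? (map :a xs)))     ; true, both (1 3)

;; map-indexed: f receives the index first, then the item
(= (map-indexed vector [:x :y])
   (map vector (range) [:x :y]))               ; true, both ([0 :x] [1 :y])

;; mapcat: map, then concatenate the results
(= (mapcat reverse [[1 2] [3 4]])
   (apply concat (map reverse [[1 2] [3 4]]))) ; true, both (2 1 4 3)
```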

The plan is simple: when transforming data, don't use (fn) or #(), and only define a function when it cannot be done with -> or ->> and clojure.core.

My preferred solution (above) is a simple example of using threading and clojure.core to solve a problem without #() or (fn). This works for 90% of the transformation problems I encounter; however, there are times that I need to define a function. For example, I recently needed to take an initial value, pass it to reduce, then pass the result of the reduce as the initial value to another reduce. The initial value is the 2nd of reduce's 3 args, thus it cannot easily be threaded. In that situation, I find it appropriate to simply define my own function. Still, at least 90% of the time I can find a solution by combining existing clojure.core functions (often by using comp, juxt, or partial).
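This is one possible shape for that kind of helper, sketched with a hypothetical `reduce-from` function. The helper moves the accumulator to the front so that -> can thread it through both reduces.

```clojure
;; (reduce f init coll) takes the accumulator in the middle, so neither
;; -> (first position) nor ->> (last position) can thread it directly.
;; hypothetical helper: same as reduce, with the accumulator first
(defn reduce-from [init f coll]
  (reduce f init coll))

(defn totals [init xs ys]
  (-> init
      (reduce-from + xs)    ; first reduce is seeded with init
      (reduce-from * ys)))  ; its result seeds the second reduce
```

For example, `(totals 1 [1 2] [3 4])` first reduces + from 1 to get 4, then reduces * from 4 to get 48.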

Here's another simple example: Given a list of maps, filter maps where :current-city is "new york"
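Here is a sketch that uses neither (fn) nor #(), assuming hypothetical user maps. A set is a function of its members, so composing it with the :current-city keyword gives a predicate.

```clojure
;; hypothetical user data
(def users [{:name "jay"  :current-city "new york"}
            {:name "mike" :current-city "chicago"}
            {:name "john" :current-city "new york"}])

;; (comp #{"new york"} :current-city) looks up the city, then tests membership
(def new-yorkers
  (->> users
       (filter (comp #{"new york"} :current-city))))
```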

Once you've made this step, you may start asking yourself: am I doing something unique, or am I doing something that's common enough to be somewhere in the standard library. More often than I expected, the answer is - yes, there's already a fn in the standard library. In this case, we can use clojure.set/join to join on the current city, thus removing our undesired data.
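Here is a sketch of the clojure.set/join version, using hypothetical user maps. join performs a natural join on the keys the two relations share (here only :current-city) and returns a set.

```clojure
(require 'clojure.set)

;; hypothetical user data
(def users [{:name "jay"  :current-city "new york"}
            {:name "mike" :current-city "chicago"}
            {:name "john" :current-city "new york"}])

;; only maps whose :current-city matches survive the join
(def new-yorkers
  (clojure.set/join users #{{:current-city "new york"}}))
```

Because the result is a set, ordering and duplicates are not preserved, similar to clojure.set/index and clojure.set/project.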

Asking the question, "this doesn't seem unique - shouldn't there be a fn in the standard library that does this?", is what led me to clojure.set/project, find and so many other functions. Now, when I look through old code, I find myself shaking my head and wishing I'd started down this path even earlier. Clojure makes it easy to define your own functions that quickly solve problems, but using what's already in clojure.core makes your code significantly easier for others to follow - learning the standard library inside and out is worth the effort in the long term.

Thursday, September 27, 2012

Clojure: Refactoring From Thread Last (->>) To Thread First (->)

I use ->> (thread-last) and -> (thread-first) very often. When I'm transforming data I find it easy to break things down mentally by taking small, specific steps, and I find that -> & ->> allow me to easily express my steps.

Let's begin with a (very contrived) example. Let's assume we have user data and we need a list of all users in "new york", grouped by their employer, and iff their employer is "drw.com" then we only want their name - otherwise we want all of the user's data. In terms of the input and the desired output, below is what we have and what we're looking for.

A solution that uses ->> can be found below.
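Here is a sketch of a ->> solution along those lines, using hypothetical user data shaped like the description above. The last step needs an anonymous function because update-in expects the map as its first argument.

```clojure
;; hypothetical user data
(def users
  [{:name "jay fields" :current-city "new york" :employer "drw.com"}
   {:name "john dydo"  :current-city "new york" :employer "drw.com"}
   {:name "mike jones" :current-city "new york" :employer "forward.com"}
   {:name "pat smith"  :current-city "chicago"  :employer "drw.com"}])

(def by-employer
  (->> users
       (filter (comp #{"new york"} :current-city))
       (group-by :employer)
       ;; update-in wants the map first, so ->> needs an anonymous fn here
       (#(update-in % ["drw.com"] (partial map :name)))))
```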

The above example is very likely the first solution I would create. I go about solving the problem step by step, and if the first step takes my collection as the last argument then I will often begin by using ->>. However, after the solution is functional I will almost always refactor to -> if any of my "steps" do not take the result of the previous step as the last argument. I strongly dislike the above solution - using an anonymous function to make update-in usable with a thread-last feels wrong and is harder for me to parse (when compared with the alternatives found below).

The above solution could be refactored to the following solution

This solution is DRY, but it also groups two of my three steps together, while leaving the other step at another level. I expect many people to prefer this solution, but it's not the one that I like the best.

The following solution is how I like to refactor from ->> to ->
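Here is a sketch of that shape, using hypothetical user data. The inner ->> handles the two collection-last steps, and the outer -> threads the resulting map into update-in as its first argument.

```clojure
;; hypothetical user data
(def users
  [{:name "jay fields" :current-city "new york" :employer "drw.com"}
   {:name "john dydo"  :current-city "new york" :employer "drw.com"}
   {:name "mike jones" :current-city "new york" :employer "forward.com"}
   {:name "pat smith"  :current-city "chicago"  :employer "drw.com"}])

(def by-employer
  (-> (->> users
           (filter (comp #{"new york"} :current-city)) ; keep new york users
           (group-by :employer))                       ; map of employer -> users
      (update-in ["drw.com"] (partial map :name))))    ; drw.com users -> names
```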

My preferred solution has an "extra" thread-last, but it allows me to keep everything on the same level. By keeping everything on the same level, I'm able to easily look at the code and reason about what it's doing. I know that each step is an isolated transformation and I feel freed from keeping a mental stack of what's going on in the other steps.

Tuesday, September 25, 2012

Replacing Common Code With clojure.set Function Calls

If you've written a fair amount of Clojure code and aren't familiar with clojure.set, then chances are you've probably reinvented a few functions that are already available in the standard library. In this blog post I'll give a few examples of commonly written code, and I'll show the clojure.set functions that already do everything you need.

Removing elements from a collection is a very common programming task. Sometimes the collection will need to be a vector or a list, and removing an element from the collection will look similar to the example below.

user=> (remove #{1 2} [1 2 3 4 3 2 1])
(3 4 3)

In the cases where you're starting with a list and you want to return a seq, remove is a good solution. However, you may also find yourself starting with a set or looking to return a set.

If you're starting with sets, you'll probably get a performance gain by using clojure.set/difference, and if you're going to need a set returned it's less code and likely more performant to use clojure.set/difference rather than calling clojure.core/set on the results of clojure.core/remove.

clojure.set/difference is simple to use - from the docs

Usage: (difference s1)
       (difference s1 s2)
       (difference s1 s2 & sets)
Return a set that is the first set without elements of the remaining sets

A simple example of using clojure.set/difference can be found below.

user=> (clojure.set/difference #{1 2 3 4 5} #{1 2} #{3})
#{4 5}

Transforming data in clojure is something I do very often. On many occasions I've had a list of maps and I wanted them indexed by 1 or more values. This is fairly easy to do with reduce and update-in, as the example below demonstrates.

user=> (def jay {:name "jay fields" :employer "drw"})
#'user/jay
user=> (def mike {:name "mike jones" :employer "forward"})
#'user/mike
user=> (def john {:name "john dydo" :employer "drw"})
#'user/john
user=> (reduce #(update-in %1 [{:employer (:employer %2)}] conj %2) {} [jay mike john])
{{:employer "forward"} ({:name "mike jones", :employer "forward"}), 
 {:employer "drw"} ({:name "john dydo", :employer "drw"} 
                    {:name "jay fields", :employer "drw"})}

The reduce + update-in combo is a good one, but clojure.set/index is even better - since it's both more concise and doesn't require you to define an anonymous function. clojure.set/index is also very straightforward to use - from the docs

Usage: (index xrel ks)
Returns a map of the distinct values of ks in the xrel mapped to a set 
        of the maps in xrel with the corresponding values of ks.

The example below demonstrates how you can get very similar results to what is above by using clojure.set/index.

user=> (clojure.set/index [jay mike john] [:employer])
{{:employer "forward"} #{{:name "mike jones", :employer "forward"}}, 
 {:employer "drw"} #{{:name "john dydo", :employer "drw"} 
                     {:name "jay fields", :employer "drw"}}}

It is worth noting that the reduce + update-in example has seqs as values and can contain duplicates, and the clojure.set/index example has sets as values and will not contain duplicates. In practice, this has never been an issue for me.

Another common case while working with collections is finding the elements that are in both collections. Since sets are functions (and can be used as predicates), finding common elements is as simple as the following Clojure.

user=> (filter (set [1 2 3]) [2 3 4])
(2 3)

Similar to the clojure.set/difference example, if you have lists or vectors in and you want a seq out, you may want to stick to using filter. However, if you are already working with sets or you can easily convert to sets, you'll probably want to take a look at clojure.set/intersection.

Usage: (intersection s1)
       (intersection s1 s2)
       (intersection s1 s2 & sets)
Return a set that is the intersection of the input sets

To get results similar to the above example, simply call clojure.set/intersection in a similar way to the example below.

user=> (clojure.set/intersection #{1 2 3} #{2 3 4})
#{2 3}

In a codebase I was once working on I stumbled upon the following code, which inverts a map.

user=> (reduce #(assoc %1 (val %2) (key %2)) {} {1 :one 2 :two 3 :three})
{:three 3, :two 2, :one 1}

The code is simple enough, but a single function call is always preferable.

Usage: (map-invert m)
Returns the map with the vals mapped to the keys.

The name of the function should be self-explanatory; however, an example is presented below for completeness.

user=> (clojure.set/map-invert {1 :one 2 :two 3 :three})
{:three 3, :two 2, :one 1}

Another common task I find myself doing while working with clojure is trimming data sets. The following code maps over a list of employees and filters out the employer information.

user=> (def jay {:fname "jay" :lname "fields" :employer "drw"})
#'user/jay
user=> (def mike {:fname "mike" :lname "jones" :employer "forward"})
#'user/mike
user=> (def john {:fname "john" :lname "dydo" :employer "drw"})
#'user/john
user=> (map #(select-keys %1 [:fname :lname]) [jay mike john])
({:lname "fields", :fname "jay"} 
 {:lname "jones", :fname "mike"} 
 {:lname "dydo", :fname "john"})

The combination of map + select-keys gets the job done, but clojure.set gives us one function, clojure.set/project, that provides virtually the same result - using less code.

Usage: (project xrel ks)
Returns a rel of the elements of xrel with only the keys in ks

The example below demonstrates the similarity in functionality.

user=> (clojure.set/project [jay mike john] [:fname :lname])
#{{:lname "fields", :fname "jay"} 
  {:lname "dydo", :fname "john"} 
  {:lname "jones", :fname "mike"}}

Similar to clojure.set/index, you'll want to take note of the result being a set and not a list, and just like clojure.set/index, this isn't something that ends up causing a problem in practice.

The rename and rename-keys functions of clojure.set are very similar, and they can both be helpful when you're passing around data-structures that are similar and simply require a few renames to play nicely with existing code.

Below are a few simple examples of how to get things done without rename and rename-keys.

user=> (def jay {:fname "jay" :lname "fields" :employer "drw"})
#'user/jay
user=> (def mike {:fname "mike" :lname "jones" :employer "forward"})
#'user/mike
user=> (def john {:fname "john" :lname "dydo" :employer "drw"})
#'user/john
user=> (map 
         (fn [{:keys [fname lname] :as m}] 
             (-> m 
                 (assoc :first-name fname :last-name lname) 
                 (dissoc :fname :lname))) 
         [jay mike john])
({:last-name "fields", :first-name "jay", :employer "drw"} 
 {:last-name "jones", :first-name "mike", :employer "forward"} 
 {:last-name "dydo", :first-name "john", :employer "drw"})

user=> (reduce #(assoc %1 ({1 "one" 2 "two"} (key %2)) (val %2)) {} {1 :one 2 :two})
{"two" :two, "one" :one}

The rename & rename-keys functions are very straightforward, and you can find their documentation and example usages below.

Usage: (rename xrel kmap)
Returns a rel of the maps in xrel with the keys in kmap renamed to the vals in kmap

Usage: (rename-keys map kmap)
Returns the map with the keys in kmap renamed to the vals in kmap

user=> (clojure.set/rename [jay mike john] {:fname :first-name :lname :last-name})
#{{:last-name "jones", :first-name "mike", :employer "forward"} 
  {:last-name "dydo", :first-name "john", :employer "drw"} 
  {:last-name "fields", :first-name "jay", :employer "drw"}}

user=> (clojure.set/rename-keys {1 :one 2 :two} {1 "one" 2 "two"})
{"two" :two, "one" :one}

If you've gotten this far, I'll assume you already understand how to use filter. The clojure.set namespace has a function that's very similar to filter, but it returns a set. If you don't need a set, you're better off sticking with filter; however, if you're working with sets, you might save yourself a few keystrokes and microseconds by using clojure.set/select instead.

Below are the documentation and an example.

Usage: (select pred xset)
Returns a set of the elements for which pred is true

user=> (clojure.set/select odd? #{1 2 3 4})
#{1 3}

The clojure.set/subset? and clojure.set/superset? functions are also functions that are straightforward to use, and probably don't benefit from an example of how to create the same results on your own. However, I will provide the docs and 2 brief examples of their usage.

Usage: (subset? set1 set2)
Is set1 a subset of set2?

Usage: (superset? set1 set2)
Is set1 a superset of set2?

user=> (clojure.set/superset? #{1 2 3} #{2 3})
true
user=> (clojure.set/subset? #{1 2} #{1 2 3})
true

The final function I will document is clojure.set/union. If you needed a list of the unique elements resulting from combining 2 or more lists, you could get the job done with a combination of concat, reduce, and/or set. The example below shows how to do things without using the set function or a set data-structure. note: Using a set would likely be both more efficient and more readable. This example is designed to show that you could do things without sets, but I do not recommend that you code in this way.

user=> (reduce 
         #(if (some (partial = %2) %1) %1 (conj %1 %2)) 
         [] 
         (concat [1 2 1] [2 4 3 1]))
[1 2 4 3]

Truthfully, I don't tend to think about 'union' unless I'm already thinking about sets. In Clojure, clojure.set/union is defined to take multiple sets and return the union of each of those sets (as you'd expect).

Usage: (union)
       (union s1)
       (union s1 s2)
       (union s1 s2 & sets)
Return a set that is the union of the input sets

Finally, the example below shows the union function in action.

user=> (clojure.set/union #{1 2} #{2 4 3 1})
#{1 2 3 4}

The clojure.set namespace does define one additional function, clojure.set/join. To be honest, I haven't used join in production and I don't believe that I'm writing my own inferior versions within my codebases. So, I don't have an example for you, but I do like the examples on clojuredocs.org and I would encourage you to go check them out: http://clojuredocs.org/clojure_core/1.2.0/clojure.set/join

Tuesday, June 26, 2012

Reading Clojure Stacktraces

Clojure stacktraces are not incredibly user friendly. Once I got used to the status quo, I forgot how much noise lives within a stacktrace; however, every so often a Clojure beginner will remind the community that stacktraces are a bit convoluted. You can blame the JVM, lack of prioritization from the Clojure community, or someone else if you wish, but the reality is - I don't expect stacktraces to change anytime soon. This blog entry is about separating the signal from the noise within a stacktrace.

note: all code for this blog entry can be found at: http://github.com/blog-jayfields-com/Reading-Clojure-Stacktraces

Let's start with a very simple example.
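A core.clj that produces a trace like the one below might look like the following sketch. It is laid out so the throw sits on line 3 and the call to foo on line 6. The exact file lives in the linked repository, so treat this version as an approximation.

```clojure
(ns reading-clojure-stacktraces.core)

(defn foo [] (throw (RuntimeException. "thrown")))

(defn -main []
  (foo))
```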

Running (I'm using 'lein run') the above code you should get a stacktrace that looks like the output below.

lmp-jfields03:reading-clojure-stacktraces jfields$ lein run
Exception in thread "main" java.lang.RuntimeException: thrown
 at reading_clojure_stacktraces.core$foo.invoke(core.clj:3)
 at reading_clojure_stacktraces.core$_main.invoke(core.clj:6)
 at clojure.lang.Var.invoke(Var.java:397)
 at user$eval37.invoke(NO_SOURCE_FILE:1)
 at clojure.lang.Compiler.eval(Compiler.java:6465)
 at clojure.lang.Compiler.eval(Compiler.java:6455)
[blah blah blah]

I snipped a fair bit of stacktrace and replaced it with [blah blah blah]. I did that because that's what I mentally do as well: I look for the last line that refers to a file I've created, and I ignore everything more than a few lines below it. That is my first recommendation - if you see a stacktrace, it's likely that the problem is in your code, not Clojure. Look for the last line of your code (N) and ignore every line below N + 3.

In this example, user$eval... likely has something to do with lein, and I can safely assume that the problem is likely not in there. Moving up from there I can see a line from my code:

reading_clojure_stacktraces.core$_main.invoke(core.clj:6)

When I read the above line I see the problem is in namespace 'reading-clojure-stacktraces.core', in the function '-main', in the file core.clj, on line 6. I'm no Clojure internals expert, but I believe Clojure actually creates a class named reading_clojure_stacktraces.core$_main with an 'invoke' method; however, I truthfully don't know (and you won't need to either). Whether a class is created or not, it makes sense that the line will need to be formatted to fit a valid Java class name - which explains why our dashes have been converted to underscores.

Moving up another line, I can see that the issue is likely inside the 'foo' function of the reading-clojure-stacktraces namespace. A quick review of the original code shows that line 3 of core.clj contains the call to throw, and everything makes perfect sense.

If all Clojure stacktraces were this simple, I probably wouldn't bother with this blog entry; however, things can become a bit more complicated as you introduce anonymous functions.

The following snippet of code removes the 'foo' function and throws an exception from within an anonymous function.
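Here is a sketch of that change, assuming the same namespace. The throw now happens inside a #() literal on line 4.

```clojure
(ns reading-clojure-stacktraces.core)

(defn -main []
  (#(throw (RuntimeException. "thrown"))))
```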

Another trip to 'lein run' produces the following output.

Exception in thread "main" java.lang.RuntimeException: thrown
 at reading_clojure_stacktraces.core$_main$fn__9.invoke(core.clj:4)
 at reading_clojure_stacktraces.core$_main.invoke(core.clj:4)
 at clojure.lang.Var.invoke(Var.java:397)
 at user$eval38.invoke(NO_SOURCE_FILE:1)
 at clojure.lang.Compiler.eval(Compiler.java:6465)

The above stacktrace does give you the correct file and line number of where the issue originates; however, you'll notice that the function that threw the exception has become harder to identify. My use of an anonymous function led to Clojure naming the function fn__9, and there's nothing wrong with that. In fact, this example is especially readable as the stacktrace shows that fn__9 was created inside the -main function.

I expect you'd be able to find the issue with our contrived example without any further help; however, production code (often making use of higher-order functions) can lead to significantly more complex stacktraces. You could forsake anonymous functions, but there's a nice middle ground that is also helpful for debugging - temporarily name your anonymous functions.

Clojure's reader transforms the anonymous function literal in the following way.

#(...) => (fn [args] (...))

Therefore, the following code will be the same as the example above, from Clojure's perspective.

Another quick 'lein run' verifies that the stacktrace is the same (and I see no reason to repeat it here). However, now that we've switched to fn, we can provide a (rarely used, optional) name.
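Here is a sketch of the named version. `i-throw` matches the name that appears, munged to i_throw, in the trace below.

```clojure
(ns reading-clojure-stacktraces.core)

(defn -main []
  ((fn i-throw [] (throw (RuntimeException. "thrown")))))
```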

At this point, 'lein run' should produce the following output.

Exception in thread "main" java.lang.RuntimeException: thrown
 at reading_clojure_stacktraces.core$_main$i_throw__9.invoke(core.clj:4)
 at reading_clojure_stacktraces.core$_main.invoke(core.clj:4)
 at clojure.lang.Var.invoke(Var.java:397)
 at user$eval38.invoke(NO_SOURCE_FILE:1)
 at clojure.lang.Compiler.eval(Compiler.java:6465)

Now our line contains a bit more information. The two $ signs still indicate that the function with an issue is a function created inside -main; however, our stacktrace also includes the name we specified for our function (i_throw, the munged form of i-throw). You can use any valid symbol characters, so feel free to put anything you want in the name while you're debugging.

note: Symbols begin with a non-numeric character and can contain alphanumeric characters and *, +, !, -, _, and ? -- clojure.org.

So far, all of the examples have been somewhat noisy, but mildly easy to mentally filter. Unfortunately, idiomatic Clojure code can also lead to stacktraces that bounce back and forth between your code and the standard library, leaving you to sift through significantly longer stacktraces.

The following snippet of code throws a NullPointerException due to a mistake I clearly made, but the last line of 'my code' is in the lower half of a long stacktrace.

The above example code produces the below stacktrace.

Exception in thread "main" java.lang.NullPointerException
 at clojure.lang.Numbers.ops(Numbers.java:942)
 at clojure.lang.Numbers.inc(Numbers.java:110)
 at clojure.core$inc.invoke(core.clj:862)
 at clojure.core$map$fn__3811.invoke(core.clj:2430)
 at clojure.lang.LazySeq.sval(LazySeq.java:42)
 at clojure.lang.LazySeq.seq(LazySeq.java:60)
 at clojure.lang.RT.seq(RT.java:466)
 at clojure.core$seq.invoke(core.clj:133)
 at clojure.core$print_sequential.invoke(core_print.clj:46)
 at clojure.core$fn__4990.invoke(core_print.clj:140)
 at clojure.lang.MultiFn.invoke(MultiFn.java:167)
 at clojure.core$pr_on.invoke(core.clj:3264)
 at clojure.core$pr.invoke(core.clj:3276)
 at clojure.lang.AFn.applyToHelper(AFn.java:161)
 at clojure.lang.RestFn.applyTo(RestFn.java:132)
 at clojure.core$apply.invoke(core.clj:600)
 at clojure.core$prn.doInvoke(core.clj:3309)
 at clojure.lang.RestFn.applyTo(RestFn.java:137)
 at clojure.core$apply.invoke(core.clj:600)
 at clojure.core$println.doInvoke(core.clj:3329)
 at clojure.lang.RestFn.invoke(RestFn.java:408)
 at reading_clojure_stacktraces.core$_main.invoke(core.clj:7)
 at clojure.lang.Var.invoke(Var.java:397)
 at user$eval37.invoke(NO_SOURCE_FILE:1)
 at clojure.lang.Compiler.eval(Compiler.java:6465)
 at clojure.lang.Compiler.eval(Compiler.java:6455)

In situations like these, I generally look at the top few lines to get a bit of context, and then I scroll down to find the last line of 'my code'. Looking at the top 4 lines of the stacktrace, I can see that the issue is with the inc function, which was passed to the higher-order function map. If I look farther down the stacktrace, I can see that line 7 in reading-clojure-stacktraces.core is where the issue began in 'my code'.

If you look at line 7 of (reading-clojure-stacktraces) core.clj, you'll notice that I'm merely printing the results of calling foo - yet the issue seems to be with the map function that is invoked within foo. This is because map is lazy, and the evaluation is deferred until we attempt to print the results of mapping inc. While it's not exactly obvious, the stacktrace does contain all the hints we need to find the issue. Line 3 lets us know that inc is getting a nil. Line 4 lets us know that it's happening inside a map. Line 5 lets us know that we're dealing with laziness. And, the line containing our namespace lets us know where to begin looking.
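The snippet itself isn't shown above, so here is a hypothetical reconstruction, assuming foo simply maps inc over a collection that happens to contain a nil. It's laid out so the println call sits on line 7, and it shows the deferral: calling foo succeeds, and the NullPointerException only appears once something realizes the lazy seq.

```clojure
(ns reading-clojure-stacktraces.core)

(defn foo [nums]
  (map inc nums)) ; lazy: nothing has been incremented yet

(defn -main [& args]
  (println (foo [1 2 nil]))) ; printing realizes the seq, and (inc nil) throws
```

At a REPL, (foo [1 2 nil]) returns without complaint, while (doall (foo [1 2 nil])) throws.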

The following example is very similar; however, it uses partial to achieve the same result.

The above example code produces the below stacktrace.

Exception in thread "main" java.lang.NullPointerException
 at clojure.lang.Numbers.ops(Numbers.java:942)
 at clojure.lang.Numbers.add(Numbers.java:126)
 at clojure.core$_PLUS_.invoke(core.clj:927)
 at clojure.lang.AFn.applyToHelper(AFn.java:163)
 at clojure.lang.RestFn.applyTo(RestFn.java:132)
 at clojure.core$apply.invoke(core.clj:602)
 at clojure.core$partial$fn__3794.doInvoke(core.clj:2341)
 at clojure.lang.RestFn.invoke(RestFn.java:408)
 at clojure.core$map$fn__3811.invoke(core.clj:2430)
 at clojure.lang.LazySeq.sval(LazySeq.java:42)
 at clojure.lang.LazySeq.seq(LazySeq.java:60)
 at clojure.lang.RT.seq(RT.java:466)
 at clojure.core$seq.invoke(core.clj:133)
 at clojure.core$print_sequential.invoke(core_print.clj:46)
 at clojure.core$fn__4990.invoke(core_print.clj:140)
 at clojure.lang.MultiFn.invoke(MultiFn.java:167)
 at clojure.core$pr_on.invoke(core.clj:3264)
 at clojure.core$pr.invoke(core.clj:3276)
 at clojure.lang.AFn.applyToHelper(AFn.java:161)
 at clojure.lang.RestFn.applyTo(RestFn.java:132)
 at clojure.core$apply.invoke(core.clj:600)
 at clojure.core$prn.doInvoke(core.clj:3309)
 at clojure.lang.RestFn.applyTo(RestFn.java:137)
 at clojure.core$apply.invoke(core.clj:600)
 at clojure.core$println.doInvoke(core.clj:3329)
 at clojure.lang.RestFn.invoke(RestFn.java:408)
 at reading_clojure_stacktraces.core$_main.invoke(core.clj:7)
 at clojure.lang.Var.invoke(Var.java:397)
 at user$eval37.invoke(NO_SOURCE_FILE:1)
 at clojure.lang.Compiler.eval(Compiler.java:6465)

Again, you'll want to skim the stacktrace for hints. In the above stacktrace we can see that line 3 is telling us that + is the issue. Line 7 lets us know that partial was used. And, the remaining hints are the same as the previous example.
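Again, the code isn't reproduced here; a hypothetical version, assuming foo now maps (partial + 1) instead of inc, fails the same way. Newer Clojure versions give partial dedicated fixed arities, so the exact frames between + and map may differ from the trace above.

```clojure
(ns reading-clojure-stacktraces.core)

(defn foo [nums]
  (map (partial + 1) nums)) ; same bug: (+ 1 nil) throws once the seq is realized

(defn -main [& args]
  (println (foo [1 2 nil])))
```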

Skimming for hints may look painful at first; however, you quickly learn to filter out the common Clojure-related noise. For example, anything that starts with 'clojure' and looks like a standard Java class name is highly unlikely to be where a problem exists: clojure.lang.Numbers.ops isn't likely to have a bug. Likewise, you'll see the same classes and methods repeated across almost every error - clojure.lang.AFn, clojure.lang.RestFn, clojure.core$apply, clojure.lang.LazySeq, clojure.lang.RT, clojure.lang.MultiFn, etc. These are building blocks that almost everything in Clojure relies on. Those lines provide a bit of signal, but (for the most part) can safely be ignored.

Again, it can be a bit annoying to deal with Clojure stacktraces when getting started; however, if you take the time to understand which lines are signal and which lines are noise, then they become helpful debugging tools.

related: If you want a testing library that helps you filter some of the stacktrace noise, you might want to check out expectations.

Monday, June 11, 2012

Clojure: name function

The 'name' function is a clojure function that returns the string for a keyword, symbol, or string.

name - function
Usage: (name x)
Returns the name String of a string, symbol or keyword.

At first glance this might not seem that interesting; however, it's good to know about 'name' if you've ever been surprised by (str :foo) => ":foo". If you have a Ruby background (as I do), you probably expected the result to be "foo", spent a bit of time looking, and found that (name :foo) was actually what you were looking for.
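A quick REPL sketch of the difference (the namespaced keyword at the end is an extra case, not from the original post): str keeps the leading colon, name returns only the name part, and name also accepts symbols and strings.

```clojure
(str :foo)            ;=> ":foo"
(name :foo)           ;=> "foo"
(name 'foo)           ;=> "foo"
(name "foo")          ;=> "foo"

;; for a namespaced keyword, name drops the namespace part
(name :user/foo)      ;=> "foo"
(namespace :user/foo) ;=> "user"
```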

That's helpful, but not particularly exciting. A more interesting application of name is the ability to normalize all keys to strings and then destructure. For example, say you're designing a library that monitors threads and you want to be able to pass in warning and error thresholds, using either keywords or strings as keys. Usage of your functions may look like the following examples:

(monitored-threads/create :warn-threshold 100 :error-threshold 200)
(monitored-threads/create "warn-threshold" 100 "error-threshold" 200)

Assuming a simple function that updates keys:

(defn update-keys [m f]
  (reduce (fn [r [k v]] (assoc r (f k) v)) {} m))

You can now write your create function as:

(defn create [& {:as m}]
  (let [{:strs [warn-threshold error-threshold]} (update-keys m name)]
    ; do work, son
    ))
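To check that both calling styles end up in the same place, here is a sketch in which create simply returns the destructured thresholds. The helper is written with reduce-kv and given the hypothetical name normalize-keys only to avoid clashing with clojure.core/update-keys, which Clojure 1.11 added with the same (update-keys m f) argument order; on 1.11 or later you could call (update-keys m name) directly.

```clojure
(defn normalize-keys
  "Hypothetical helper: returns m with every key converted by `name`,
  so :warn-threshold and \"warn-threshold\" become the same key."
  [m]
  (reduce-kv (fn [acc k v] (assoc acc (name k) v)) {} m))

(defn create [& {:as m}]
  (let [{:strs [warn-threshold error-threshold]} (normalize-keys m)]
    ;; a real create would start monitoring here; the sketch just returns the values
    [warn-threshold error-threshold]))

(create :warn-threshold 100 :error-threshold 200)   ;=> [100 200]
(create "warn-threshold" 100 "error-threshold" 200) ;=> [100 200]
```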

Wednesday, December 28, 2011

Clojure & Java Interop

About a year ago I got a phone call asking if I wanted to join another team at DRW. The team supports a (primarily) Java application, but the performance requirements would also allow it to be written in a higher level language. I'd been writing Clojure (basically) full-time at that point - so my response was simple: I'd love to join, but I'm going to want to do future development using Clojure.

A year later we still have plenty of Java, but the vast majority of the new code I add is Clojure. One of the big reasons I'm able to use Clojure so freely is the seamless interop with Java.

Execute Clojure from Java
Calling Clojure from Java is as simple as loading the .clj file and invoking a function defined in that file. I used the same example years ago, but I'll inline it here for simplicity.

; interop/core.clj
(ns interop.core)

(defn print-string [arg]
  (println arg))

// Java calling code
RT.loadResourceScript("interop/core.clj");
RT.var("interop.core", "print-string").invoke("hello world");

note: examples from this blog entry are available in this git repo. The commit with the code from the previous example is available here and I'm running the example from the command line with:

lein jar && java -cp "interop-1.0.0.jar:lib/*" interop.Example

Execute Java from Clojure
At this point we have Java executing some Clojure code, and we also have Clojure using an object that was created in Java. Even though we're in Clojure we can easily call methods on any Java object.

(ns interop.core)

(defn print-string [arg]
  (println arg "is" (.length arg) "characters long"))

commit

The above code (using the length method of a String instance) produces the following output.

hello world is 11 characters long

Calling a Java method and passing in additional arguments is also easy in Clojure.

(ns interop.core)

(defn print-string [arg]
  (println (.replace arg "hello" "goodbye")))

commit

The above code produces the following output.

goodbye world
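The (.method target args) form is shorthand for Clojure's dot special form, which places the target first and the method name second. Both lines in this sketch make the same call to String.replace.

```clojure
;; member-access shorthand: method name first, then the target object
(.replace "hello world" "hello" "goodbye") ;=> "goodbye world"

;; the dot special form it stands for: target first, then the method name
(. "hello world" replace "hello" "goodbye") ;=> "goodbye world"
```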

There are a few other things to know about calling Java from Clojure. The following examples show how to call static methods, use enums, and use inner classes.

(ns interop.core)

(defn print-string [arg]
  ;;; calling a static method
  (println (String/valueOf true))

  ;;; using an enum
  (println java.util.concurrent.TimeUnit/SECONDS)

  ;;; using a Java nested (inner) class. Note, in Clojure you
  ;;; use a $ instead of a .
  (println (java.util.AbstractMap$SimpleEntry. "key" "val")))

commit

And, the output:

true
#< SECONDS>
#<SimpleEntry key=val>

Create Java objects in Clojure
When working with Clojure you'll likely want to interact with existing Java objects, but you'll probably also want to create new instances of Java classes. You might have noticed the dot at the end of AbstractMap$SimpleEntry. in the previous example - that's how you instruct Clojure to call a Java constructor. The following example shows the dot notation for calling a constructor of the String class.

(ns interop.core)

(defn print-string [arg]
  (println (String. arg)))

commit

At this point our output is back to the original output.

hello world

When creating Java objects it's often beneficial to know which Java interfaces the Clojure data structures implement. The following examples demonstrate how you can create Java objects while passing Clojure data structures (and functions) as constructor arguments.

(ns interop.core)

(defn print-string [arg]
  ;;; pass a Clojure vector where Java expects a java.util.Collection
  (println (java.util.HashSet. ["1" "2"]))

  ;;; pass a Clojure map where Java expects a java.util.Map
  (println (java.util.LinkedHashMap. {1 "1" 2 "2"}))

  ;;; pass a Clojure function where Java expects a Runnable
  (println (Thread. (fn [] (println "clojure fns are runnables (and callables)")))))

commit

The output shows the constructed Java objects.

#<HashSet [2, 1]>
#<LinkedHashMap {1=1, 2=2}>
#<Thread Thread[Thread-1,5,main]>
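The instance? checks in this sketch confirm the interfaces the constructors above rely on. The last form goes one step further than the original example and actually starts a Thread, to show the function being run as a Runnable; the promise is just a convenient way to get a value back from the other thread.

```clojure
(instance? java.util.Collection ["1" "2"])        ;=> true
(instance? java.util.List ["1" "2"])              ;=> true
(instance? java.util.Map {1 "1" 2 "2"})           ;=> true
(instance? Runnable (fn []))                      ;=> true
(instance? java.util.concurrent.Callable (fn [])) ;=> true

;; unlike the example above, this thread is started and joined
(defn run-on-thread []
  (let [result (promise)
        t      (Thread. (fn [] (deliver result "ran on another thread")))]
    (.start t)
    (.join t)
    @result))

(run-on-thread) ;=> "ran on another thread"
```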

Calling constructors in Clojure is very easy, but that's not always an option when creating a Java object. At times you will need an instance of a Java interface, which has no constructor to call. Clojure provides both proxy and reify for creating objects that implement Java interfaces. The following example demonstrates the syntax for each.

(ns interop.core)

(defn proxy-coll []
  (proxy [java.util.Collection] []
    (add [o]
         (println o)
         true)))

(defn reify-coll []
  (reify java.util.Collection
    (add [this o]
         (println o)
         (println this)
         true)))

(defn main []
  (.add (proxy-coll) "this string is printed on proxied.add")
  (.add (reify-coll) "this string is printed on reified.add"))

commit

note: I also changed Example.java (the details are available in the above linked commit). The syntax for proxy and reify is fairly similar, and both offer additional options that are worth looking into. The primary differences between these two simple examples are:

The proxy implementation requires an empty vector where we could specify constructor arguments (if this were an abstract class instead of an interface).
The arg list for all methods of reify will specify the reified instance as the first argument. In our example the Collection.add method only takes one argument, but in our reify we also get the instance of the collection.

You might have also noticed that both implementations of add have "true" at the end - in our example we're hard-coding the return value of add to always return true. The following output is the result of running the current example code.

this string is printed on proxied.add
this string is printed on reified.add
#<core$reify_coll$reify__11 interop.core$reify_coll$reify__11@556917ee>

It's worth reading the docs to determine whether you want proxy or reify; however, if you don't see a clear choice I would opt for reify.
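To see the two side by side on a smaller interface, here is a sketch that implements java.util.Comparator (an interface chosen for illustration, not used in the original example) both ways. Note the explicit this parameter in reify and the empty constructor-argument vector in proxy; clojure.core/compare is fully qualified only to make it obvious which compare is called inside a method that is also named compare.

```clojure
(def descending-reify
  (reify java.util.Comparator
    (compare [this a b]              ; reify: the instance is the first arg
      (clojure.core/compare b a))))

(def descending-proxy
  (proxy [java.util.Comparator] []   ; proxy: empty superclass constructor args
    (compare [a b]                   ; proxy: `this` is bound implicitly
      (clojure.core/compare b a))))

(sort descending-reify [1 3 2]) ;=> (3 2 1)
(sort descending-proxy [1 3 2]) ;=> (3 2 1)
```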

Returning objects from Clojure to Java
Our current Example.java calls invoke on the clojure.lang.Var returned from RT.var("interop.core", "main"), and that call returns something, but we're ignoring it, so we have no idea what's returned.* Let's change the code and return something on purpose.

// interop/Example.java
package interop;

import clojure.lang.RT;

public class Example {
    public static void main(String[] args) throws Exception {
        RT.loadResourceScript("interop/core.clj");
        System.out.println(RT.var("interop.core", "main").invoke());
    }
}

; interop/core.clj
(ns interop.core)

(defn main []
  {:a "1" :b "2"})

Running our changes produces the following output.

{:a "1", :b "2"}

commit

At this point we are back in Java land after making a quick trip to Clojure to get a value. Returning most objects will be pretty straightforward; however, at some point you may want to return a Clojure function. This turns out to be fairly easy as well, since Clojure functions are instances of the IFn interface. The following code demonstrates how to return a Clojure function and call it from within Java.

// interop/Example.java
package interop;

import clojure.lang.RT;

public class Example {
    public static void main(String[] args) throws Exception {
        RT.loadResourceScript("interop/core.clj");
        clojure.lang.IFn f = (clojure.lang.IFn) RT.var("interop.core", "main").invoke();
        f.invoke("hello world");
    }
}

; interop/core.clj
(ns interop.core)

(defn main [] println)

commit

The above example returns the println function from interop.core/main and then invokes the println function from within Java. I only chose to pass one argument to invoke; however, IFn.invoke has various overloads that allow you to pass several arguments. The above code works, but it can be simplified to the following example.

package interop;

import clojure.lang.RT;

public class Example {
    public static void main(String[] args) throws Exception {
        clojure.lang.IFn f = (clojure.lang.IFn) RT.var("clojure.core", "println");
        f.invoke("hello world");
    }
}

commit

It seems like a fitting end that our final output is the same as our original output.

hello world

*Actually, it's the value of the last expression evaluated in main, which is "true" in this specific case.

Tuesday, August 23, 2011

Clojure: Check For nil In a List

The every? function in Clojure is very helpful for determining if every element of a list passes a predicate. From the docs:

Usage: (every? pred coll)

Returns true if (pred x) is logical true for every x in coll, else false.

The usage of every? is very straightforward, but a quick REPL session is always nice to verify our assumptions.

Clojure 1.2.0

user=> (every? nil? [nil nil nil])

true

user=> (every? nil? [nil 1])      

false

As expected, every? works well when you know exactly what predicate you need to use.

Yesterday, I was working on some code that included checking for nil - similar to the example below.

user=> (def front 1)

#'user/front

user=> (def back 2)

#'user/back

user=> (when (and front back) [front back])        

[1 2]

This code works perfectly if you have the individual elements "front" and "back", but as the code evolved I ended up representing "front" and "back" simply as a list of elements. Changing to a list required a way to verify that each entry in the list was not nil.

I was 99% sure that "and" was a macro; therefore, combining it with apply wasn't an option. A quick trip to the REPL verified my suspicion.

user=> (def legs [front back])               

#'user/legs

user=> (when (apply and legs) legs)      

java.lang.Exception: Can't take value of a macro: #'clojure.core/and (NO_SOURCE_FILE:8)

Several other options came to mind: an anonymous function that checked for (not (nil? %)), or mapping the values to (not (nil? %)) and using every? with true?. However, because of Clojure's truthiness, the identity function is really all you need. The following REPL session shows how identity works perfectly as our predicate for this example.

user=> (when (every? identity legs) legs)

[1 2]

For a closer look at the behavior, here are a few examples that include nil and false.

user=> (every? identity [1 2 3 4])

true

user=> (every? identity [1 2 nil 4])

false

user=> (every? identity [1 false 4])

false

As you can see, using identity will cause every? to fail if any element is falsey (nil or false). In my case the elements are integers or nil, so this works perfectly; however, it's worth noting so you don't see unexpected results if booleans ever end up in your list.
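If booleans might end up in the list, a predicate that only rejects nil avoids the surprise. A sketch: some? (added in Clojure 1.6, so not available when this was written) returns true for anything that isn't nil, and (complement nil?) does the same on older versions.

```clojure
(every? identity [1 false 4])           ;=> false, because false is falsey
(every? some? [1 false 4])              ;=> true, some? only rejects nil
(every? some? [1 nil 4])                ;=> false
(every? (complement nil?) [1 false 4])  ;=> true, the pre-1.6 equivalent

(def legs [1 2])
(when (every? some? legs) legs)         ;=> [1 2]
```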

Clojure: partition-by, split-with, group-by, and juxt

Today I ran into a common situation: I needed to split a list into 2 sublists - elements that passed a predicate and elements that failed a predicate. I'm sure I've run into this problem several times, but it's been a while and I'd forgotten what options were available to me. A quick look at http://clojure.github.com/clojure/ reveals several potential functions: partition-by, split-with, and group-by.

partition-by
From the docs:

Usage: (partition-by f coll)

Applies f to each value in coll, splitting it each time f returns
a new value. Returns a lazy seq of partitions.

Let's assume we have a collection of ints and we want to split them into a list of evens and a list of odds. The following REPL session shows the result of calling partition-by with our list of ints.

user=> (partition-by even? [1 2 4 3 5 6])

((1) (2 4) (3 5) (6))

The partition-by function works as described; unfortunately, it's not exactly what I'm looking for. I need a function that returns ((1 3 5) (2 4 6)).

split-with
From the docs:

Usage: (split-with pred coll)

Returns a vector of [(take-while pred coll) (drop-while pred coll)]

The split-with function sounds promising, but a quick REPL session shows it's not what we're looking for.

user=> (split-with even? [1 2 4 3 5 6])

[() (1 2 4 3 5 6)]

As the docs state, the collection is split on the first item that fails the predicate - (even? 1).

group-by
From the docs:

Usage: (group-by f coll)

Returns a map of the elements of coll keyed by the result of f on each element. The value at each key will be a vector of the corresponding elements, in the order they appeared in coll.

The group-by function works, but it gives us a bit more than we're looking for.

user=> (group-by even? [1 2 4 3 5 6])

{false [1 3 5], true [2 4 6]}

The result as a map isn't exactly what we desire, but using a bit of destructuring allows us to grab the values we're looking for.

user=> (let [{evens true odds false} (group-by even? [1 2 4 3 5 6])]
         [evens odds])

[[2 4 6] [1 3 5]]

The group-by results mixed with destructuring do the trick, but there's another option.

juxt
From the docs:

Usage: (juxt f)
              (juxt f g)
              (juxt f g h)
              (juxt f g h & fs)

Alpha - name subject to change.
Takes a set of functions and returns a fn that is the juxtaposition
of those fns. The returned fn takes a variable number of args, and
returns a vector containing the result of applying each fn to the
args (left-to-right).
((juxt a b c) x) => [(a x) (b x) (c x)]

The first time I ran into juxt I found it a bit intimidating. I couldn't tell you why, but if you feel the same way - don't feel bad. It turns out, juxt is exactly what we're looking for. The following REPL session shows how to combine juxt with filter and remove to produce the desired results.

user=> ((juxt filter remove) even? [1 2 4 3 5 6])

[(2 4 6) (1 3 5)]

There's one catch to using juxt in this way: the entire list is processed twice, once by filter and once by remove. In general this is acceptable; however, it's worth considering when writing performance-sensitive code.
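If that double pass matters, one alternative is a single reduce that routes each element into one of two vectors. The split-by name below is hypothetical; unlike filter and remove it is eager, so it doesn't suit infinite sequences, but it returns both halves in their original order after one traversal.

```clojure
(defn split-by
  "Hypothetical helper: walks coll once, returning
  [elements-that-pass elements-that-fail], each in original order."
  [pred coll]
  (reduce (fn [[pass fail] x]
            (if (pred x)
              [(conj pass x) fail]
              [pass (conj fail x)]))
          [[] []]
          coll))

(split-by even? [1 2 4 3 5 6]) ;=> [[2 4 6] [1 3 5]]
```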

Monday, August 01, 2011

Clojure: memfn

The other day I stumbled upon Clojure's memfn macro.

The memfn macro expands into code that creates a fn that expects to be passed an object and any args and calls the named instance method on the object passing the args. Use when you want to treat a Java method as a first-class fn.
(map (memfn charAt i) ["fred" "ethel" "lucy"] [1 2 3])
-> (\r \h \y)
-- clojure.org

At first glance it appeared to be something nice, but even the documentation states that "...it is almost always preferable to do this directly now..." - with an anonymous function.

(map #(.charAt %1 %2) ["fred" "ethel" "lucy"] [1 2 3])
-> (\r \h \y)

-- clojure.org, again

I pondered memfn. If it's almost always preferable to use an anonymous function, when is it preferable to use memfn? Nothing came to mind, so I moved on and never really gave memfn another thought.

Then the day came where I needed to test some Clojure code that called some very ugly and complex Java.

In production we have an object that is created in Java and passed directly to Clojure. Interacting with this object is easy (in production); however, creating an instance of that class (while testing) is an entirely different task. My interaction with the instance is minimal, only one method call, but it's an important method call. It needs to work perfectly today and every day forward.

I tried to construct the object myself. I wanted to test my interaction with this object from Clojure, but creating an instance turned out to be quite a significant task. After 15 minutes of failing to create an instance, I decided to see if memfn could provide a solution. I'd never actually used memfn, but the documentation seemed promising.

In order to verify the behavior I was looking for, all I needed was a function that I could rebind to return an expected value. The memfn macro provided exactly that.

As a (contrived) example, let's assume you want to create a new order with a sequence id generated by incrementAndGet on AtomicLong. In production you'll use an actual AtomicLong and you might see something like the example below.

(import '(java.util.concurrent.atomic AtomicLong))

(def sequence-generator (AtomicLong.))
(defn new-order []
  (hash-map :id (.incrementAndGet sequence-generator)))

(println (new-order)) ; => {:id 1}
(println (new-order)) ; => {:id 2}

While that might be exactly what you need in production, it's generally preferable to use something more explicit while testing. I haven't found an easy way to rebind a Java method (.incrementAndGet in our example); however, if I use memfn I can create a first-class function that is easily rebound.

(import '(java.util.concurrent.atomic AtomicLong))

(def sequence-generator (AtomicLong.))
(def inc&get (memfn incrementAndGet))
(defn new-order []
  (hash-map :id (inc&get sequence-generator)))

(println (new-order)) ; => {:id 1}
(println (new-order)) ; => {:id 2}

At this point we can see that memfn is calling our AtomicLong and our results haven't been altered in any way. The final example shows a version that uses binding to ensure that inc&get always returns 10.

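(import '(java.util.concurrent.atomic AtomicLong))

(def sequence-generator (AtomicLong.))
;; ^:dynamic is required for binding on Clojure 1.3 and later
(def ^:dynamic inc&get (memfn incrementAndGet))
(defn new-order []
  (hash-map :id (inc&get sequence-generator)))

(println (new-order)) ; => {:id 1}
(println (new-order)) ; => {:id 2}
(binding [inc&get (fn [_] 10)]
  (println (new-order)) ; => {:id 10}
  (println (new-order))) ; => {:id 10}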

With inc&get being constant, we can now easily test our new-order function.
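On Clojure 1.3 or later, with-redefs offers another way to swap the function in a test without marking the var ^:dynamic. A sketch, under the same AtomicLong setup as above; order-with-fixed-id is a hypothetical test helper. Keep in mind that with-redefs temporarily replaces the var's root value, so the change is visible to all threads, not just the current one.

```clojure
(import '(java.util.concurrent.atomic AtomicLong))

(def sequence-generator (AtomicLong.))
(def inc&get (memfn incrementAndGet)) ; no ^:dynamic needed for with-redefs

(defn new-order []
  (hash-map :id (inc&get sequence-generator)))

(defn order-with-fixed-id
  "Builds an order while inc&get is temporarily redefined to return 10."
  []
  (with-redefs [inc&get (fn [_] 10)]
    (new-order)))

;; (order-with-fixed-id) ;=> {:id 10}
;; (new-order)           ;=> {:id 1}, the stubbed call never touched the AtomicLong
```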

Tuesday, January 11, 2011

Clojure: fnil

The fnil function was added to Clojure in version 1.2. It's a great addition that allows you to write code for the case where an argument isn't nil, and then handle the case where it is nil separately.

From the documentation: The fnil function takes a function f, and returns a function that calls f, replacing a nil first argument to f with a supplied value.

A simple example is working with + and nil.

user=> (+ nil 1)
java.lang.NullPointerException (NO_SOURCE_FILE:0)
user=> (def new+ (fnil + 0))
#'user/new+
user=> (new+ nil 1)
1

As you can see, the + function throws an exception if an argument is nil; however, we were easily able to create our own new+ function that handles the first argument being nil.
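One detail worth knowing: the default only replaces a nil first argument. A sketch: fnil also accepts a second and third default for the arguments that follow.

```clojure
(def new+ (fnil + 0))
(new+ nil 1) ;=> 1
;; (new+ 1 nil) still throws a NullPointerException: only the first arg is patched

(def safe+ (fnil + 0 0)) ; defaults for the first two arguments
(safe+ nil nil) ;=> 0
(safe+ 5 nil)   ;=> 5
```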

In isolation, it might be hard to see how this is valuable. However, once it's combined with higher-order functions, the benefit is easy to see.

Several months ago I wrote about composing functions and used the update-in function as my final example of what I thought was the best implementation. What I didn't address in the blog entry was how to handle the first update.

As you can see from the following code, the first update will fail if a default value isn't populated.

user=> (def current-score {})                          
#'user/current-score
user=> (defn update-score [current {:keys [country player score]}]
         (update-in current [country player] + score))
#'user/update-score
user=> (update-score current-score {:player "Paul Casey" :country :England :score -1})
java.lang.NullPointerException (NO_SOURCE_FILE:0)

At the time of writing, Clojure 1.2 was not production ready, so I used a definition of update-score that was much more verbose, but did handle nil.

user=> (defn update-score [current {:keys [country player score]}]                    
         (update-in current [country player] #(+ (or %1 0) score)))         
#'user/update-score
user=> (update-score current-score {:player "Paul Casey" :country :England :score -1})
{:England {"Paul Casey" -1}}

While the above code works perfectly well, it's obviously not nearly as nice to read as the example that doesn't need to concern itself with nil.

However, Clojure 1.2 is now production ready and the fnil function is available. As a result, you can now write the following version of the update-score function that is obviously preferable to the version that uses the or function.

user=> (defn update-score [current {:keys [country player score]}]                    
         (update-in current [country player] (fnil + 0) score))                
#'user/update-score
user=> (update-score current-score {:player "Paul Casey" :country :England :score -1})
{:England {"Paul Casey" -1}}

I'll admit that fnil wasn't my favorite function when I first found it; however, it's become indispensable. Looking through my code I find (fnil + 0) and (fnil - 0) a few times, and I definitely prefer those to the versions that use the or function.
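The same pattern shows up whenever a map is built up incrementally. A sketch with a hypothetical word-counts function: (fnil inc 0) counts a word the first time it's seen, and (fnil conj []) starts a vector for a key that doesn't exist yet.

```clojure
(defn word-counts [words]
  (reduce (fn [counts word]
            (update-in counts [word] (fnil inc 0)))
          {}
          words))

(word-counts ["fore" "par" "fore"]) ;=> {"fore" 2, "par" 1}

(update-in {} [:England] (fnil conj []) "Paul Casey") ;=> {:England ["Paul Casey"]}
```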

Thursday, January 06, 2011

Clojure: partial and comp

Clojure provides a few different options for creating functions inline: fn (or the #() reader macro), partial, and comp. When I first got started with Clojure I found I could do everything with fn and #(); and that's a good place to start. However, as I produced more Clojure code I found there were also opportunities to use both partial and comp to create more concise code.

The following examples are contrived. They will show how partial and comp can be used; however, they aren't great examples of when they should be used. As always, context is important, and you'll need to decide when (or if) you want to use either function.

The partial function takes a function and fewer than the normal arguments to the function, and returns a function that takes a variable number of additional args. When called, the returned function calls the function with the specified args and any additional args.
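Put another way, (partial f a) behaves much like (fn [& more] (apply f a more)). A small sketch comparing the two, using - so the argument order is visible:

```clojure
(def minus-from-10 (partial - 10))
(def minus-from-10* (fn [& more] (apply - 10 more))) ; hand-written equivalent

(minus-from-10 3)    ;=> 7
(minus-from-10 3 2)  ;=> 5, i.e. (- 10 3 2)
(minus-from-10* 3 2) ;=> 5
```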

The partial function can often be used as an alternative to fn or #(). The following example shows how you can use either #() or partial to multiply a list of integers by .01 (convert pennies to dollars).

user=> (map #(* 0.01 %1) [5000 100 50])  

(50.0 1.0 0.5)

user=> (map (partial * 0.01) [5000 100 50])

(50.0 1.0 0.5)

In a straightforward example, such as the one above, it's really up to you which you prefer. However, if you want to create a function that takes a variable number of args, then the case for partial becomes a bit more noticeable.

user=> (map #(apply str "price & tip: " %&) [5000 100 50] (repeat "+") [2000 40 10])

("price & tip: 5000+2000" "price & tip: 100+40" "price & tip: 50+10")

user=> (map (partial str "price & tip: ") [5000 100 50] (repeat "+") [2000 40 10])  

("price & tip: 5000+2000" "price & tip: 100+40" "price & tip: 50+10")

I haven't come across an example yet that made me think: The partial function is definitely the right choice here! However, I have passed around a few functions that had several arguments and found I prefer (partial f arg1) to #(f arg1 %1 %2 %3) and #(apply f arg1 %&).

The comp function takes a variable number of functions and returns a function that is the composition of those functions. The returned function takes a variable number of args, applies the rightmost of functions to the args, the next function (right-to-left) to the result, etc.
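Because composition runs right to left, the order of the functions matters. A quick sketch with two single-argument functions, followed by a composition in which only the rightmost function receives more than one argument:

```clojure
(def double-then-inc (comp inc #(* 2 %))) ; #(* 2 %) runs first
(def inc-then-double (comp #(* 2 %) inc)) ; inc runs first

(double-then-inc 5) ;=> 11
(inc-then-double 5) ;=> 12

;; only the rightmost fn gets all the args; each fn to its left gets one value
((comp vals select-keys) {:a 1 :b 2} [:a]) ;=> (1)
```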

In a previous blog entry I used comp to return the values from a map given a list of keys. Below you can find the same example that shows the definition and usage.

user=> (def select-values (comp vals select-keys))

#'user/select-values

user=> (select-values {:a 1 :b 2} [:a])           

(1)

As you can see from the example, comp creates a function that takes a map and a sequence of keys, calls select-keys with both, then calls vals with the result of calling select-keys. As the documentation specifies, the functions are called from right to left.

In general, I find myself applying some functions directly to some data, and in that case I generally use the -> macro. For example, if I already have a map and I want a list of the values for some of its keys, I'm probably going to write code similar to what's found below.

user=> (-> {:a 1 :b 2} (select-keys [:a]) vals)

(1)

However, there are times when you need a function that is the composition of a few other functions. For example, taking a list of numbers, converting them to strings, and then converting them to keywords.

user=> (map (comp keyword str) [1 2])

(:1 :2)

The same thing can be done with #(), as the example below shows.

user=> (map #(keyword (str %1)) [1 2])

(:1 :2)

While the code above works perfectly well, I definitely prefer the version that uses the comp function.

Like so many other functions in Clojure, you can get by without partial and comp. However, I find my code more readable and maintainable when I use tools specifically designed to handle my current problem.

Wednesday, January 05, 2011

Clojure: select-keys, select-values, and apply-values

Clojure provides the get and get-in functions for returning values from a map and the select-keys function for returning a new map of only the specified keys. Clojure doesn't provide a function that returns a list of values; however, it's very easy to create such a function (which I call select-values). Once you have the ability to select-values it becomes very easy to create a function that applies a function to the selected values (which I call apply-values).

The select-keys function returns a map containing only the entries of the specified keys. The following (pasted) REPL session shows a few different select-keys behaviors.

user=> (select-keys {:a 1 :b 2} [:a])   
{:a 1}
user=> (select-keys {:a 1 :b 2} [:a :b])
{:b 2, :a 1}
user=> (select-keys {:a 1 :b 2} [:a :b :c])
{:b 2, :a 1}
user=> (select-keys {:a 1 :b 2} [])  
{}
user=> (select-keys {:a 1 :b 2} nil)
{}

The select-keys function is helpful on many occasions; however, sometimes you only care about selecting the values of certain keys in a map. A simple solution is to call select-keys and then vals. Below you can find the results of applying this idea.

user=> (def select-values (comp vals select-keys))
#'user/select-values
user=> (select-values {:a 1 :b 2} [:a])           
(1)
user=> (select-values {:a 1 :b 2} [:a :b])        
(2 1)
user=> (select-values {:a 1 :b 2} [:a :b :c])     
(2 1)
user=> (select-values {:a 1 :b 2} [])             
nil
user=> (select-values {:a 1 :b 2} nil)
nil

The select-values implementation from above may be sufficient for what you are doing, but there are two things worth noticing: where you might expect an empty list you get nil, and the values are not in the order the keys were specified. Given that (standard) maps are unsorted, you can't be sure of the ordering of the values.
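Both quirks come from the pieces being composed. A short sketch of where the nil comes from: select-keys returns an empty map when nothing matches, and vals of an empty map is nil rather than an empty list. The select-values-or-empty name is hypothetical, shown only as one way to get () back if that matters to you.

```clojure
(def select-values (comp vals select-keys))

;; select-keys with no matching keys gives an empty map...
(println (select-keys {:a 1 :b 2} [])) ; prints {}
;; ...and vals of an empty map is nil, not ().
(println (vals {}))                    ; prints nil

;; Hypothetical wrapper: fall back to an empty list when vals returns nil.
(defn select-values-or-empty [m ks]
  (or (select-values m ks) ()))
```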

(side-note: If you are concerned with microseconds, it's also been reported that select-keys is a bit slow/garbage heavy.)

An alternative definition of select-values uses the reduce function to look up each value by key and incrementally build a (vector) result.

user=> (defn select-values [map ks]
         (reduce #(conj %1 (map %2)) [] ks))
#'user/select-values
user=> (select-values {:a 1 :b 2} [:a])      
[1]
user=> (select-values {:a 1 :b 2} [:a :b])   
[1 2]
user=> (select-values {:a 1 :b 2} [:a :b :c])
[1 2 nil]
user=> (select-values {:a 1 :b 2} [])        
[]
user=> (select-values {:a 1 :b 2} nil)       
[]
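The reduce/conj loop above amounts to mapping the map (used as a function of its keys) over ks and collecting the results into a vector. Newer Clojure versions (1.4 and later, which postdate this post) provide mapv for exactly that; a sketch assuming such a version:

```clojure
;; A map is a function of its keys, so (m k) looks up k.
;; mapv applies it to each key in order and returns a vector.
(defn select-values [m ks]
  (mapv m ks))

(println (select-values {:a 1 :b 2} [:a :b :c])) ; prints [1 2 nil]
(println (select-values {:a 1 :b 2} nil))        ; prints []
```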

The new select-values function returns the values in order and returns an empty vector in the cases where the previous examples returned nil, but we have a new problem: keys that don't exist in the map are now included in the vector as nil. This issue is easily addressed by adding a call to the remove function.

The implementation that includes removing nils can be found below.

user=> (defn select-values [map ks]
         (remove nil? (reduce #(conj %1 (map %2)) [] ks)))
#'user/select-values
user=> (select-values {:a 1 :b 2} [:a])                          
(1)
user=> (select-values {:a 1 :b 2} [:a :b])                       
(1 2)
user=> (select-values {:a 1 :b 2} [:a :b :c])                    
(1 2)
user=> (select-values {:a 1 :b 2} [])                            
()
user=> (select-values {:a 1 :b 2} nil)                           
()
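The remove/reduce combination can also be written with clojure.core's keep, which calls a function on each item and drops nil results. A sketch; note that, like the version above, it also drops keys that are present in the map but mapped to nil.

```clojure
;; keep returns a lazy sequence of the non-nil results of (m k).
(defn select-values [m ks]
  (keep m ks))

(println (select-values {:a 1 :b 2} [:a :b :c]))   ; prints (1 2)
;; :b is present, but its nil value is dropped along with the missing :c.
(println (select-values {:a 1 :b nil} [:a :b :c])) ; prints (1)
```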

There is no "correct" implementation for select-values. If you don't care about ordering and nil is a reasonable return value, the first implementation is the correct choice due to its concise definition. If you do care about ordering and performance, the second implementation might be the right choice. If you want something that follows the principle of least surprise, the third implementation is probably the right choice. You'll have to decide what's best for your context. In fact, here are a few more implementations that might be better based on your context.

user=> (defn select-values [m ks] 
         (map m ks))             
#'user/select-values
user=> (select-values {:a 1 :b 2} [:a])                                                                                     
(1)
user=> (select-values {:a 1 :b 2} [:a :b :c])                                                                               
(1 2 nil)
user=> (defn select-values [m ks] 
         (reduce #(if-let [v (m %2)] (conj %1 v) %1) [] ks))
#'user/select-values
user=> (select-values {:a 1 :b 2} [:a])                                                        
[1]
user=> (select-values {:a 1 :b 2} [:a :b :c])                                                  
[1 2]
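The if-let version drops any logically false value, so a key mapped to false disappears just like a missing key. If present-but-falsey values should survive, one option (a sketch, not from the original post) is to test for the key with contains? before looking up its value:

```clojure
;; Keep only keys the map actually has, then look each one up.
;; nil and false values are preserved because the test is on the key.
(defn select-values [m ks]
  (reduce #(if (contains? m %2) (conj %1 (m %2)) %1) [] ks))

;; The if-let variant from above, for comparison.
(defn select-values-if-let [m ks]
  (reduce #(if-let [v (m %2)] (conj %1 v) %1) [] ks))

(println (select-values {:a false :b nil} [:a :b :c]))        ; prints [false nil]
(println (select-values-if-let {:a false :b nil} [:a :b :c])) ; prints []
```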

Pulling values from a map is helpful, but it's generally not the end goal. If you find yourself pulling values from a map, it's likely that you're going to want to apply a function to the extracted values. With that in mind, I generally define an apply-values function that returns the result of applying a function to the values returned from specified keys.

A good example of this is returning the total for a line item represented as a map. Given a map that specifies a line item costing $5 and having a quantity of 4, you can use (* price quantity) to determine the total price for the line item.

Using our previously defined select-values function we can do the work ourselves, as the example below shows.

user=> (let [[price quantity] (select-values {:price 5 :quantity 4 :upc 1123} [:price :quantity])]                          
         (* price quantity))
20
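For comparison, Clojure's associative destructuring can pull the same values by key without any helper function. A sketch using the same line item; line-total is a hypothetical name chosen for illustration.

```clojure
(def line-item {:price 5 :quantity 4 :upc 1123})

;; {:keys [price quantity]} binds price to (:price m)
;; and quantity to (:quantity m).
(defn line-total [{:keys [price quantity]}]
  (* price quantity))

(println (line-total line-item)) ; prints 20
```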

The example above works perfectly well; however, applying a function to the values of a map seems like a fairly generic operation that can easily be extracted to its own function (the apply-values function). The example below shows my definition of apply-values and its usage.

user=> (defn apply-values [map f & ks]          
         (apply f (select-values map ks)))
#'user/apply-values
user=> (apply-values {:price 5 :quantity 4 :upc 1123} * :price :quantity)
20
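One thing to watch: apply-values inherits the behavior of whichever select-values is in scope. With the nil-dropping if-let variant (the last one defined in the session above), a missing key silently shrinks the argument list instead of failing. A sketch that reproduces this:

```clojure
;; The if-let select-values from above: missing keys are skipped.
(defn select-values [m ks]
  (reduce #(if-let [v (m %2)] (conj %1 v) %1) [] ks))

(defn apply-values [m f & ks]
  (apply f (select-values m ks)))

;; Both keys present: (* 5 4)
(println (apply-values {:price 5 :quantity 4 :upc 1123} * :price :quantity)) ; prints 20
;; :quantity missing: select-values returns [5], so this is (* 5), not an error.
(println (apply-values {:price 5 :upc 1123} * :price :quantity))             ; prints 5
```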

I find select-keys, select-values, & apply-values to be helpful when writing Clojure applications. If you find you need these functions, feel free to use them in your own code. However, you'll probably want to check the comments - I'm sure someone with more Clojure experience than I have will provide superior implementations.

Monday, December 06, 2010

Clojure: get, get-in, contains?, and some

Clojure provides a get function that returns the value mapped to a key in a set or map. The documentation shows the example: (get map key). While that's completely valid, I tend to use sets and maps as functions when the get is that simple.

For example, I'd use ({"FSU" 31 "UF" 7} "FSU") if I wanted the value of the key "FSU". It's much less likely that I'd use (get {"FSU" 31 "UF" 7} "FSU"), largely because the former example is less typing.

However, if I'm doing something more complicated I've found the get function to be helpful. Often, I like to use the combination of get and -> or ->>.

The following example takes some json-data, converts it to a Clojure map, and pulls the value from the "FSU" key.

(-> json-data read-json (get "FSU"))
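read-json came from a separate JSON library rather than clojure.core, so the sketch below substitutes a literal map for its output (an assumption made to keep the example self-contained). It shows get in a -> pipeline, plus one practical difference from calling the map as a function: get tolerates a nil collection.

```clojure
;; Stand-in for the parsed JSON; no JSON library is used here.
(def scores {"FSU" 31 "UF" 7})

(println (-> scores (get "FSU"))) ; prints 31
(println (get nil "FSU"))         ; prints nil

;; Calling the map directly works too, but only when it is not nil;
;; (nil "FSU") would throw instead of returning nil.
(println (scores "FSU"))          ; prints 31
```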

It's also worth noting that in our example we have to use get, since strings, unlike keywords, maps, and sets, cannot be called as functions. If we instead chose to make our keys keywords, we could use either of the following solutions. I don't believe there is a right or wrong solution; which you use will likely be a personal preference.

(-> json-data (read-json true) (get :FSU))
(-> json-data (read-json true) :FSU)

Suppose instead that the JSON is nested and produces the following Clojure map: {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}}

Building off of a previous example, we could use a slightly modified version to get the score for FSU.

(-> json-data read-json (get "scores") (get "FSU"))

However, getting nested values is common enough that Clojure provides a function designed specifically to address that need: get-in.

The get-in function returns the value in a nested associative structure when given a sequence of keys. Using get-in you can replace the last example with the following code.

(-> json-data read-json (get-in ["scores" "FSU"]))
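get-in walks the key sequence one level at a time, so it agrees with the chained gets; (reduce get m ks) is a handy mental model for it. A sketch on the nested map from the example, with the JSON parsing left out:

```clojure
(def game {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}})

(println (-> game (get "scores") (get "FSU"))) ; prints 31
(println (get-in game ["scores" "FSU"]))       ; prints 31
(println (reduce get game ["scores" "FSU"]))   ; prints 31

;; A missing key at any level yields nil rather than an exception.
(println (get-in game ["scores" "UGA"]))       ; prints nil
(println (get-in game ["missing" "FSU"]))      ; prints nil
```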

The get-in function is very helpful when dealing with nested structures; however, there is one gotcha that I've run into. The following shows a REPL session and what get-in returns with various keys.

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} ["scores" "FSU"])
31

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} ["scores"])      
{"FSU" 31, "UF" 7}

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} [])        
{"timestamp" 1291578985220, "scores" {"FSU" 31, "UF" 7}}

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} nil)
{"timestamp" 1291578985220, "scores" {"FSU" 31, "UF" 7}}

Everything looks logical enough; however, if you are pulling your key sequence from somewhere else you could end up with unexpected results. The following example shows how a simple mistake could result in a bug.

user=> (def score-key-seqs {"FSU" ["scores" "FSU"]})                             
#'user/score-key-seqs

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} (score-key-seqs "FSU"))
31

user=> (get-in {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}} (score-key-seqs "UF")) 
{"timestamp" 1291578985220, "scores" {"FSU" 31, "UF" 7}}

If you're always expecting a number and you get a map instead, things might not work out well.
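If key sequences come from a lookup like score-key-seqs, one defensive option is to refuse an empty or nil key sequence before calling get-in. A minimal sketch; get-in-strict is a hypothetical helper, not part of Clojure, and returning nil (rather than throwing) is just one possible policy.

```clojure
(def game {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}})
(def score-key-seqs {"FSU" ["scores" "FSU"]})

;; Hypothetical helper: nil for an empty/nil key seq instead of the whole map.
(defn get-in-strict [m ks]
  (when (seq ks)
    (get-in m ks)))

(println (get-in-strict game (score-key-seqs "FSU"))) ; prints 31
(println (get-in-strict game (score-key-seqs "UF")))  ; prints nil
```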

It's also worth noting that both get and get-in allow you to specify default values. You can check the documentation on clojure.org for more information on default values.
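A short sketch of those defaults: the default is returned only when the key (or a key along the path) is absent, and maps and keywords used as functions accept one too.

```clojure
(def game {"timestamp" 1291578985220 "scores" {"FSU" 31 "UF" 7}})

(println (get {"FSU" 31} "UGA" 0))         ; prints 0
(println (get-in game ["scores" "UGA"] 0)) ; prints 0
(println (:UGA {:FSU 31} 0))               ; prints 0
(println ({"FSU" 31} "UGA" 0))             ; prints 0

;; A key that is present with a nil value returns nil, not the default.
(println (get {"FSU" nil} "FSU" 0))        ; prints nil
```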

You don't always need to get a value, sometimes it's good enough to know that a key is in a map or set. In general I use the value returned from a map or set to determine if a key exists - the following snippet uses that pattern.

(if (a-map :key) 
  (do-true-behaviors) 
  (do-false-behaviors))

However, that pattern fails if the value of :key is nil (or false), because both are logically false. If the value might be nil or false, you might want to use Clojure's contains? function. The contains? function returns true if the key is present in the given collection, and false otherwise. The following code pasted from a REPL session demonstrates that contains? works perfectly well with nil.

user=> (contains? {:foo nil} :foo)
true
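A small sketch contrasting the two checks on keys that are present but mapped to nil or false; has-truthy-value? and has-key? are hypothetical names used only for illustration.

```clojure
(def a-map {:present 1 :present-nil nil :present-false false})
(def ks [:present :present-nil :present-false :absent])

;; Truthiness of the looked-up value: nil and false both look "missing".
(defn has-truthy-value? [m k] (boolean (m k)))
;; contains? asks only whether the key exists.
(defn has-key? [m k] (contains? m k))

(println (map #(has-truthy-value? a-map %) ks)) ; prints (true false false false)
(println (map #(has-key? a-map %) ks))          ; prints (true true true false)
```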

The contains? function works well with sets and maps; however, if you try to use it on a vector you might get surprising results.

user=> (contains? [1 3 4] 2)
true

For numerically indexed collections like vectors and Java arrays, the contains? function tests if the numeric key is within the range of indexes. The Clojure documentation recommends looking at the some function if you're looking for an item in a list.
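A sketch of that index-based behavior with the vector from above: contains? asks whether an index exists, not whether a value does.

```clojure
(def v [1 3 4])

(println (contains? v 0)) ; prints true  (index 0 exists)
(println (contains? v 2)) ; prints true  (index 2 exists; its value is 4)
(println (contains? v 3)) ; prints false (no index 3, even though 3 is a value)

;; Converting to a set turns the question into one about values.
(println (contains? (set v) 3)) ; prints true
```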

The some function returns the first logical true value of a predicate for any item in the list, else nil. The following REPL session shows how you can use a set as the predicate with some to determine if a value is found in a list.

user=> (some #{2} [1 3 4])  
nil
user=> (some #{1} [1 3 4])
1
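Using a set as the predicate has one blind spot: the set returns the matched element itself, so searching for nil or false always comes back nil. A sketch of the problem and of explicit predicates that avoid it:

```clojure
;; #{false} returns false when it finds false, which some treats as "no match".
(println (some #{false} [true false])) ; prints nil
(println (some #{nil} [1 nil]))        ; prints nil

;; An explicit predicate returns true on a match instead of the element.
(println (some #(= false %) [true false])) ; prints true
(println (some nil? [1 nil]))              ; prints true
```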

Clojure provides various functions for operating on maps and sets. At first glance some of them may look superfluous; however, as you spend more time working with sets and maps you'll start to appreciate the subtle differences and the value they provide.