Tuesday, September 25, 2012

Replacing Common Code With clojure.set Function Calls

If you've written a fair amount of Clojure code and aren't familiar with clojure.set, chances are you've reinvented a few functions that are already available in the standard library. In this blog post I'll give a few examples of commonly written code, and I'll show the clojure.set functions that already do what you need.


Removing elements from a collection is a very common programming task. Sometimes the collection will need to be a vector or a list, and removing an element from the collection will look similar to the example below.
user=> (remove #{1 2} [1 2 3 4 3 2 1])
(3 4 3)
When you're starting with a list or a vector and you want a seq back, remove is a good solution. However, you may also find yourself starting with a set or looking to return a set.

If you're starting with sets, you'll probably get a performance gain by using clojure.set/difference. If you also need a set returned, it's less code and likely more performant to use clojure.set/difference than to call clojure.core/set on the results of clojure.core/remove.

clojure.set/difference is simple to use - from the docs
Usage: (difference s1)
       (difference s1 s2)
       (difference s1 s2 & sets)
Return a set that is the first set without elements of the remaining sets
A simple example of using clojure.set/difference can be found below.
user=> (clojure.set/difference #{1 2 3 4 5} #{1 2} #{3})
#{4 5}
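To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The names all-ids and banned-ids are hypothetical sample data, not something from a real codebase.

```clojure
(require '[clojure.set :as set])

;; Hypothetical sample data, chosen only for illustration.
(def all-ids #{1 2 3 4 5})
(def banned-ids #{1 2})

;; remove returns a seq, so the result has to be poured back into a set.
(def via-remove (set (remove banned-ids all-ids)))

;; difference takes sets in and hands a set back.
(def via-difference (set/difference all-ids banned-ids))

(println (= via-remove via-difference)) ;; prints true
```

Both versions produce the same set here; the clojure.set version simply skips the round trip through a seq.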


Transforming data in Clojure is something I do very often. On many occasions I've had a list of maps and I wanted them indexed by one or more values. This is fairly easy to do with reduce and update-in, as the example below demonstrates.
user=> (def jay {:name "jay fields" :employer "drw"})
#'user/jay
user=> (def mike {:name "mike jones" :employer "forward"})
#'user/mike
user=> (def john {:name "john dydo" :employer "drw"})
#'user/john
user=> (reduce #(update-in %1 [{:employer (:employer %2)}] conj %2) {} [jay mike john])
{{:employer "forward"} ({:name "mike jones", :employer "forward"}), 
 {:employer "drw"} ({:name "john dydo", :employer "drw"} 
                    {:name "jay fields", :employer "drw"})}
The reduce + update-in combo is a good one, but clojure.set/index is even better, since it's more concise and doesn't require you to define an anonymous function. clojure.set/index is also very straightforward to use - from the docs
Usage: (index xrel ks)
Returns a map of the distinct values of ks in the xrel mapped to a set 
        of the maps in xrel with the corresponding values of ks.
The example below demonstrates how you can get very similar results to what is above by using clojure.set/index.
user=> (clojure.set/index [jay mike john] [:employer])
{{:employer "forward"} #{{:name "mike jones", :employer "forward"}}, 
 {:employer "drw"} #{{:name "john dydo", :employer "drw"} 
                     {:name "jay fields", :employer "drw"}}}
It is worth noting that the reduce + update-in example has seqs as values and can contain duplicates, and the clojure.set/index example has sets as values and will not contain duplicates. In practice, this has never been an issue for me.
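If you want to see the duplicate behavior for yourself, the sketch below feeds the same map in twice. The people vector is a made-up input built for this illustration.

```clojure
(require '[clojure.set :as set])

(def jay  {:name "jay fields" :employer "drw"})
(def mike {:name "mike jones" :employer "forward"})

;; jay appears twice on purpose (hypothetical input).
(def people [jay mike jay])

;; reduce + update-in conj's onto a list, so both copies of jay are kept.
(def by-reduce
  (reduce #(update-in %1 [{:employer (:employer %2)}] conj %2) {} people))

;; index stores each group as a set, so the duplicate collapses.
(def by-index (set/index people [:employer]))

(println (count (by-reduce {:employer "drw"}))) ;; prints 2
(println (count (by-index {:employer "drw"})))  ;; prints 1
```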


Another common case while working with collections is finding the elements that are in both collections. Since sets are functions (and can be used as predicates), finding common elements is as simple as the following Clojure.
user=> (filter (set [1 2 3]) [2 3 4])
(2 3)
Similar to the clojure.set/difference example, if you have lists or vectors in and you want a seq out, you may want to stick to using filter. However, if you are already working with sets or you can easily convert to sets, you'll probably want to take a look at clojure.set/intersection.
Usage: (intersection s1)
       (intersection s1 s2)
       (intersection s1 s2 & sets)
Return a set that is the intersection of the input sets
To get results similar to the filter example above, pass both sets to clojure.set/intersection.
user=> (clojure.set/intersection #{1 2 3} #{2 3 4})
#{2 3}
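The two approaches are not interchangeable when order or duplicates matter. The sketch below, using two hypothetical vectors a and b, shows that filter follows the collection being filtered while intersection only cares about membership.

```clojure
(require '[clojure.set :as set])

;; Hypothetical inputs; note the repeated 2 in a.
(def a [1 2 3 2])
(def b [2 3 4])

;; filter keeps the order and duplicates of the collection it walks over.
(def common-in-b (filter (set a) b)) ;; (2 3)
(def common-in-a (filter (set b) a)) ;; (2 3 2)

;; intersection needs sets, and returns a set with no duplicates.
(def common (set/intersection (set a) (set b)))

(println common-in-b common-in-a common)
```

If you only care about which elements are shared, converting to sets and calling intersection states that intent directly.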


In a codebase I was once working on I stumbled upon the following code, which inverts a map.
user=> (reduce #(assoc %1 (val %2) (key %2)) {} {1 :one 2 :two 3 :three})
{:three 3, :two 2, :one 1}
The code is simple enough, but a single function call is preferable when the standard library already provides one. From the docs for clojure.set/map-invert:
Usage: (map-invert m)
Returns the map with the vals mapped to the keys.
The name of the function should be self-explanatory; however, an example is presented below for completeness.
user=> (clojure.set/map-invert {1 :one 2 :two 3 :three})
{:three 3, :two 2, :one 1}
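One thing to keep in mind with either version: a map can hold the same value under several keys, but the inverted map can only keep one of them. The sketch below uses a hypothetical parity map to show the effect; it deliberately avoids assuming which key wins.

```clojure
(require '[clojure.set :as set])

;; Hypothetical map in which two keys share the value :odd.
(def parity {1 :odd 2 :even 3 :odd})

(def inverted (set/map-invert parity))

;; Only one key can survive per value, so there are 2 entries, not 3.
(println (count inverted)) ;; prints 2

;; Either 1 or 3 ends up under :odd; the choice follows the map's
;; iteration order, so it's safer not to rely on it.
(println (contains? #{1 3} (inverted :odd))) ;; prints true
```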


Another common task I find myself doing while working with Clojure is trimming data sets. The following code maps over a list of employees and filters out the employer information.
user=> (def jay {:fname "jay" :lname "fields" :employer "drw"})
#'user/jay
user=> (def mike {:fname "mike" :lname "jones" :employer "forward"})
#'user/mike
user=> (def john {:fname "john" :lname "dydo" :employer "drw"})
#'user/john
user=> (map #(select-keys %1 [:fname :lname]) [jay mike john])
({:lname "fields", :fname "jay"} 
 {:lname "jones", :fname "mike"} 
 {:lname "dydo", :fname "john"})
The combination of map + select-keys gets the job done, but clojure.set gives us one function, clojure.set/project, that provides virtually the same result with less code.
Usage: (project xrel ks)
Returns a rel of the elements of xrel with only the keys in ks
The example below demonstrates the similarity in functionality.
user=> (clojure.set/project [jay mike john] [:fname :lname])
#{{:lname "fields", :fname "jay"} 
  {:lname "dydo", :fname "john"} 
  {:lname "jones", :fname "mike"}}
Similar to clojure.set/index, you'll want to take note of the result being a set and not a list, and just like clojure.set/index, this isn't something that ends up causing a problem in practice.
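The one situation where the set result changes the answer is when the trimmed maps are no longer distinct. Here is a minimal sketch that projects two employees down to just :employer.

```clojure
(require '[clojure.set :as set])

(def jay  {:fname "jay"  :lname "fields" :employer "drw"})
(def john {:fname "john" :lname "dydo"   :employer "drw"})

;; map + select-keys yields one trimmed map per input map.
(def via-map (map #(select-keys % [:employer]) [jay john]))

;; project returns a set, so identical trimmed maps collapse into one.
(def via-project (set/project [jay john] [:employer]))

(println (count via-map))     ;; prints 2
(println (count via-project)) ;; prints 1
```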


The rename and rename-keys functions of clojure.set are very similar, and they can both be helpful when you're passing around data-structures that are similar and simply require a few renames to play nicely with existing code.

Below are a few simple examples of how to get things done without rename and rename-keys.
user=> (def jay {:fname "jay" :lname "fields" :employer "drw"})
#'user/jay
user=> (def mike {:fname "mike" :lname "jones" :employer "forward"})
#'user/mike
user=> (def john {:fname "john" :lname "dydo" :employer "drw"})
#'user/john
user=> (map 
         (fn [{:keys [fname lname] :as m}] 
             (-> m 
                 (assoc :first-name fname :last-name lname) 
                 (dissoc :fname :lname))) 
         [jay mike john])
({:last-name "fields", :first-name "jay", :employer "drw"} 
 {:last-name "jones", :first-name "mike", :employer "forward"} 
 {:last-name "dydo", :first-name "john", :employer "drw"})

user=> (reduce #(assoc %1 ({1 "one" 2 "two"} (key %2)) (val %2)) {} {1 :one 2 :two})
{"two" :two, "one" :one}
The rename & rename-keys functions are very straightforward, and you can find their documentation and example usages below.
Usage: (rename xrel kmap)
Returns a rel of the maps in xrel with the keys in kmap renamed to the vals in kmap

Usage: (rename-keys map kmap)
Returns the map with the keys in kmap renamed to the vals in kmap
user=> (clojure.set/rename [jay mike john] {:fname :first-name :lname :last-name})
#{{:last-name "jones", :first-name "mike", :employer "forward"} 
  {:last-name "dydo", :first-name "john", :employer "drw"} 
  {:last-name "fields", :first-name "jay", :employer "drw"}}

user=> (clojure.set/rename-keys {1 :one 2 :two} {1 "one" 2 "two"})
{"two" :two, "one" :one}
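A small difference between the hand-rolled version and rename-keys shows up when a key is missing from the input. In the sketch below, rename-by-hand is a hypothetical helper that mirrors the assoc + dissoc approach from earlier, and the jay map intentionally has no :lname.

```clojure
(require '[clojure.set :as set])

;; Hypothetical input with no :lname key.
(def jay {:fname "jay" :employer "drw"})

;; Hypothetical helper mirroring the assoc + dissoc approach.
(defn rename-by-hand [{:keys [fname lname] :as m}]
  (-> m
      (assoc :first-name fname :last-name lname)
      (dissoc :fname :lname)))

(def by-hand (rename-by-hand jay))
(def renamed (set/rename-keys jay {:fname :first-name :lname :last-name}))

;; The hand-rolled version adds :last-name with a nil value.
(println (contains? by-hand :last-name)) ;; prints true

;; rename-keys only renames keys that are actually present.
(println (contains? renamed :last-name)) ;; prints false
```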


If you've gotten this far, I'll assume you already understand how to use filter. The clojure.set namespace has a function that's very similar to filter, but it returns a set. If you don't need a set, you're better off sticking with filter; however, if you're working with sets, you might save yourself a few keystrokes and microseconds by using clojure.set/select instead.

Below is the documentation and an example.
Usage: (select pred xset)
Returns a set of the elements for which pred is true
user=> (clojure.set/select odd? #{1 2 3 4})
#{1 3}
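For comparison, here is a minimal sketch of select next to the filter-then-set version. The sorted-set case is included because select works by removing elements from the set it was given, so the result stays the same kind of set.

```clojure
(require '[clojure.set :as set])

(def nums #{1 2 3 4})

;; filter returns a seq, which then has to be converted back to a set.
(def via-filter (set (filter odd? nums)))

;; select returns a set directly.
(def via-select (set/select odd? nums))

(println (= via-filter via-select)) ;; prints true

;; Starting from a sorted set, the result of select is still sorted.
(def sorted-odds (set/select odd? (sorted-set 4 3 2 1)))
(println (sorted? sorted-odds)) ;; prints true
```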


The clojure.set/subset? and clojure.set/superset? functions are also straightforward to use, and probably don't benefit from an example of how to create the same results on your own. However, I will provide the docs and two brief examples of their usage.
Usage: (subset? set1 set2)
Is set1 a subset of set2?

Usage: (superset? set1 set2)
Is set1 a superset of set2?
user=> (clojure.set/superset? #{1 2 3} #{2 3})
true
user=> (clojure.set/subset? #{1 2} #{1 2 3})
true
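For the curious, a hand-written equivalent is short. The sketch below defines my-subset?, a hypothetical helper that is not part of clojure.set, and checks it against the library function.

```clojure
(require '[clojure.set :as set])

;; Hypothetical helper: every element of s1 must be contained in s2.
;; contains? is used instead of calling s2 as a function so that
;; elements such as false or nil are handled correctly.
(defn my-subset? [s1 s2]
  (every? #(contains? s2 %) s1))

(println (my-subset? #{1 2} #{1 2 3}))   ;; prints true
(println (set/subset? #{1 2} #{1 2 3}))  ;; prints true
(println (my-subset? #{1 5} #{1 2 3}))   ;; prints false
(println (set/subset? #{false} #{false 1})) ;; prints true
```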


The final function I will document is clojure.set/union. If you needed a list of the unique elements resulting from combining two or more lists, you could get the job done with a combination of concat, reduce, and/or set. The example below shows how to do things without using the set function or a set data-structure. Note: using a set would likely be both more efficient and more readable. This example is designed to show that you could do things without sets, but I do not recommend that you code in this way.
user=> (reduce
         #(if (some (partial = %2) %1) %1 (conj %1 %2))
         []
         (concat [1 2 1] [2 4 3 1]))
[1 2 4 3]
Truthfully, I don't tend to think about 'union' unless I'm already thinking about sets. In Clojure, clojure.set/union is defined to take multiple sets and return the union of those sets (as you'd expect).
Usage: (union)
       (union s1)
       (union s1 s2)
       (union s1 s2 & sets)
Return a set that is the union of the input sets
Finally, the example below shows the union function in action.
user=> (clojure.set/union #{1 2} #{2 4 3 1})
#{1 2 3 4}
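If you're starting from vectors rather than sets, a minimal sketch of the set-based alternatives looks like this. The vectors xs and ys reuse the numbers from the reduce example above.

```clojure
(require '[clojure.set :as set])

(def xs [1 2 1])
(def ys [2 4 3 1])

;; Pour everything into a single set.
(def via-concat (set (concat xs ys)))

;; Or convert each collection first and let union combine them.
(def via-union (set/union (set xs) (set ys)))

(println (= via-concat via-union)) ;; prints true

;; With no arguments, union returns the empty set.
(println (set/union)) ;; prints #{}
```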


The clojure.set namespace does define one additional function, clojure.set/join. To be honest, I haven't used join in production and I don't believe that I'm writing my own inferior versions within my codebases. So, I don't have an example for you, but I do like the examples on clojuredocs.org and I would encourage you to go check them out: http://clojuredocs.org/clojure_core/1.2.0/clojure.set/join
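For readers who want a quick look without leaving the page, here is a minimal sketch of the two-argument form of join, which performs a natural join on the keys the two relations share. The employers relation and its city values are made up for illustration.

```clojure
(require '[clojure.set :as set])

(def people
  #{{:name "jay fields" :employer "drw"}
    {:name "mike jones" :employer "forward"}})

;; Hypothetical relation; the city is invented for this example.
(def employers
  #{{:employer "drw" :city "chicago"}})

;; Both relations share the :employer key, so join matches on it.
;; mike has no matching employer row and is dropped from the result.
(def joined (set/join people employers))

(println joined)
```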

2 comments:

  1. Hi Jay. Great post (as always), one quick note-

    You should prefer:
    (keep (set [2 3]) [1 2 3 4 5]) => (2 3)

    over filter:
    (filter (set [2 false]) [1 2 false 4 5]) => (2)

    ... when looking explicitly for non-nil returns (not falsey items).

  2. Thank you very much for this post.
