Jay Fields' Thoughts: May 2013

Thursday, May 16, 2013

Clojure: Combining Calls To Doseq And Let

I've you've ever looked at the docs for clojure's for macro, then you probably know about the :let, :when, and :while modifiers. What you may not know is that those same modifiers are available in doseq.

I was recently working with some code that had the following form.

Upon seeing this code, John Hume asked if I preferred it to a single doseq with multiple bindings. He sent over an example that looked similar to the following example.

That was actually the first time that I'd seen multiple bindings in a doseq, and my immediate reaction was that I preferred the explicit simplicity of having multiple doseqs. However, I always have a preference for concise code, and I forced myself to starting using multiple bindings instead of multiple doseqs - and, unsurprisingly, I now prefer multiple bindings to multiple doseqs.

You might have noticed that the second version of the code slightly changes what's actually being done. In the original version the 'name' function is called once per 'id', and in the second version the 'name' function is called once per 'sub-id'. Calling name significantly more often isn't likely to have much impact on your program; however, if you were calling a more expensive function this change could have a negative impact. Luckily, (as I previously mentioned) doseq also provides support for :let.

The second example can be evolved to the following code - which also demonstrates that the let is only evaluated once per iteration.

That's really the final version of the original code, but you can alter it slightly for experimentation purposes if you'd like. Let's assume we have another function we're calling in an additional let and it's expensive, it would be nice if that only occurred when an iteration was going to happen. It turns out, that's exactly what happens.

Whether you prefer multiple bindings or multiple doseqs, it's probably a good idea to get comfortable reading both.

Wednesday, May 15, 2013

Emacs Lisp: Font Lock for Clojure's Partial

I love using partial, but I dislike the length of the function name. There's a simple solution, define another function with a shorter name that simply calls (or is) partial. This is exactly what I did in the jry library.

I liked the use of % due to partial feeling similar to creating a function using #(), and % having a special meaning inside #(). I thought they tied well together. Unfortunately, there's an obvious problem, things would be very broken if you tried to use the '%' function in an anonymous function defined with #(). Somewhere along the way this issue caused me to stop using jry/%.

Using partial is great: it's part of the standard lib, and I don't need to explain it to anyone who joins my team or any future maintainers of the code I write. Still, I want something shorter, and I've always had a background thread looking for another shorter-than-partial solution. While recently contributing to emacs-live I found the solution I was looking for: clojure-mode font lock.

The following code can now be found in my emacs configuration.

This solution feels like the best of both worlds. My code still uses the function from the standard library, my colleagues still see a function they already know, and 'partial' only takes up one character space in my buffer. The image below is what you'll see if you put the above emacs-lisp in your config.

Tuesday, May 14, 2013

Clojure: Testing The Creation Of A Partial Function

I recently refactored some code that takes longs from two different sources to compute one value. The code originally stored the longs and called a function when all of the data arrived. The refactored version partials the data while it's incomplete and executes the partial'd function when all of the data is available. Below is a contrived example of what I'm taking about.

Let's pretend we need a function that will allow us to check whether or not another drink would make us legally drunk in New York City.

The code below stores the current bac and uses the value when legally-drunk? is called.

The following (passing) tests demonstrate that everything works as expected.

This code works without issue, but can also be refactored to store a partial'd function instead of the bac value. Why you would want to do such a thing is outside of the scope of this post, so we'll just assume this is a good refactoring. The code below no longer stores the bac value, and instead stores the pure-legally-drunk? function partial'd with the bac value.

Two of the three of the tests don't change; however, the test that was verifying the state is now broken.

note: The test output has been trimmed and reformatted to avoid horizontal scrolling.

In the output you can see that the test is failing as you'd expect, due to the change in what we're storing. What's broken is obvious, but there's not an obvious solution. Assuming you still want this state based test, how do you verify that you've partial'd the right function with the right value?

The solution is simple, but a bit tricky. As long as you don't find the redef too magical, the following solution allows you to easily verify the function that's being partial'd as well as the arguments.

Those tests all pass, and should provide security that the legally-drunk? and update-bac functions are sufficiently tested. The pure-legally-drunk? function still needs to be tested, but that should be easy since it's a pure function.

Would you want this kind of test? I think that becomes a matter of context and personal preference. Given the various paths through the code the following tests should provide complete coverage.

The above tests make no assumptions about the implementation - they actually pass whether you :use the 'original namespace or the 'refactored namespace. Conversely, the following tests verify each function in isolation and a few of them are very much tied to the implementation.

Both sets of tests would give me confidence that the code works as expected, so choosing which tests to use would become a matter of maintenance cost. I don't think there's anything special about these examples; I think they offer the traditional trade-offs between higher and lower level tests. A specific trade-off that stands out to me is identifying defect localization versus having to update the test when you update the code.

As I mentioned previously, the high-level-expectations work for both the 'original and the 'refactored namespaces. Being able to change the implementation without having to change the test is obviously an advantage of the high level tests. However, when things go wrong, the lower level tests provide better feedback for targeting the issue.

The following code is exactly the same as the code in refactored.clj, except it has a 1 character typo. (it's not necessary to spot the typo, the test output below will show you want it is)

The high level tests give us the following feedback.

failure in (high_level_expectations.clj:14) : expectations.high-level-expectations
(expect
 true
 (with-redefs
  [state (atom {})]
  (update-bac 0.01)
  (legally-drunk? 0.07)))

           expected: true 
                was: false

There's not much in that failure report to point us in the right direction. The unit-level-expectations provide significantly more information, and the details that should make it immediately obvious where the typo is.

failure in (unit_level_expectations.clj:8) : expectations.unit-level-expectations
(expect
 {:legally-drunk?* [pure-legally-drunk? 0.04]}
 (with-redefs [state (atom {}) partial vector] (update-bac 0.04)))

           expected: {:legally-drunk?* [# 0.04]} 
                was: {:legally-drunk?** [# 0.04]}
 
           :legally-drunk?** with val [# 0.04] 
                             is in actual, but not in expected
           :legally-drunk?* with val [# 0.04] 
                            is in expected, but not in actual

The above output points us directly to the extra asterisk in update-bac that caused the failure.

Still, I couldn't honestly tell you which of the above tests that I prefer. This specific example provides a situation where I think you could convincingly argue for either set of tests. However, as the code evolved I would likely choose one path or the other based on:

how much 'setup' is required for always using high-level tests?
how hard is it to guarantee integration using primarily unit-level tests?

In our examples the high level tests require redef'ing one bit of state. If that grew to a few pieces of state and/or a large increase in the complexity of the state, then I may be forced to move towards more unit-level tests. A rule of thumb I use: If a significant amount of the code within a test is setting up the test context, there's probably a smaller function and a set of associated tests waiting to be extracted.

By definition, the unit-level tests don't test the integration of the various functions. When I'm using unit-level tests, I'll often test the various code paths at the unit level and then have a happy-path high-level test that verifies integration of the various functions. My desire to have more high-level tests increases as the integration complexity increases, and at some point it makes sense to simply convert all of the tests to high-level tests.

If you constantly re-evaluate which tests will be more appropriate and switch when necessary, you'll definitely come out ahead in the long run.

Tuesday, May 07, 2013

Recovering Lost Post Data

I recently typed out a long, thoughtful response in a textarea. I clicked submit, like I've done millions of times, and I got the dreaded "session expired" error message. This happens very, very rarely, but it's devastating when it does. Creating long & thoughtful responses isn't something that comes naturally for me. I crossed my fingers and clicked back. No luck, web 2.0 dynamically created text boxes ensured Chrome had no chance to preserve my editing state.

My first reaction was: I guess I'm not responding after all. Then it occurred to me, DevTools must have my data somewhere, right? Lucky for me, the answer was yes.

There might be easier ways, this is what worked for me:

open DevTools
go to the "Network" tab.
look for the row with the method POST.
- If you don't see a POST row, try refreshing the page. With any luck you'll get a repost confirmation dialog, giving you some hope that your data is still around. (You'll want to allow the data repost)
click on the POST row, and scroll down till you see "Form Data". If you've gotten this far, hopefully you'll find your data in clear text and able to be copied.

The examples from this post are from following the instructions above and logging in to twitter.com. If you've ever lost post data in the past, you may want to give these directions a dry-run now.

Thursday, May 02, 2013

Emacs Lisp: Toggle Between a Clojure String and Keyword

When I was doing a fair bit of Ruby I often used the TextMate's shortcut (Ctrl+:) to convert a Ruby String to a Symbol or a Ruby Symbol to a String. It's something I've periodically missed while doing Clojure, and yesterday I found myself in the middle of a refactoring that was going to force the conversion of 5+ Clojure Keywords to Strings.

The following emacs lisp is my solution for toggling between Clojure Strings and Keywords. The standard disclaimers apply - it works on my machine, and I've never claimed to know emacs lisp well.

A quick video of the behavior: