Sunday, March 30, 2008
Domain Specific Language Simplexity
There's one kind of simplicity that I like to call simplexity. When you take something incredibly complex and try to wrap it in something simpler, you often just shroud the complexity -- Anders Hejlsberg
At QCon London I caught the Domain Specific Language (DSL) tutorial by Martin Fowler, Neal Ford, and Rebecca Parsons. While Martin covered how and why you would want to create a DSL, he discussed hiding complexity and designing solutions for specific problems. As Martin went into further detail, all I could think was: Simplexity.
Anders prefers simple all the way down. Of course, hopefully we all prefer simplicity; however, the devil generally lives in the details.
Some things are complex by nature. Designing an ORM is a great example of a problem that I've yet to see a bulletproof solution for. Most ORMs are complex by necessity, but hiding that complexity in a DSL specific to your application is a good thing. I'll take simplexity over complexity pretty much any day.
I've designed several simplex Domain Specific Languages.
Domain Specific Flow Language
About a year and a half ago Tim Cochran and I designed a few objects that allowed you to define the flow of an application using code similar to the example below.
pages :customer, :shipping, :billing, :summary, :confirmation, :offer, :retailer
flow.customer_driven customer >> shipping >> billing >> summary >> confirmation
flow.retailer_driven retailer >> billing >> summary >> confirmation
flow.offer_driven offer >> shipping >> billing >> summary >> confirmation
The business led us to believe that the flow of the application was going to change often and it needed to be pliable. With that in mind, Tim and I set out to create something that was simple to alter and didn't require complete understanding. We wrote out the syntax first and then made it execute. It was fairly easy for us to follow and we had a solution within a few hours. Then we presented the solution to the team.
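For anyone curious how syntax like this can execute at all, here is a minimal sketch. It is an illustration only, not the original implementation: `Page`, `Sequence`, `FlowRegistry`, `FlowDefinition`, and `next_after` are hypothetical names, and the sketch assumes a page is nothing more than a name. It relies on overloading `>>`, defining one method per page with `define_singleton_method`, catching flow names with `method_missing`, and running the definitions through `instance_eval`.

```ruby
# Hypothetical sketch of a flow DSL; not the original implementation.

# A single page, identified by name.
class Page
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # page >> page starts a new sequence of pages
  def >>(other)
    Sequence.new([self, other])
  end
end

# An ordered list of pages built up with >>.
class Sequence
  attr_reader :pages

  def initialize(pages)
    @pages = pages
  end

  # sequence >> page returns a new, longer sequence
  def >>(other)
    Sequence.new(pages + [other])
  end

  # Name of the page after current_name, or nil at the end of the flow
  def next_after(current_name)
    index = pages.index { |page| page.name == current_name }
    index && pages[index + 1]&.name
  end
end

# Turns calls like `flow.customer_driven sequence` into named flows.
class FlowRegistry
  def initialize
    @flows = {}
  end

  def [](name)
    @flows.fetch(name)
  end

  # Any unknown method called with a single Sequence names a new flow
  def method_missing(name, *args)
    return super unless args.size == 1 && args.first.is_a?(Sequence)

    @flows[name] = args.first
  end
end

# Evaluates flow definitions with self set to the definition object.
class FlowDefinition
  attr_reader :flow

  def initialize(&block)
    @flow = FlowRegistry.new
    instance_eval(&block)
  end

  # pages :customer, :shipping defines customer and shipping reader methods
  def pages(*names)
    names.each do |name|
      page = Page.new(name)
      define_singleton_method(name) { page }
    end
  end
end

definition = FlowDefinition.new do
  pages :customer, :shipping, :billing, :summary, :confirmation, :offer, :retailer
  flow.customer_driven customer >> shipping >> billing >> summary >> confirmation
  flow.retailer_driven retailer >> billing >> summary >> confirmation
  flow.offer_driven offer >> shipping >> billing >> summary >> confirmation
end

puts definition.flow[:customer_driven].next_after(:shipping) # prints billing
```

Even this toy version needs four small classes and three metaprogramming techniques to support four lines of flow definitions, which is exactly the trade-off this entry is about.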
We showed how to use the DSL and everyone loved it. I think it was actually most people's favorite part of the codebase. Then we dove into how it worked. I'm fairly sure that no one understood the underlying code. The solution was very simplex. It didn't really matter at the time because it was well tested and I don't remember it ever changing in the 6 months I was on the project.
About a year later Jake Scruggs wrote about how the current version of the Flow code is unnecessarily complex. I'm sure Jake is right. If the flows are rarely changing and the underlying code is causing any problems, it should definitely be removed.
The situation brings about an interesting question: how do you know when it's worth creating a DSL? In this example it was a requirement that it be easy to change a flow, but what is easy? I think the original implementation is easier to read and change, but much harder to fully understand and redesign. Unfortunately, there's no simple answer. Like so many things in software development, whether or not to introduce a DSL really depends on the context.
In this case, Tim and I decided to use a DSL because we expected the frequency of change to be high and there was a requirement that the time to make a change needed to be short. Given those requirements, something that is easy to read and change, but hides the underlying complexity is probably a good solution.
Expectations Delegation
I recently created a testing framework: expectations. One of the features of expectations is a solution for testing delegation. Delegation is such a commonly used pattern that I found it helpful to have a simple way to test that delegation is happening as expected.
The following code verifies that delegation does occur to the property that you expect and the result of the method you delegate to is returned from the delegating method.
expect PersonProxy.new.to.delegate(:name).to(:subject) do |proxy|
  proxy.name
end
In the example, the expectation ensures that the proxy.name method calls proxy.subject.name and the return value from proxy.subject.name is also returned by proxy.name.
The resulting expectation is very easy to read and write, but the underlying code is actually quite complex. First of all you need to stub the proxy.subject method. The return value of proxy.subject needs to be a mock that expects a name method call. If the mock receives the name method call you know that delegation happened correctly. The behavior based testing isn't overly complex, but it's not simple either, and it's the easy bit.
Ensuring that the result of proxy.subject.name is returned from proxy.name is much more complicated. Part of the problem is that the call to proxy.name happens within the block. You could use the result of the block and compare it to what proxy.subject.name is supposed to return, but then the following code would fail.
expect PersonProxy.new.to.delegate(:name).to(:subject) do |proxy|
  proxy.name
  nil
end
You could argue that the above code should cause a failure, but if it does, it creates some less than desirable results. First of all, the error message will need to say something like "Delegation may have occurred, but the block returned something different, so delegation cannot be verified". Also, comparing the result of the block makes sense for state based tests, but here we are only testing delegation, so it's not intuitive that the result of the block is significant.
I solved this by specifying the return value of proxy.subject.name; dynamically defining a module that defines the name method, captures the result of the delegation, and stores it for later verification; and then extending the proxy with the dynamically defined module. If that's a lot to take in, don't worry: I've yet to meet anyone who thought it was easy to follow.
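Here is a minimal sketch of that approach in plain Ruby, without expectations or a mocking library. It is not the code inside expectations: `delegates?`, `PersonProxy`, and `ForgetfulProxy` are hypothetical, and a hand-rolled fake object stands in for the mock. The fake delegate returns a unique sentinel object, and a module built with `Module.new` wraps the delegating method so its return value can be captured and compared to the sentinel, whatever the block itself returns.

```ruby
# Hypothetical class under test: it forwards #name to #subject.
class PersonProxy
  def subject
    @subject ||= Struct.new(:name).new("Ada")
  end

  def name
    subject.name
  end
end

# Hypothetical: calls the delegate but returns something else.
class ForgetfulProxy < PersonProxy
  def name
    subject.name
    "someone else"
  end
end

# Hypothetical helper, not the expectations implementation. Returns true only
# if calling method_name reaches the delegate AND the delegate's return value
# comes back out of method_name. The block's own return value is ignored.
def delegates?(proxy, method_name, to:)
  sentinel = Object.new # unique value only the fake delegate returns
  delegate_calls = 0

  # Fake delegate: counts calls to method_name and returns the sentinel
  fake = Object.new
  fake.define_singleton_method(method_name) do |*_args|
    delegate_calls += 1
    sentinel
  end
  proxy.define_singleton_method(to) { fake } # stub proxy.subject

  # Dynamically defined module that captures what the delegating method returns
  captured = []
  recorder = Module.new do
    define_method(method_name) do |*args|
      captured << super(*args)
      captured.last
    end
  end
  proxy.extend(recorder)

  yield proxy

  delegate_calls.positive? && captured.include?(sentinel)
end

puts delegates?(PersonProxy.new, :name, to: :subject) { |proxy| proxy.name; nil } # prints true
puts delegates?(ForgetfulProxy.new, :name, to: :subject) { |proxy| proxy.name }     # prints false
```

Note that the first call passes even though its block returns nil, which is the behavior argued for above.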
Underlying complexity is not desirable. If I could find an easier way to test delegation, I would absolutely change my implementation. Unfortunately, no one has been able to come up with a better solution, yet.
I do think the current solution is better than having to write traditional repetitive and overly verbose code to test every delegation. The DSL for testing delegation allows you to focus on higher value problems. Whether you follow the underlying implementation or not, you must admit that being able to test delegation in a readable and reliable way is a good thing.
Sometimes, complexity is warranted when the resulting interface is simple enough that you gain productivity overall.
Conveying Intention and Removing Noise
Part of what introduces simplexity to a Domain Specific Language is the need to write what you want without introducing side effects. In the expectations example, the return value of the expectation's block shouldn't be used for comparison because it's not immediately obvious that it would be significant. Using the block's return value for comparison is easier from an implementation perspective, but it makes the DSL less usable. This is not a good trade-off.
Noise is defined differently by everyone, but I like to think of it as the difference between what I would like to write and what I need to write.
Removing noise is also a common cause of simplexity. The following code is fairly easy to implement.
expect Process.new.to_have(:finished) do |process|
  process.finished = true
end
But, with expectations I chose to make the code below the legal syntax.
expect Process.new.to.have.finished do |process|
  process.finished = true
end
I chose the latter version for a few reasons. The "to" method is the gateway method to all types of expectations. If you want a delegation expectation you write "expect Object.to.delegate...", if you want a behavior based expectation you write "expect Object.to.receive...", and if you want to use a boolean method defined on the class you can use "expect Object.to.be..." or "expect Object.to.have..." depending on what reads better. Using dots allows me to create a consistent interface for all the different expectations, and it also allows me to create expectations on any object without adding several different methods to the Object class.
I also chose to allow dot syntax because once you call "to.be" you can begin calling methods exactly as you would on the object itself. If the object is designed with fluency in mind the test can read as a dot delimited pseudo sentence. For example, the following test is an easy way to verify validation.
class Person < ActiveRecord::Base
  validates_presence_of :name
end
expect Person.new.to.have.errors.on(:name) do |person|
  person.valid?
end
To get this desired behavior I rely on method_missing. Using method_missing almost always increases complexity by an order of magnitude. However, in this case, if I didn't use method_missing, I'd need a solution similar to the one below.
expect Person.new.to.be.true.when(:errors).then(:on, [:name]) do |person|
  person.valid?
end
While this version is easier to implement, it's much less friendly to use.
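A minimal sketch of how method_missing can make the dotted form work: every object gets a `to` gateway method that returns a recorder, filler words like `have` and `be` return the recorder itself, and any other message is recorded and replayed against the subject after the block runs. `ChainRecorder`, `replay`, the `:pass`/`:fail` results, and the `Task` class are all hypothetical, and this is not how expectations is actually implemented. `Task` stands in for the `Process` class above because `Process` is already a built-in Ruby module.

```ruby
# Hypothetical sketch; not how expectations is implemented.
# Records a chain of messages so it can be replayed on the subject later.
class ChainRecorder
  attr_reader :subject

  def initialize(subject)
    @subject = subject
    @messages = []
  end

  # Filler words that only make the chain read like a sentence
  def have
    self
  end

  def be
    self
  end

  # Any other message is saved (with its arguments) for later replay
  def method_missing(name, *args)
    @messages << [name, args]
    self
  end

  # Sends the saved messages in order; each result receives the next message
  def replay
    @messages.inject(subject) { |receiver, (name, args)| receiver.public_send(name, *args) }
  end
end

# Gateway method: any object can start an expectation with #to
class Object
  def to
    ChainRecorder.new(self)
  end
end

# Runs the block against the subject, then checks the replayed chain is truthy
def expect(recorder)
  yield recorder.subject
  recorder.replay ? :pass : :fail
end

# Hypothetical stand-in for the Process class above
class Task
  attr_accessor :finished
end

result = expect Task.new.to.have.finished do |task|
  task.finished = true
end
puts result # prints pass
```

The reading experience is the payoff; the price is that any message the recorder doesn't already understand is silently captured, which is the kind of hidden complexity that makes method_missing expensive to maintain.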
The level of simplexity should probably be defined by the usage of the Domain Specific Language. If the language is only used by a small subset of employees on a few occasions, then it might not make sense to increase simplexity. Also, if the complexity of the implementation is raised to a level that it cannot effectively be maintained, you absolutely must back off some. However, if the complexity is maintainable and the DSL is used frequently, I would do my best to reduce "noise".
Conclusion
I believe that simplexity is inevitable when designing a Domain Specific Language, and it shouldn't be something that stops you from using one. However, it's always valuable to weigh how much complexity has been introduced by the desire to create a simple user experience. If the complexity is more painful than the benefit to the consumers, you've clearly gone too far. However, if the complexity is painless but the DSL is so unpleasant that its usage is limited, you might need to work on designing a friendlier DSL, even at the expense of simplicity.
Labels: DSL
Tuesday, March 25, 2008
The Language Question
My first job was working with ColdFusion for a year. For the next two years I worked primarily with JavaScript and a proprietary language. Next was a brief stretch with PHP and ASP, followed by about 3 years of .NET. Finally, for the past 2 years I've been doing Ruby. I do not consider myself a Ruby developer. I prefer to just be a developer.
I don't expect that Ruby will be the last language I ever work with. In fact, I expect I'll probably be doing something new in the near future. The question is what language should I work with next? Should I go for a classic like Lisp or Smalltalk? Should I give Scala or Erlang a shot and ride the concurrency wave?
The problem is, I'm asking the wrong question. Rarely when discussing languages do the languages themselves dominate the conversation. I can't remember ever being in a conversation where someone won a language debate with things like closures, metaprogramming, or static typing. Instead, language debates usually focus on frameworks, performance, IDE support, developer availability, which OS the language runs on, and any number of other factors that are only indirectly related to the language.
The question extends even beyond factors related to the language.
At Google I’ll work with C++ rather than (for example) Ruby but I do get to be part of changing the world. -- Jon Tirsen
Jon gave up a job using his language of choice to work with another language he likes far less, but for a cause he's very interested in. I have another friend who recently started working with an investment company because he was interested in the domain.
Two years ago I started looking at Ruby because several of my colleagues were giving it a look. I preferred Rake to NAnt, and started using it on my .NET projects. Before long, someone asked me to be part of a Ruby project because of my limited exposure to Rake.
I got introduced to Ruby by coworkers who were interested. I got involved with Ruby because I prefer build files that do not require XML programming. I stuck with Ruby because we had a steady stream of Ruby projects that needed experienced developers, I got plenty of blog content, I liked the composition of the Ruby teams I got to work with, and I liked working with clients who are comfortable with early adoption.
Notice, none of the reasons I use Ruby have anything to do with the Ruby language itself.
I'm interested in doing something new because I feel like I've been doing the same thing for about a year now. I'm also interested in traveling to other ThoughtWorks offices and getting some fresh ideas for innovation.
Again, none of my desires have anything to do with language.
I'm not giving you the tired "no silver bullet" or the hand-waving "the right tool for the right job". I'm asserting that people use those phrases to justify their language choice, but you'd be better off asking what their real motivations for choosing a language are. What other factors that come with the language make it their choice?
It's also helpful to have this understanding before criticizing someone's language choice. My friends aren't using Java because they like Java, they are using it because they like IntelliJ, high performance on a few boxes, simple deployment, Hibernate, String Template, Spring, and a hundred other factors. Therefore, criticizing Java as a language doesn't really do anyone any good. Even if I convinced them that Lisp is a better language than Java, I still wouldn't have convinced them to use Lisp on their next project.
Labels: languages
Monday, March 24, 2008
Example Dilemma
Creating examples for blog entries, presentations, articles, and books is very, very hard. You need an example that is simple enough that most readers, hopefully all of them, can understand it quickly. However, any example that trivial almost always has to make several conscious omissions.
For example, rarely do examples take into account error handling, performance, security, etc. Almost no example will include code that addresses cross-cutting concerns. Nor should they, in my opinion. When the topic is error handling, the examples should include error handling. But, when the topic is how to design a DSL, it's much better to concentrate on communicating only that.
Cross-cutting concerns aren't the only things ignored in the name of concise examples. Proper domain modeling, encapsulation, and many other best practices are also often ignored. I believe this is because many of those choices depend so heavily on context that it would be impossible to devise an example that fits all possible scenarios. Instead of building the proper context, it's better to create an example that focuses on the problem and simply admit that you'll need to mold the example to your domain.
Of course, I'm not the first to write about these things. The following excerpt is from Martin Fowler, in chapter 1 of Refactoring.
With an introductory example, however, I run into a big problem. If I pick a large program, describing it and how it is refactored is too complicated for any reader to work through. (I tried and even a slightly complicated example runs to more than a hundred pages.) However, if I pick a program that is small enough to be comprehensible, refactoring does not look like it is worthwhile.
Thus I'm in the classic bind of anyone who wants to describe techniques that are useful for real-world programs. Frankly it is not worth the effort to do the refactoring that I'm going to show you on a small program like the one I'm going to use. But if the code I'm showing you is part of a larger system, then the refactoring soon becomes important. So I have to ask you to look at this and imagine it in the context of a much larger system.
Michael Feathers also has a section in the Preface of Working Effectively with Legacy Code where he describes his approach to coming up with examples.
One thing that you will notice as you read this book is that it is not a book about pretty code. The examples that I use in the book are fabricated because I work under nondisclosure agreements with clients. But in many of the examples, I've tried to preserve the spirit of code that I've seen in the field. I won't say that the examples are always representative. There certainly are oases of great code out there, but, frankly, there are also pieces of code that are far worse than anything I can use as an example in this book. Aside from client confidentiality, I simply couldn't put code like that in this book without boring you to tears and burying important points in a morass of detail. As a result, many of the examples are relatively brief. If you look at one of them and think "No, he doesn't understand—my methods are much larger than that and much worse," please look at the advice that I am giving at face value and see if it applies, even if the example seems simpler.
I see this most often with blog entries. Of course, this is because example code for blog entries isn't (usually) reviewed before it's published. Generally, a blog entry is published and a few comments begin to appear with a "better" way to solve the problem. Of course, the "better" way doesn't address the topic of the blog entry, so it isn't very relevant. The author (or another commenter) responds with the traditional "You missed the point" and neither side really gains anything.
I used to believe that it wasn't my problem if the reader missed the point. I was wrong. Martin's and Michael's books are proof that this isn't a new problem. The entire reason I write is to provide ideas to my readers. I do believe it's my responsibility as an author to come up with the best possible example, which includes accounting for ways that someone might miss the point. If they miss the point, I've failed.
Of course, feedback comes in many forms. The last comment I got started with "I know this is off topic...". When a conversation is framed that way, the author is much less likely to become defensive of their example, and much more likely to happily address the off topic comment. I've often gotten similar comments related to examples that start with "I understand why the example addresses the problem you're writing about, but why not solve the problem using...". The people who leave comments this way seem to understand the Example Dilemma, and always seem to get better responses from authors.
Most people are curious and excited when they have what they believe to be a superior solution. This explains the constant flow of "better" examples and off topic questions. I think this is a good thing; otherwise I would turn comments off on my blog. But I think understanding the Example Dilemma can turn an adversarial exchange into a beneficial one.
I don't believe the Example Dilemma is going anywhere, but I do believe we can all benefit from being more conscious of it.
Labels: examples
Friday, March 21, 2008
Ruby: inject
I love inject. To be more specific, I love Enumerable#inject. I find it easy to read and easy to use. It's powerful and it lets me be more concise. Enumerable#inject is a good thing.
Of course, I didn't always love it. When I was new to Ruby I didn't understand what it was, so I found it hard to follow. However, finding it hard to understand didn't make me run from it; instead, I wanted to know what all the hype was about. Enumerable#inject is a method many great Rubyists use often, and I wanted to know what I was missing.
So, to learn about Enumerable#inject I did what I always do, I used it every possible way I could think of.
Example 1: Summing numbers
Summing numbers is the most common example for using inject. You have an array of numbers and you want the sum of those numbers.
[1, 2, 3, 4].inject(0) {|result, element| result + element } # => 10
If the example isn't straightforward, don't worry, we're going to break it down. The inject method takes an argument and a block. The block will be executed once for each element contained in the object that inject was called on ([1,2,3,4] in our example). The argument passed to inject will be yielded as the first argument to the block, the first time it's executed. The second argument yielded to the block will be the first element of the object that we called inject on.
So, the block will be executed 4 times, once for every element of our array ([1,2,3,4]). The first time the block executes the result argument will have a value of 0 (the value we passed as an argument to inject) and the element argument will have a value of 1 (the first element in our array).
You can do anything you want within the block, but the return value of the block is very important. The return value of the block will be yielded as the result argument the next time the block is executed.
In our example we add the result, 0, to the element, 1. Therefore, the return value of the block will be 0 + 1, or 1. This will result in 1 being yielded as the result argument the second time the block is executed.
The second time the block is executed, the result of the previous block execution, 1, will be yielded as the result, and the second element of the array, 2, will be yielded as the element. Again the result, 1, and the element, 2, will be added together, resulting in the return value of the block being 3.
The third time the block is executed the result of the second block execution, 3, is yielded as the result argument and the third element of the array, 3, will be yielded as the element argument. Again, the result and the element will be added, and the return value of the block for the third execution will be 6.
The fourth time will be the final time the block is executed since there are only 4 elements in our array. The result value will be 6, the result from the third execution of the block, and the element will be 4, the fourth element of the array. The block will execute, adding four plus six, and the return value of the block will be 10. On the final execution of the block the return value is used as the return value of the inject method; therefore, as the example shows, the result of executing the code above is 10.
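One way to confirm the walkthrough is to record each execution of the block. The `steps` array is only there for illustration; it isn't needed to compute the sum.

```ruby
# Records [result, element, block return value] for each execution of the block
steps = []
total = [1, 2, 3, 4].inject(0) do |result, element|
  sum = result + element
  steps << [result, element, sum]
  sum # the block's return value becomes the next result
end

steps.each { |result, element, sum| puts "#{result} + #{element} = #{sum}" }
puts total
```

This prints `0 + 1 = 1`, `1 + 2 = 3`, `3 + 3 = 6`, and `6 + 4 = 10`, followed by `10`: four executions, matching the four steps above.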
That's the very long version of how inject works, but you could actually shortcut one of the block executions by not passing an argument to inject.
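Here is the same sum without an argument to inject, again recording each execution of the block (`steps` is only for illustration):

```ruby
steps = []
total = [1, 2, 3, 4].inject do |result, element|
  steps << [result, element] # what the block receives on each execution
  result + element
end

puts total        # prints 10
puts steps.length # prints 3
```

One difference worth knowing: on an empty array, `[].inject(0) { |result, element| result + element }` returns 0, while the version without an argument returns nil, because there is no first element to start from.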
As the example shows, the argument to inject is actually optional. If a default value is not passed in as an argument, then the first time the block executes, the first argument (result from our example) will be set to the first element of the enumerable (1 from our example) and the second argument (element from our example) will be set to the second element of the enumerable (2 from our example).
In this case the block will only need to be executed 3 times, since the first execution will yield both the first and the second element. The first time the block executes it will add the result, 1, to the element, 2, and return a value of 3. The second time the block executes the result will be 3 and the element will also be 3. All additional steps will be the same, and the result will be 10 once again.
Summing numbers with inject is a simple example of taking an array of numbers and building a resulting sum one element at a time.
Example 2: Building a Hash
Sometimes you'll have data in one format, but you really want it in another. For example, you may have an array that contains keys and values as pairs, but it's really just an array of arrays. In that case, inject is a nice solution for quickly converting your array of arrays into a hash.
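A sketch of that conversion with hypothetical data (`pairs` and the names in it are made up for illustration):

```ruby
# Hypothetical data: key/value pairs stored as an array of arrays
pairs = [[:first_name, "Ada"], [:last_name, "Lovelace"]]

person = pairs.inject({}) do |result, element|
  result[element.first] = element.last
  result # the assignment above returns the value, so return the hash explicitly
end

puts person[:last_name] # prints Lovelace
```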
As the example shows, I start with an empty hash (the argument to inject) and I iterate through each element in the array, adding the key and value one at a time to the result. Also, since the return value of the block is yielded as the result on the next execution, I need to add to the hash and then explicitly return the hash on the following line; the assignment by itself would return the value, not the hash.
Ola Bini and rubikitch both pointed out that you can also create a hash from an array with the following code.
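Whether this matches what they suggested is an assumption, but Ruby has built-ins that do this conversion directly. `Hash[*pairs.flatten]` was the common idiom in older Ruby, `Hash[pairs]` accepts the pairs as they are, and `Array#to_h` has been available since Ruby 2.1.

```ruby
pairs = [[:first_name, "Ada"], [:last_name, "Lovelace"]]

# Older idiom: flatten to key, value, key, value (breaks if values are arrays)
from_flatten = Hash[*pairs.flatten]

# Hash.[] also accepts an array of [key, value] pairs directly
from_brackets = Hash[pairs]

# Array#to_h converts an array of pairs into a hash
from_to_h = pairs.to_h

puts from_to_h[:first_name] # prints Ada
```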
Of course, I can do other things in inject also, such as converting the keys to be strings and changing the names to be lowercase.
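Continuing with hypothetical pairs, a sketch that converts the keys to strings and the names to lowercase; the `(key, name)` in the block parameters destructures each pair.

```ruby
pairs = [[:first_name, "Ada"], [:last_name, "LOVELACE"]]

person = pairs.inject({}) do |result, (key, name)|
  result[key.to_s] = name.downcase # string keys, lowercase names
  result
end

puts person["last_name"] # prints lovelace
```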
This is a central value for inject, it allows me to easily convert an enumerable into an object that is useful for the problem I'm trying to solve.
Example 3: Building an Array
Enumerable gives you many of the methods you need for manipulating arrays. For example, if you want all the even integers of an array as strings, you can get them by chaining various methods from Enumerable.
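For example, with a hypothetical array of integers, selecting the even ones and converting them to strings takes two chained calls:

```ruby
numbers = [1, 2, 3, 4, 5, 6]

# select builds one intermediate array, map builds the final one
even_strings = numbers.select { |number| number.even? }.map { |number| number.to_s }

p even_strings # prints ["2", "4", "6"]
```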
Chaining methods of Enumerable is a solution that's very comfortable for many developers, but as the chain gets longer I prefer to use inject. The inject method allows me to handle everything I need without having to chain multiple independent methods.
The code below achieves the same thing in one method, and is just as readable, to me.
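A sketch of the same result using inject, with the same hypothetical numbers:

```ruby
numbers = [1, 2, 3, 4, 5, 6]

# One pass: append the string form of each even number to the result array
even_strings = numbers.inject([]) do |result, number|
  result << number.to_s if number.even?
  result
end

p even_strings # prints ["2", "4", "6"]
```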
Of course, that example is a bit contrived; however, a more realistic case is when you have an object with two different properties and you want to build an array of one, conditionally based on the other. A more concrete example is an array of test result objects that know whether they've failed or succeeded and carry a failure message if they've failed. For reporting, you want all the failure messages.
You can get this with the built-in methods of Enumerable.
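A sketch with a hypothetical `TestResult` struct (its fields and the messages are made up for illustration):

```ruby
# Hypothetical test result: whether it failed, plus a message when it did
TestResult = Struct.new(:failed, :message) do
  def failed?
    failed
  end
end

results = [
  TestResult.new(false, nil),
  TestResult.new(true, "expected 1, got 2"),
  TestResult.new(true, "timed out")
]

failure_messages = results.select { |result| result.failed? }.map { |result| result.message }

p failure_messages # prints ["expected 1, got 2", "timed out"]
```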
But, it's not obvious what you are doing until you read the entire line. You could build the array the same way using inject and if you are comfortable with inject it reads slightly cleaner.
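The same failure messages built with inject, using the same hypothetical `TestResult` struct:

```ruby
# Hypothetical test result: whether it failed, plus a message when it did
TestResult = Struct.new(:failed, :message) do
  def failed?
    failed
  end
end

results = [
  TestResult.new(false, nil),
  TestResult.new(true, "expected 1, got 2"),
  TestResult.new(true, "timed out")
]

# Collect the message of each failed result in a single pass
failure_messages = results.inject([]) do |messages, result|
  messages << result.message if result.failed?
  messages
end

p failure_messages # prints ["expected 1, got 2", "timed out"]
```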
I prefer to build what I want using inject instead of chaining methods of Enumerable and effectively building multiple objects on the way to what I need.
Example 4: Building a Hash (again)
Building on the Test Result example, you might want to group all results by their status. The inject method lets you easily do this by starting with an empty hash and defaulting each key's value to an empty array, which is then appended to with each element that has the same status.
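A sketch of that grouping with a hypothetical `TestResult` that has a name and a status. The block given to `Hash.new` stores an empty array the first time a status is looked up, so `<<` always has an array to append to.

```ruby
# Hypothetical test result with a status such as :passed or :failed
TestResult = Struct.new(:name, :status)

results = [
  TestResult.new("test_login", :passed),
  TestResult.new("test_logout", :failed),
  TestResult.new("test_signup", :passed)
]

# The default block creates and stores an empty array for each new status
grouped = results.inject(Hash.new { |hash, key| hash[key] = [] }) do |groups, result|
  groups[result.status] << result
  groups
end

p grouped[:passed].map(&:name) # prints ["test_login", "test_signup"]
```

Current Ruby also ships `Enumerable#group_by`, so `results.group_by(&:status)` produces the same groups; the inject version is still a useful illustration of building a hash one piece at a time.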
You might be sensing a theme here.
Example 5: Building an unknown result
Usually you know what kind of result you are looking for when you use inject, but it's also possible to use inject to build an unknown object.
Consider the following Recorder class that saves any messages you send it.
By simply defining the play_for method and using inject you can replay each message on the argument and get back anything depending on how the argument responds to the recorded methods.
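A minimal sketch of such a `Recorder` with a `play_for` method. The details are assumptions: it inherits from `BasicObject` so that methods every `Object` already has (such as `to_s`) get recorded instead of answered, and it does not record blocks passed along with a message.

```ruby
# Hypothetical recorder: saves messages now, replays them on a target later
class Recorder < BasicObject
  def initialize
    @messages = []
  end

  # Save every unknown message (with its arguments) and return self for chaining
  def method_missing(name, *args)
    @messages << [name, args]
    self
  end

  # Replay the saved messages in order: each result becomes the next receiver
  def play_for(target)
    @messages.inject(target) { |result, (name, args)| result.__send__(name, *args) }
  end
end

shout = Recorder.new.upcase.reverse
puts shout.play_for("hello") # prints OLLEH

first_two = Recorder.new.first(2)
p first_two.play_for([1, 2, 3])            # prints [1, 2]
p first_two.play_for({ a: 1, b: 2, c: 3 }) # prints [[:a, 1], [:b, 2]]
```

The same recording can produce a string, an array, or anything else, depending entirely on the object it is played against.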
Conclusion
How do I know when I want to use inject? I like to use inject anytime I am building an object a piece at a time. In the case of summing, creating a hash, or an array I'm building a result by applying changes based on the elements of the enumerable. After I'm done applying changes for each element, I have the finished object I'm looking for. The same is true of the Recorder example, I'm sending the methods one at a time until I return the result of sending all the methods.
So next time you need to build an object based on elements of an enumerable, consider using inject.
Of course, I didn't always love it. When I was new to Ruby I didn't understand what it was, so I found it hard to follow. However, finding it hard to understand didn't make me run from it; instead, I wanted to know what all the hype was about. Enumerable#inject is a method used often by many great Rubyists, and I wanted to know what I was missing.
So, to learn about Enumerable#inject I did what I always do, I used it every possible way I could think of.
Example 1: Summing numbers
Summing numbers is the most common example for using inject. You have an array of numbers and you want the sum of those numbers.
[1, 2, 3, 4].inject(0) {|result, element| result + element } # => 10
If the example isn't straightforward, don't worry; we're going to break it down. The inject method takes an argument and a block. The block will be executed once for each element contained in the object that inject was called on ([1,2,3,4] in our example). The first time the block is executed, the argument passed to inject is yielded as the block's first argument, and the first element of the object that we called inject on is yielded as the second argument.
So, the block will be executed 4 times, once for every element of our array ([1,2,3,4]). The first time the block executes the result argument will have a value of 0 (the value we passed as an argument to inject) and the element argument will have a value of 1 (the first element in our array).
You can do anything you want within the block, but the return value of the block is very important. The return value of the block will be yielded as the result argument the next time the block is executed.
In our example we add the result, 0, to the element, 1. Therefore, the return value of the block will be 0 + 1, or 1. This will result in 1 being yielded as the result argument the second time the block is executed.
The second time the block is executed, the result of the previous block execution, 1, will be yielded as the result, and the second element of the array will be yielded as the element. Again, the result, 1, and the element, 2, are added together, so the block returns 3.
The third time the block is executed the result of the second block execution, 3, is yielded as the result argument and the third element of the array, 3, will be yielded as the element argument. Again, the result and the element will be added, and the return value of the block for the third execution will be 6.
The fourth time will be the final time the block is executed since there are only 4 elements in our array. The result value will be 6, the result from the third execution of the block, and the element will be 4, the fourth element of the array. The block will execute, adding four plus six, and the return value of the block will be 10. On the final execution of the block the return value is used as the return value of the inject method; therefore, as the example shows, the result of executing the code above is 10.
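One way to check the walkthrough above is to have the block record each pair it receives. This is a small sketch; the `steps` array exists only for illustration and is not part of the original example.

```ruby
# Record every (result, element) pair that inject yields, while still
# returning the running sum exactly as in the summing example.
steps = []
sum = [1, 2, 3, 4].inject(0) do |result, element|
  steps << [result, element]
  result + element # this value becomes the next result
end

steps.each { |result, element| puts "result=#{result} element=#{element}" }
puts "final: #{sum}"
# prints:
# result=0 element=1
# result=1 element=2
# result=3 element=3
# result=6 element=4
# final: 10
```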
That's the very long version of how inject works, but you could actually shortcut one of the block executions by not passing an argument to inject.
[1, 2, 3, 4].inject {|result, element| result + element } # => 10
As the example shows, the argument to inject is actually optional. If no initial value is passed as an argument, then the first time the block executes, the first argument (result in our example) will be set to the first element of the enumerable (1 in our example) and the second argument (element in our example) will be set to the second element of the enumerable (2 in our example).
In this case the block will only need to be executed 3 times, since the first execution will yield both the first and the second element. The first time the block executes it will add the result, 1, to the element, 2, and return a value of 3. The second time the block executes the result will be 3 and the element will also be 3. All additional steps will be the same, and the result will be 10 once again.
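The sketch below confirms the shorter sequence of block executions. It also adds one case the example doesn't cover: on an empty array there is no first element to use as the seed, so inject without an argument returns nil, while inject(0) returns 0.

```ruby
# Without an initial value, the first element seeds the result,
# so the block runs one fewer time.
steps = []
sum = [1, 2, 3, 4].inject do |result, element|
  steps << [result, element]
  result + element
end
steps # => [[1, 2], [3, 3], [6, 4]]
sum   # => 10

# With nothing to iterate over, there is no seed to return.
[].inject { |result, element| result + element }    # => nil
[].inject(0) { |result, element| result + element } # => 0
```

If an empty enumerable is possible, passing the starting value is the safer choice.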
Summing numbers with inject is a simple example of taking an array of numbers and building a resulting sum one element at a time.
Example 2: Building a Hash
Sometimes you'll have data in one format, but you really want it in another. For example, you may have an array that contains keys and values as pairs, but it's really just an array of arrays. In that case, inject is a nice solution for quickly converting your array of arrays into a hash.
hash = [[:first_name, 'Shane'], [:last_name, 'Harvie']].inject({}) do |result, element|
  result[element.first] = element.last
  result
end
hash # => {:first_name=>"Shane", :last_name=>"Harvie"}
As the example shows, I start with an empty hash (the argument to inject) and iterate through each element in the array, adding the key and value to the result one at a time. Because the block's return value is yielded as the next result, and an assignment such as result[key] = value evaluates to the value rather than to the hash, I need to return the result explicitly on the following line.
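The trailing `result` line is easy to forget, so it is worth seeing what happens without it. The sketch also shows `each_with_object`, a different method (added in Ruby 1.9, after this post was written) that sidesteps the issue; it is offered as an alternative, not as what the original code used.

```ruby
pairs = [[:first_name, 'Shane'], [:last_name, 'Harvie']]

# Forgetting the trailing `result`: the block returns the assigned value,
# so with a single pair inject hands back 'Shane' instead of a hash.
broken = [[:first_name, 'Shane']].inject({}) { |result, (key, value)| result[key] = value }
broken # => "Shane"

# each_with_object passes the same memo object to every iteration and
# ignores the block's return value, so no trailing `result` is needed.
hash = pairs.each_with_object({}) { |(key, value), result| result[key] = value }
hash[:last_name] # => "Harvie"
```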
Ola Bini and rubikitch both pointed out that you can also create a hash from an array with the following code.
Hash[*[[:first_name, 'Shane'], [:last_name, 'Harvie']].flatten] # => {:first_name=>"Shane", :last_name=>"Harvie"}
Of course, I can do other things in inject too, such as converting the keys to strings and the names to lowercase.
hash = [[:first_name, 'Shane'], [:last_name, 'Harvie']].inject({}) do |result, element|
  result[element.first.to_s] = element.last.downcase
  result
end
hash # => {"first_name"=>"shane", "last_name"=>"harvie"}
This is central to the value of inject: it allows me to easily convert an enumerable into an object that is useful for the problem I'm trying to solve.
Example 3: Building an Array
Enumerable gives you many of the methods you need for manipulating arrays. For example, if you want all the even integers of an array as strings, you can get them by chaining methods from Enumerable.
[1, 2, 3, 4, 5, 6].select {|element| element % 2 == 0 }.collect {|element| element.to_s } # => ["2", "4", "6"]
Chaining methods of Enumerable is a solution that's very comfortable for many developers, but as the chain gets longer I prefer to use inject. The inject method allows me to handle everything I need without having to chain multiple independent methods.
The code below achieves the same thing in one method and is, to me, just as readable.
array = [1, 2, 3, 4, 5, 6].inject([]) do |result, element|
  result << element.to_s if element % 2 == 0
  result
end
array # => ["2", "4", "6"]
Of course, that example is a bit contrived. A more realistic case is an object with two different properties where you want to build an array of one property, conditionally based on the other. For example, you might have an array of test result objects that know whether they've failed or succeeded, and that carry a failure message if they've failed. For reporting, you want all the failure messages.
You can get them with the built-in methods of Enumerable.
TestResult = Struct.new(:status, :message)
results = [
  TestResult.new(:failed, "1 expected but was 2"),
  TestResult.new(:success),
  TestResult.new(:failed, "10 expected but was 20")
]
messages = results.select {|test_result| test_result.status == :failed }.collect {|test_result| test_result.message }
messages # => ["1 expected but was 2", "10 expected but was 20"]
But it's not obvious what you are doing until you read the entire line. You could build the array the same way using inject, and if you are comfortable with inject it reads slightly cleaner.
TestResult = Struct.new(:status, :message)
results = [
  TestResult.new(:failed, "1 expected but was 2"),
  TestResult.new(:success),
  TestResult.new(:failed, "10 expected but was 20")
]
# The block parameter is named failures so it doesn't shadow the outer messages variable
messages = results.inject([]) do |failures, test_result|
  failures << test_result.message if test_result.status == :failed
  failures
end
messages # => ["1 expected but was 2", "10 expected but was 20"]
I prefer to build what I want using inject instead of chaining methods of Enumerable and effectively building multiple objects on the way to what I need.
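Newer Rubies ship a method aimed at exactly this select-then-collect pattern. The sketch below assumes Ruby 2.7 or later for `Enumerable#filter_map` and compares it with the inject version above. One difference to keep in mind: `filter_map` also drops a failed result whose message is `nil`, while the inject version would keep that `nil`.

```ruby
TestResult = Struct.new(:status, :message)
results = [
  TestResult.new(:failed, "1 expected but was 2"),
  TestResult.new(:success),
  TestResult.new(:failed, "10 expected but was 20")
]

# inject: one pass, one array built.
with_inject = results.inject([]) do |failures, test_result|
  failures << test_result.message if test_result.status == :failed
  failures
end

# filter_map: also one pass; it keeps only the block's truthy results.
with_filter_map = results.filter_map { |test_result| test_result.message if test_result.status == :failed }

with_inject == with_filter_map # => true
```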
Example 4: Building a Hash (again)
Building on the test result example, you might want to group all results by their status. The inject method makes this easy: start with an empty hash, default each key's value to an empty array, and append each element to the array for its status.
TestResult = Struct.new(:status, :message)
results = [
  TestResult.new(:failed, "1 expected but was 2"),
  TestResult.new(:success),
  TestResult.new(:failed, "10 expected but was 20")
]
grouped_results = results.inject({}) do |grouped, test_result|
  grouped[test_result.status] = [] if grouped[test_result.status].nil?
  grouped[test_result.status] << test_result
  grouped
end
grouped_results
# >> {:failed => [
# >>   #<struct TestResult status=:failed, message="1 expected but was 2">,
# >>   #<struct TestResult status=:failed, message="10 expected but was 20">],
# >>  :success => [ #<struct TestResult status=:success, message=nil> ]
# >> }
You might be sensing a theme here.
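The `nil?` check can be folded into the starting hash itself. This sketch seeds inject with a hash that has a default block, and compares the result with `Enumerable#group_by` (available from Ruby 1.8.7), a built-in that performs the same grouping. Both are alternatives, not the code above.

```ruby
TestResult = Struct.new(:status, :message)
results = [
  TestResult.new(:failed, "1 expected but was 2"),
  TestResult.new(:success),
  TestResult.new(:failed, "10 expected but was 20")
]

# The default block creates and stores an empty array the first time a
# status is looked up, replacing the explicit nil check.
grouped = results.inject(Hash.new { |hash, key| hash[key] = [] }) do |memo, test_result|
  memo[test_result.status] << test_result
  memo
end

# group_by does the same grouping in a single call.
grouped == results.group_by(&:status) # => true
```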
Example 5: Building an unknown result
Usually you know what kind of result you are looking for when you use inject, but it's also possible to use inject to build an unknown object.
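Before looking at the Recorder, here is the core idea in isolation: a hand-written list of messages replayed with inject. The list is made up for illustration; what comes back depends entirely on what each message returns.

```ruby
# Each message is a [method_name, arguments] pair, the same shape the
# Recorder below stores.
messages = [[:upcase, []], [:+, ["!"]], [:reverse, []]]

# inject starts with the object and sends each message to whatever the
# previous message returned.
result = messages.inject("hi") do |current, (name, args)|
  current.send(name, *args)
end

result # => "!IH"
```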
Consider the following Recorder class that saves any messages you send it.
class Recorder
  # Remove inherited methods so that nearly every message reaches method_missing
  instance_methods.each do |meth|
    undef_method meth unless meth =~ /^(__|inspect|to_str)/
  end

  def method_missing(sym, *args)
    messages << [sym, args]
    self
  end

  def messages
    @messages ||= []
  end
end
Recorder.new.will.record.anything.you.want
# >> #<Recorder:0x28ed8 @messages=[[:will, []], [:record, []], [:anything, []], [:you, []], [:want, []]]>
By simply defining the play_for method and using inject you can replay each message on the argument and get back anything depending on how the argument responds to the recorded methods.
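class Recorder
  # Replay the recorded messages: start with obj and send each message
  # to the result of the previous one
  def play_for(obj)
    messages.inject(obj) do |result, message|
      result.send message.first, *message.last
    end
  end
end
recorder = Recorder.new
# methods and sort were undefined above, so both calls are recorded, not executed
recorder.methods.sort
recorder.play_for(String)
# >> ["<", "<=", "<=>", "==", "===", "=~", ">", ">=", "__id__", ...]
Conclusion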
How do I know when I want to use inject? I like to use inject any time I am building an object a piece at a time. When summing, creating a hash, or creating an array, I'm building a result by applying changes based on the elements of the enumerable. After I'm done applying changes for each element, I have the finished object I'm looking for. The same is true of the Recorder example: I send the recorded messages one at a time and return the result of sending all of them.
So next time you need to build an object based on elements of an enumerable, consider using inject.
Labels: enumerable, inject, ruby
Thursday, March 20, 2008
Testing Anti-Pattern: Metaprogrammed Tests
Update at bottom
Update 2 for Saikuro reported cyclomatic complexity
Update 3 for Flog
I despise metaprogrammed tests. The problem with metaprogrammed tests is that they introduce more questions than answers. Tests are supposed to give confidence, but I don't feel very confident when I find myself asking: which assertion failed? what part of the test is wrong? in which loop, at what value, do you think the problem is?
Let's jump straight to an example. The following method on Fixnum will tell you what the letter grade is.
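# On Ruby 2.4 and later Fixnum is deprecated in favor of Integer (it was removed in 3.2), so reopen Integer there
class Fixnum
  def as_letter_grade
    case self
    when 0..59 then "F"
    when 60..69 then "D"
    when 70..79 then "C"
    when 80..89 then "B"
    when 90..100 then "A"
    end
  end
end
50.as_letter_grade # => "F"
60.as_letter_grade # => "D"
70.as_letter_grade # => "C"
80.as_letter_grade # => "B"
90.as_letter_grade # => "A"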
For completeness, you may wish to test every value between 0 and 100 to ensure that no mistakes are made. Doing this in the most straightforward way possible, you would define 101 tests and test every value individually.
class GradeTests < Test::Unit::TestCase
  def test_0_is_F
    assert_equal "F", 0.as_letter_grade
  end
  def test_1_is_F
    assert_equal "F", 1.as_letter_grade
  end
  def test_2_is_F
    assert_equal "F", 2.as_letter_grade
  end
  # ... and so on, one test per score, through test_100_is_A
end
While this would work, it suffers from a couple of problems: it's too long to digest, and it would be painfully tedious to write. You might jump to the conclusion that you ought to metaprogram the tests to resolve these issues.
class GradeTests < Test::Unit::TestCase
  (0..100).each do |index|
    letter = case index
             when 0..59 then "F"
             when 60..69 then "D"
             when 70..79 then "C"
             when 80..89 then "B"
             when 90..100 then "A"
             end
    define_method "test_#{index}_is_#{letter}" do
      assert_equal letter, index.as_letter_grade
    end
  end
end
This solution isn't so bad at first glance. When a test fails, I can see what number I was working with, what letter I expected, and what letter I actually got.
Then I have to actually figure out what is wrong, and this is where I begin to really dislike metaprogrammed tests. The line number is almost worthless. Yes, the loop is on or near that line, but that same line also produced about 100 successful assertions. Also, I always expect the problem to be in the class, but that's not always the case. Metaprogramming in tests is just as susceptible to mistakes as programming the domain. Yet by instinct we always look there last, because we expect our tests to give us confidence; they should be correct. The example code is easy enough to follow, but most metaprogrammed tests contain more complexity, leading to even more fragile, fear-instilling tests.
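The line-number problem is easy to see directly. This sketch (Ruby 1.9 or later, which added `source_location`; the class name is made up) defines a few methods in a loop and asks each one where it was defined.

```ruby
# Define three methods in a loop, the same way the metaprogrammed tests are defined.
class LoopDefinedTests
  (1..3).each do |index|
    define_method("test_#{index}") { index }
  end
end

# Every generated method reports the line of the shared block, so the line
# number alone cannot tell the tests apart.
lines = LoopDefinedTests.instance_methods(false).map do |name|
  LoopDefinedTests.instance_method(name).source_location[1]
end
lines.uniq.size # => 1
```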
Loaded suite /Users/jay/Desktop/foo
Started
..........................................................
..........F................................
Finished in 0.024512 seconds.
1) Failure:
test_70_is_C:32
<"C"> expected but was
<"D">.
101 tests, 101 assertions, 1 failures, 0 errors
Also, if you find yourself wanting to defend metaprogrammed tests, ask yourself whether you usually provide even as many clues as I have. Do you create test names that help you figure out what the problem was? Do you first get the letter and then compare it, or do you assert true and false, yielding even less information? If you don't give me at least as much information as I've given myself in my example, I can't even begin to imagine trying to find out what's wrong with a broken test.
The single largest problem with metaprogrammed tests is that they add unnecessary complexity to your test suite. This complexity reduces the maintainability of the tests, which makes it less likely that anyone will keep them up to date.
There is a better way.
You can approach the problem differently and still provide a concise solution. Looking at our issue another way, we simply want to test that certain ranges of values return A, B, C, D, or F. To me, that looks like 5 different tests, not 101. Here's what I consider to be a more maintainable solution.
class GradeTests < Test::Unit::TestCase
  def test_numbers_that_are_As
    assert_equal ["A"], (90..100).collect {|int| int.as_letter_grade }.uniq
  end

  def test_numbers_that_are_Bs
    assert_equal ["B"], (80..89).collect {|int| int.as_letter_grade }.uniq
  end

  # ... test the other letters
end
The above tests should be readable to anyone very quickly. They correctly provide the line number of a failing test when a test fails. Also, each test verifies only one piece of logic, greatly reducing complexity. Lastly, I can easily see in the test that it's written correctly, so any errors must result from a mistake in the domain.
These tests instill more confidence and they are easier to digest and therefore maintain. These are tests that are more likely to live on and provide value. These are tests I thank my teammates for.
Update
Tammer Saleh correctly points out that the failure message for my last example would actually be worse than the failure message from the metaprogrammed tests. I was aware of that fact when I wrote up the entry, but I was unsure how to address the issue. If I were on a project I would write a custom assertion for expectations that would give me a descriptive error message while also allowing me to easily test what I want. That custom assertion would be well tested and could be designed to be general enough to apply across my entire test suite, thus infinitely more valuable than metaprogramming that only solves a problem for a specific test.
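To make Tammer's point concrete, here is a standalone sketch with a deliberately broken, hypothetical `buggy_letter_grade` method (a plain method rather than the Fixnum extension, so it runs on any Ruby). The uniq-based check can only report which letters came back, not which scores produced them.

```ruby
# Hypothetical, deliberately broken grading: 78 and 79 come back as "B".
def buggy_letter_grade(score)
  case score
  when 0..59 then "F"
  when 60..69 then "D"
  when 70..77 then "C"
  when 78..89 then "B"
  when 90..100 then "A"
  end
end

# What a uniq-based test for the C range compares against ["C"]:
# the letters, but not the scores.
letters = (70..79).collect { |int| buggy_letter_grade(int) }.uniq
letters # => ["C", "B"]

# What a more helpful failure message needs: the offending scores themselves.
offenders = (70..79).reject { |int| buggy_letter_grade(int) == "C" }
offenders # => [78, 79]
```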
But this isn't a project; it's an example. Still, I failed to give the complete answer, and this is my attempt to fix that. As I said, on a project I would use expectations, but for the purpose of this entry I'll provide a custom assertion that could easily be used with test/unit.
The general solution is that I have an enumerable object and I want to verify the result of calling a method on each element of the enumerable. Thus, I should be able to create a general assertion that takes my expected single result, the enumerable, and the block that should be executed on each element. If all elements return actual results that match the expected value then the test passes. However, if any element does not return the expected value, then the expected value, the actual value, and the element are all described in the error message. The error message will contain all failures, not just the first one that fails.
Below is the code in full, but the following code would not be enough if this were a real project. Instead, if this were a real project this custom assertion should be tested with the same amount of effort that you put into testing any domain concept.
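require 'test/unit'

class Fixnum
  def as_letter_grade
    case self
    when 0..59 then "F"
    when 60..69 then "D"
    when 70..79 then "C"
    when 80..89 then "B"
    when 90..100 then "A"
    end
  end
end

class GradeTests < Test::Unit::TestCase
  def test_numbers_that_are_As
    assert_enumerable_only_returns("A", 90..100) {|int| int.as_letter_grade }
  end

  def test_numbers_that_are_Bs
    assert_enumerable_only_returns("B", 80..89) {|int| int.as_letter_grade }
  end

  # ... test the other letters
end

class Test::Unit::TestCase
  # Collect a message for every element whose block result differs from
  # expected, then fail once with all of them
  def assert_enumerable_only_returns(expected, enumerable, &block)
    messages = enumerable.inject([]) do |result, element|
      actual = element.instance_eval(&block)
      result << "<#{expected}> expected but was <#{actual}> for #{element}" if expected != actual
      result
    end
    assert_block(messages.join("\n")) {messages.empty? }
  end
end
Additionally, here are the results from a failing test.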
class GradeTests < Test::Unit::TestCase
  def test_numbers_that_are_Bs
    assert_enumerable_only_returns("B", 78..89) {|int| int.as_letter_grade }
  end
end
# >> Loaded suite -
# >> Started
# >> F
# >> Finished in 0.00063 seconds.
# >>
# >> 1) Failure:
# >> test_numbers_that_are_Bs(GradeTests)
# >> [-:28:in `assert_enumerable_only_returns'
# >> -:17:in `test_numbers_that_are_Bs']:
# >> <B> expected but was <C> for 78
# >> <B> expected but was <C> for 79
# >>
# >> 1 tests, 1 assertions, 1 failures, 0 errors
I would take this solution over any metaprogrammed solution I can think of.
Update 2
I decided to check out what the cyclomatic complexity would look like for defining tests in a loop compared to traditional definitions with custom assertions. I used Saikuro to give me cyclomatic complexity results.
Interestingly, the complexity of the looping test definition (8) is more than the complexity of the logic added to Fixnum (6). It's also double the complexity of the custom assertion version (4) of the tests. The custom assertion also registers a score of 4, but that doesn't concern me since I'll test the custom assertion.
For those interested in running the experiment, the code I used is below. I defined a class method and called it explicitly because Saikuro reports complexity on a per-method basis, so I needed a method for it to measure.
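require 'test/unit'

class Fixnum
  def as_letter_grade
    case self
    when 0..59 then "F"
    when 60..69 then "D"
    when 70..79 then "C"
    when 80..89 then "B"
    when 90..100 then "A"
    end
  end
end

# Looping version: the loop lives in a class method so Saikuro can measure it
class LoopingGradeTests < Test::Unit::TestCase
  def self.define_tests
    (0..100).each do |index|
      letter = case index
               when 0..59 then "F"
               when 60..69 then "D"
               when 70..79 then "C"
               when 80..89 then "B"
               when 90..100 then "A"
               end
      define_method "test_#{index}_is_#{letter}" do
        assert_equal letter, index.as_letter_grade
      end
    end
  end
  define_tests
end

# Custom assertion version
class CustomAssertionGradeTests < Test::Unit::TestCase
  def test_numbers_that_are_As
    assert_enumerable_only_returns("A", 90..100) {|int| int.as_letter_grade }
  end

  def test_numbers_that_are_Bs
    assert_enumerable_only_returns("B", 80..89) {|int| int.as_letter_grade }
  end
end

class Test::Unit::TestCase
  def assert_enumerable_only_returns(expected, enumerable, &block)
    messages = enumerable.inject([]) do |result, element|
      actual = element.instance_eval(&block)
      result << "<#{expected}> expected but was <#{actual}> for #{element}" if expected != actual
      result
    end
    assert_block(messages.join("\n")) {messages.empty? }
  end
end
Update 3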
Since I ran Saikuro on the code, it only made sense to put it through Flog also.
The following code was flogged.
# Looping version
class LoopingGradeTests < Test::Unit::TestCase
  def self.define_tests
    (0..100).each do |index|
      letter = case index
               when 0..59 then "F"
               when 60..69 then "D"
               when 70..79 then "C"
               when 80..89 then "B"
               when 90..100 then "A"
               end
      define_method "test_#{index}_is_#{letter}" do
        assert_equal letter, index.as_letter_grade
      end
    end
  end
  define_tests
end

# Custom assertion version
class CustomAssertionGradeTests < Test::Unit::TestCase
  def test_numbers_that_are_As
    assert_enumerable_only_returns("A", 90..100) {|int| int.as_letter_grade }
  end

  def test_numbers_that_are_Bs
    assert_enumerable_only_returns("B", 80..89) {|int| int.as_letter_grade }
  end
end
The flog score of the looping version was 15.3, and the score of the custom assertion version was 6.5.
Both Saikuro and Flog flagged the looping test definition with warnings as a potential problem.
Labels: metaprogramming, testing
Wednesday, March 19, 2008
Ruby: Replace Temp with Chain
You have methods that can be chained for greater maintainability.
mock = Mock.new
expectation = mock.expects(:a_method_name)
expectation.with("arguments")
expectation.returns([1, :array])
becomes
mock = Mock.new
mock.expects(:a_method_name).with("arguments").returns([1, :array])
Motivation
Calling methods on different lines technically gets the job done, but at times it makes sense to chain method calls together and provide a more fluent interface. In the above examples, assigning an expectation to a local variable is only necessary so that the arguments and return value can be specified. The solution utilizing Method Chaining removes the need for the local variable. Method Chaining can also improve maintainability by providing an interface that allows you to compose code that reads naturally.
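The chained version works only because each call returns an object that understands the next message. The sketch below uses hypothetical `Mock` and `Expectation` classes (not any real mocking library's implementation) to show the minimum needed: `expects` returns the expectation, and `with` and `returns` return `self`.

```ruby
# Hypothetical expectation object; each configuration method stores its
# value and returns self so the next call in the chain reaches it too.
class Expectation
  attr_reader :method_name, :arguments, :return_value

  def initialize(method_name)
    @method_name = method_name
  end

  def with(*arguments)
    @arguments = arguments
    self
  end

  def returns(value)
    @return_value = value
    self
  end
end

# Hypothetical mock; expects starts the chain by returning the new expectation.
class Mock
  def expectations
    @expectations ||= []
  end

  def expects(method_name)
    expectation = Expectation.new(method_name)
    expectations << expectation
    expectation
  end
end

mock = Mock.new
chained = mock.expects(:a_method_name).with("arguments").returns([1, :array])
chained.arguments    # => ["arguments"]
chained.return_value # => [1, :array]
```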
Mechanics
- Return self from methods you wish to allow chaining from
- Test
- Remove the local variable and chain the method calls
- Test
Suppose you were designing a library for creating HTML elements. This library would likely contain a method that created a select drop-down and allowed you to add options to it. The following code contains a Select class that could be used to create the example HTML, along with an example usage of the class.
class Select
  def options
    @options ||= []
  end

  def add_option(arg)
    options << arg
  end
end
select = Select.new
select.add_option(1999)
select.add_option(2000)
select.add_option(2001)
select.add_option(2002)
select # => #<Select:0x28708 @options=[1999, 2000, 2001, 2002]>
The first step in creating a Method Chained solution is to create a method that creates the Select instance and adds an option.
class Select
  def self.with_option(option)
    select = self.new
    select.options << option
    select
  end
  # ...
end
select = Select.with_option(1999)
select.add_option(2000)
select.add_option(2001)
select.add_option(2002)
select # => #<Select:0x28488 @options=[1999, 2000, 2001, 2002]>
Next, change the method that adds options to return self so that it can be chained.
class Select
  # ...
  def add_option(arg)
    options << arg
    self
  end
end
select = Select.with_option(1999).add_option(2000).add_option(2001).add_option(2002)
select # => #<Select:0x28578 @options=[1999, 2000, 2001, 2002]>
Finally, rename the add_option method to something that reads more fluently, such as "and".
class Select
  def self.with_option(option)
    select = self.new
    select.options << option
    select
  end

  def options
    @options ||= []
  end

  def and(arg)
    options << arg
    self
  end
end
select = Select.with_option(1999).and(2000).and(2001).and(2002)
select # => #<Select:0x28578 @options=[1999, 2000, 2001, 2002]>
Labels: refactoring, ruby
Tuesday, March 18, 2008
Ruby: Isolate Dynamic Receptor
A class utilizing method_missing has become painful to alter
Introduce a new class and move the method_missing logic to that class.
Motivation
As I previously mentioned, objects that use method_missing often raise NoMethodError unexpectedly, or worse, you get no more information than: stack level too deep (SystemStackError).
Despite the added complexity, method_missing is a powerful tool that needs to be used when the interface of a class cannot be predetermined. On those occasions I like to use Isolate Dynamic Receptor to limit the behavior of an object that also relies on method_missing.
The ActiveRecord::Base (AR::B) class defines method_missing to handle dynamic find messages. The implementation of method_missing allows you to send find messages that use attributes of a class as limiting conditions for the results that the dynamic find messages return. For example, given a Person subclass of AR::B that has both first_name and ssn attributes, it's possible to send the messages Person.find_by_first_name, Person.find_by_ssn, and Person.find_by_first_name_and_ssn.
It's possible, but not realistic, to dynamically define methods for all possible combinations of the attributes of an AR::B subclass. Instead, using method_missing is a good solution; however, defining method_missing on the AR::B class itself significantly increases the complexity of the class. AR::B would be more maintainable if the dynamic finder logic were instead defined on a class whose single responsibility was to handle dynamic find messages. For example, the above Person class could support find with the following syntax: Person.find.by_first_name, Person.find.by_ssn, or Person.find.by_first_name_and_ssn.
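A rough sketch of that alternative syntax, assuming plain hashes stand in for database rows. `DynamicFinder` and this in-memory `Person` are hypothetical; this is not how ActiveRecord is implemented, only an illustration of moving method_missing onto a class with a single responsibility.

```ruby
# Hypothetical finder whose only job is to interpret by_<attr>_and_<attr>
# messages, so the model class itself needs no method_missing.
class DynamicFinder
  def initialize(records)
    @records = records
  end

  def method_missing(name, *values)
    match = name.to_s.match(/\Aby_(\w+)\z/)
    return super unless match
    attributes = match[1].split("_and_").map(&:to_sym)
    return super unless attributes.size == values.size
    conditions = attributes.zip(values)
    @records.select { |record| conditions.all? { |attr, value| record[attr] == value } }
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("by_") || super
  end
end

# Hypothetical in-memory model with made-up records.
class Person
  RECORDS = [
    { first_name: "Shane", ssn: "111" },
    { first_name: "Jay", ssn: "222" }
  ]

  def self.find
    DynamicFinder.new(RECORDS)
  end
end

Person.find.by_first_name("Jay").map { |person| person[:ssn] } # => ["222"]
Person.find.by_first_name_and_ssn("Shane", "111").size         # => 1
```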
Note: very often it's possible to know all valid method calls ahead of time, in which case I prefer Replace Dynamic Receptor with Dynamically Define Method.
Mechanics
Here's a recorder class that records all calls to method_missing.
The recorder class may need additional behavior such as the ability to play back all the messages on an object and the ability to represent all the calls as strings.
As the behavior of Recorder grows it becomes harder to understand what messages are dynamically handled and what messages are actually explicitly defined. By design the functionality of method_missing should handle any unknown message, but how do you know if you've broken something by adding a explicitly defined method?
The solution to this problem is to introduce an additional class that has the single responsibility of handling the dynamic method calls. In this case we have a class Recorder that handles recording unknown messages as well as playing back the messages or printing them. To reduce complexity we will introduce the MessageCollector class that handles the method_missing calls.
The record method of Recorder will create a new instance of the MessageCollector class and each additional chained call will be recorded. The play back and printing capabilities will remain on the Recorder object.
A class utilizing method_missing has become painful to alter.
Introduce a new class and move the method_missing logic to that class.
Motivation
As I previously mentioned, objects that use method_missing often raise NoMethodError unexpectedly or, worse, give you no more information than: stack level too deep (SystemStackError).
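Here is a contrived sketch of the SystemStackError case. It assumes a hypothetical Settings class whose method_missing contains a typo. The misspelled call is itself an unknown message, so method_missing keeps calling itself until the stack overflows, and the error says nothing about the typo.

```ruby
# Hypothetical class used only to reproduce the failure mode.
class Settings
  def initialize
    @values = { color: "blue" }
  end

  def method_missing(sym, *args)
    # Typo: `valuse` is not defined, so this call is routed back into
    # method_missing, which calls `valuse` again, and so on forever.
    valuse[sym]
  end
end

begin
  Settings.new.color
rescue SystemStackError => e
  # The message only reports the stack depth, not the misspelled name.
  puts "#{e.class}: #{e.message}"
end
```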
Despite the added complexity, method_missing is a powerful tool that needs to be used when the interface of a class cannot be predetermined. On those occasions I like to use Isolate Dynamic Receptor to limit the behavior of an object that also relies on method_missing.
The ActiveRecord::Base (AR::B) class defines method_missing to handle dynamic find messages. Its method_missing implementation lets you send find messages that use the attributes of a class as conditions that limit the results returned. For example, given a Person subclass of AR::B that has both first_name and ssn attributes, it's possible to send the messages Person.find_by_first_name, Person.find_by_ssn, and Person.find_by_first_name_and_ssn.
It's possible, but not realistic, to dynamically define methods for all possible combinations of the attributes of an AR::B subclass. Using method_missing instead is a good solution. However, defining method_missing on the AR::B class itself significantly increases the complexity of that class. From a maintainability perspective, AR::B would benefit if the dynamic finder logic were instead defined on a class whose single responsibility was handling dynamic find messages. For example, the above Person class could support find with the following syntax: Person.find.by_first_name, Person.find.by_ssn, or Person.find.by_first_name_and_ssn.
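Here is a minimal sketch of that Person.find.by_... syntax. It assumes a hypothetical in-memory Person class (not an AR::B subclass) and a hypothetical DynamicFinder class. Person.find returns a finder, and only the finder defines method_missing, so Person's own interface stays explicit.

```ruby
# Hypothetical finder whose single responsibility is dynamic find messages.
class DynamicFinder
  def initialize(records)
    @records = records
  end

  # by_first_name_and_ssn("Jay", "123") returns the first record whose
  # first_name is "Jay" and whose ssn is "123", or nil if none match.
  def method_missing(sym, *args)
    attributes = attributes_for(sym)
    return super unless attributes && attributes.size == args.size

    @records.find do |record|
      attributes.zip(args).all? { |attr, value| record.public_send(attr) == value }
    end
  end

  def respond_to_missing?(sym, include_private = false)
    !attributes_for(sym).nil? || super
  end

  private

  # :by_first_name_and_ssn => ["first_name", "ssn"]; nil for anything else.
  def attributes_for(sym)
    name = sym.to_s
    return nil unless name.start_with?("by_")
    attributes = name.delete_prefix("by_").split("_and_")
    attributes.empty? ? nil : attributes
  end
end

# Hypothetical stand-in for an AR::B subclass; records live in an array.
class Person
  attr_reader :first_name, :ssn

  def initialize(first_name, ssn)
    @first_name = first_name
    @ssn = ssn
  end

  def self.all
    @all ||= []
  end

  def self.create(first_name, ssn)
    (all << new(first_name, ssn)).last
  end

  # No method_missing here: Person only hands out a finder.
  def self.find
    DynamicFinder.new(all)
  end
end

jay = Person.create("Jay", "123-45-6789")
Person.create("Ali", "987-65-4321")

Person.find.by_first_name("Jay")                        # => jay
Person.find.by_first_name_and_ssn("Ali", "987-65-4321") # => the Ali record
Person.find.by_ssn("000-00-0000")                       # => nil
```

Because the finder is a separate object, adding an explicit method to Person can't accidentally change how dynamic find messages are handled. Splitting on "_and_" is a simplification. It would misread an attribute whose name itself contains "_and_".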
Note: very often it's possible to know all valid method calls ahead of time, in which case I prefer Replace Dynamic Receptor with Dynamically Define Method.
Mechanics
- Create a new class whose sole responsibility is to handle the dynamic method calls.
- Copy the logic from method_missing on the original class to the method_missing of the focused class.
- Change all client code that previously called the dynamic methods on the original object.
- Remove the method_missing from the original object.
- Test
Here's a Recorder class that records every message that reaches method_missing.
class Recorder
  instance_methods.each do |meth|
    undef_method meth unless meth =~ /^(__|inspect)/
  end
  def messages
    @messages ||= []
  end
  def method_missing(sym, *args)
    messages << [sym, args]
    self
  end
end
The Recorder class may need additional behavior, such as the ability to play back all the messages on an object and the ability to represent all the calls as a string.
class Recorder
  def play_for(obj)
    messages.inject(obj) do |result, message|
      result.send message.first, *message.last
    end
  end
  def to_s
    messages.inject([]) do |result, message|
      result << "#{message.first}(args: #{message.last.inspect})"
    end.join(".")
  end
end
As the behavior of Recorder grows, it becomes harder to tell which messages are handled dynamically and which are explicitly defined. By design, method_missing should handle any unknown message, but how do you know if you've broken something by adding an explicitly defined method?
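Here is a small illustration of that risk. It assumes a hypothetical SketchRecorder trimmed down from Recorder, which keeps Object's methods instead of undefining them. Once someone adds an explicit method whose name matches a message you relied on recording, that message silently stops being recorded.

```ruby
# Hypothetical, trimmed-down recorder; it keeps Object's methods instead of
# undefining them, which is enough to show the problem.
class SketchRecorder
  def messages
    @messages ||= []
  end

  # Every message without an explicit definition ends up here and is recorded.
  def method_missing(sym, *args)
    messages << [sym, args]
    self
  end
end

recorder = SketchRecorder.new
recorder.size
recorded_before = recorder.messages.map(&:first) # => [:size]

# Later, someone adds an explicit method that happens to share the name.
class SketchRecorder
  def size
    messages.size
  end
end

recorder = SketchRecorder.new
recorder.size
recorded_after = recorder.messages.map(&:first) # => [] (size is no longer recorded)
```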
The solution to this problem is to introduce an additional class that has the single responsibility of handling the dynamic method calls. In this case we have a class Recorder that handles recording unknown messages as well as playing back the messages or printing them. To reduce complexity we will introduce the MessageCollector class that handles the method_missing calls.
class MessageCollector
  instance_methods.each do |meth|
    undef_method meth unless meth =~ /^(__|inspect)/
  end
  def messages
    @messages ||= []
  end
  def method_missing(sym, *args)
    messages << [sym, args]
    self
  end
end
The record method of Recorder creates a MessageCollector instance, and every call chained onto it is recorded. The playback and printing capabilities remain on the Recorder object.
class Recorder
  def play_for(obj)
    @message_collector.messages.inject(obj) do |result, message|
      result.send message.first, *message.last
    end
  end
  def record
    @message_collector ||= MessageCollector.new
  end
  def to_s
    @message_collector.messages.inject([]) do |result, message|
      result << "#{message.first}(args: #{message.last.inspect})"
    end.join(".")
  end
end
Labels: method_missing, refactoring, ruby
Monday, March 17, 2008
Move eval from Run-time to Parse-time
You need to use eval, but want to limit the number of times eval is necessary.
class Person
  def self.attr_with_default(options)
    options.each_pair do |attribute, default_value|
      define_method attribute do
        eval "@#{attribute} ||= #{default_value}"
      end
    end
  end
  attr_with_default :emails => "[]", :employee_number => "EmployeeNumberGenerator.next"
end
becomes
class Person
  def self.attr_with_default(options)
    options.each_pair do |attribute, default_value|
      eval "def #{attribute}
              @#{attribute} ||= #{default_value}
            end"
    end
  end
  attr_with_default :emails => "[]", :employee_number => "EmployeeNumberGenerator.next"
end
Motivation
premature optimization is the root of all evil -- Donald Knuth
I'll never advocate premature optimization, but this refactoring can help once you've determined that eval is a source of performance pain. Kernel#eval can be the right solution in some cases, but it is almost always more expensive (in terms of performance) than its alternatives. When eval is necessary, it's often better to move the eval call from run-time to parse-time.
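Here is a rough sketch for measuring whether this refactoring is worth it in your environment. It assumes a hypothetical Sample class with one reader of each style and uses Ruby's standard Benchmark library. Timings vary by Ruby version and machine, so none are shown here.

```ruby
require "benchmark"

# Hypothetical class with two readers that return the same default.
class Sample
  # Run-time: the string is parsed and evaluated on every call.
  define_method(:runtime_emails) do
    eval "@runtime_emails ||= []"
  end

  # Parse-time: a single eval, run when the class body executes,
  # defines an ordinary method that involves no eval when called.
  eval "def parse_time_emails
          @parse_time_emails ||= []
        end"
end

sample = Sample.new
iterations = 50_000

Benchmark.bm(11) do |x|
  x.report("run-time")   { iterations.times { sample.runtime_emails } }
  x.report("parse-time") { iterations.times { sample.parse_time_emails } }
end
```

Both readers behave identically. Only the number of times a string is handed to eval differs.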
Mechanics
- Expand the scope of the string being eval'd.
- Test
The following Person class uses eval to define the logic the readers rely upon for returning a default value if no value has previously been set.
class Person
  def self.attr_with_default(options)
    options.each_pair do |attribute, default_value|
      define_method attribute do
        eval "@#{attribute} ||= #{default_value}"
      end
    end
  end
  attr_with_default :emails => "[]", :employee_number => "EmployeeNumberGenerator.next"
end
The above example executes without issue, but it relies upon eval each time a reader is called. If repeated calls to eval prove problematic, the solution is to expand the eval to define the method itself.
class Person
  def self.attr_with_default(options)
    options.each_pair do |attribute, default_value|
      eval "def #{attribute}
              @#{attribute} ||= #{default_value}
            end"
    end
  end
  attr_with_default :emails => "[]", :employee_number => "EmployeeNumberGenerator.next"
end
Labels: eval, refactoring, ruby
Friday, March 07, 2008
Ruby: expectations gem version 0.2.3
I'm very opinionated about unit testing. I believe you should have only one assertion per test, and that includes having only one expectation per test. I also believe that tests should be as clear as possible, and that doing things such as inlining setup and expecting literals results in more maintainable tests.
I've been an Xunit fan for years. I can do everything I want using Xunit frameworks, but at times the framework gets in my way. For example, I generally don't need a test name, which is really nothing more than a glorified comment. I also dislike having different syntaxes for state based and behavior based tests. Most importantly, the framework enables other people to make questionable decisions.
On Christmas 2007 I released my opinionated unit testing framework: expectations.
Expectations has one syntax for both state based and behavior based tests. Expectations has no test name: if you need a comment, you add a real comment; if you don't, you aren't required to. Expected values are specified outside the scope of the test, which encourages you to use literals whenever possible. Expectations has no support for setup or teardown. It forces you to either duplicate code or write more loosely coupled code that requires less setup, which is a good thing.
Some tests aren't a good fit for expectations; that's where a functional test suite comes into play. I've used both test/unit and RSpec for functional testing. Both have benefits, and both give you the ability to break every suggestion expectations makes.
I've been using expectations on my current project and all my open source projects for about 3 months. The framework has grown in many ways in those 3 months; the rest of this post covers the features that are currently supported.
Expectations supports traditional state based tests. The result of executing the block is compared with the expected value, and if they are equal, the test passes. The resulting tests are very easy to follow. The expected value is obvious on the first line, every line in the block except the last sets up the test, and the last line is the actual value.
expect 2 do
  1 + 1
end
Expectations also supports behavior based tests by setting expectations on objects. The object can be a concrete class, a stub, or a mock. It doesn't matter, because they all support behavior based expectations with the same syntax.
# Behavior based test using a traditional mock
expect mock.to.receive(:dial).with("2125551212").times(2) do |phone|
  phone.dial("2125551212")
  phone.dial("2125551212")
end
# Behavior based test using a stub
expect stub.to.receive(:dial).with("2125551212").times(2) do |phone|
  phone.dial("2125551212")
  phone.dial("2125551212")
end
# Behavior based test using a stub_everything
expect stub_everything.to.receive(:dial).with("2125551212").times(2) do |phone|
  phone.dial("2125551212")
  phone.dial("2125551212")
end
# Behavior based test on a concrete class
expect Object.to.receive(:deal) do
  Object.deal
end
Expectations also supports asserting that an exception will be raised.
expect NoMethodError do
  Object.no_method
end
Expectations generally uses == internally to test equality; however, expectations also supports case equality (===) for Ranges, Regexps, and Modules.
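Here is a plain-Ruby sketch of how the two kinds of equality combine for these types. expectation_met? is a hypothetical helper, not the expectations gem's actual implementation. It passes when either == or === succeeds.

```ruby
# Hypothetical helper: an expectation passes if the values are equal, or if
# the expected value matches the actual value by case equality.
def expectation_met?(expected, actual)
  expected == actual || expected === actual
end

# Case equality (===) does the work for these three types.
expectation_met?(/a string/, "a string") # => true  (Regexp#=== matches the string)
expectation_met?(1..5, 3)                # => true  (Range#=== checks membership)
expectation_met?(Enumerable, [])         # => true  (Module#=== checks is_a?)

# Regular equality (==) covers an actual value of the same kind as the expected one.
expectation_met?(/a string/, /a string/) # => true
expectation_met?(1..5, 1..5)             # => true
expectation_met?(Enumerable, Enumerable) # => true

# Neither check passes here.
expectation_met?(1..5, 7)                # => false
```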
# State based test matching a Regexp
expect /a string/ do
  "a string"
end
# State based test checking if actual is in the expected Range
expect 1..5 do
  3
end
# State based test to determine if the object is an instance of the module
expect Enumerable do
  []
end
Every expectation that uses case equality also uses regular equality, so the following tests also pass.
# State based test matching a Regexp
expect /a string/ do
  /a string/
end
# State based test checking if actual is in the expected Range
expect 1..5 do
  1..5
end
# State based test to determine if the object is an instance of the module
expect Enumerable do
  Enumerable
end
Expectations also introduces a fluent interface for asserting boolean values. The following tests verify that the attribute is set to true or false, based on the expectation definition.
# this is normally defined in the file specific to the class
klass = Class.new do
  attr_accessor :started, :finished
end
# State based fluent interface boolean tests using to be
expect klass.new.not.to.have.started
expect klass.new.to.be.started do |process|
  process.started = true
end
# State based fluent interface boolean test using to have
expect klass.new.not.to.have.finished
expect klass.new.to.have.finished do |process|
  process.finished = true
end
The above example also shows expectations that exist without using a block. These are fully functional and can be run individually within TextMate (a snippet for running individual expectations is in the README).
If you have opinions like mine, you may want to give expectations a shot. If you disagree, that's cool too; to each their own.
Labels: expectations, ruby, testing, unit testing


