Monday, December 31, 2007

Refactoring Motivations

Aman King recently asked me:
[W]hat have your experiences been on deferring refactoring and accruing technical debt?
My short answer: As long as you remember that the goal is working software, not beautiful code, I believe you will be able to pragmatically balance time spent refactoring and time spent implementing features.

I believe there are many reasons that developers choose to refactor code. Understanding someone's motivation for refactoring may be helpful in determining if the refactoring is helpful to the project. This entry will focus on why developers choose to refactor and the consequences of refactoring.

Refactoring for Greater Understanding (aka, Refactor to the same thing)
A senior developer once joined a team I was leading, halfway through the project. When he joined, he saw things that he didn't agree with and suggested that we refactor the code towards a better domain model. Anxious to learn from the senior developer, I paired with him over the next few days while we made various changes to the domain model. Unfortunately, many of the changes that the senior developer suggested could not be implemented due to additional constraints imposed by required features. In the end, the code was refactored to be slightly better; however, the largest benefit was the deep understanding that the senior developer gained from the refactoring. From that point forward he delivered value at the level you would expect from a team member who has been on the project from day one. The project lost 2 development days towards new features; however, it gained a fully productive senior developer only 2 days after he joined the project. That developer's contribution in the following months greatly outweighed the original slowdown.

I see Refactoring for Greater Understanding fairly often; however, I don't think it's a bad thing. When developers have a deeper understanding of the codebase they can be more effective at adding to it and suggesting how to improve it.

Refactoring while Implementing New Features
Refactoring while Implementing New Features is where developers need to start thinking in terms of Return On Investment (ROI).

If it takes you 3 days to implement a feature as the code stands, or 1 day to refactor and 1 day to implement the feature, obviously you should refactor and save a day. However, what if it takes you 3 days to implement a feature or 2 days to refactor and 2 days to implement the feature? This is when the context is important to consider. Early in a project you should likely go ahead and do the refactoring. Chances are you will need to touch that same code in the future and you will gain that day of effort back while implementing subsequent features. Conversely, if you are nearing the end of a project and you will not be touching that part of the codebase again until the following release, it may make sense to defer the refactoring until after the upcoming release. Again, this is where difficult questions come into play: Are we really near a release, or is the release going to be pushed back, thus increasing the likelihood that you will end up adding additional features in the same area?
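The arithmetic above can be sketched in a few lines of Ruby. This is only an illustration: refactoring_payoff and its parameters are hypothetical names, and every figure is an estimate in days, including the guess at how many days later features will save.

```ruby
# Hypothetical helper: every figure is an estimate, in days.
# Returns the net days gained (positive) or lost (negative) by refactoring first.
def refactoring_payoff(direct_days, refactor_days, feature_days_after_refactor, days_saved_later = 0)
  upfront_cost = (refactor_days + feature_days_after_refactor) - direct_days
  days_saved_later - upfront_cost
end

refactoring_payoff(3, 1, 1)    # => 1, refactoring wins outright
refactoring_payoff(3, 2, 2)    # => -1, a day lost if the code is never touched again
refactoring_payoff(3, 2, 2, 2) # => 1, the day is recovered by later features
```

The last argument is the part that depends on context: early in a project it is probably large, and right before a release it may be zero.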

If there's little difference between the time it takes to implement a feature and the time it takes to refactor and implement, it's almost always the right decision to go ahead and refactor. Accruing technical debt can destroy velocity* in the long term. As a project continues accruing technical debt, features will take longer to implement. If the team then decides to address the technical debt, velocity will suffer even greater losses. In the end, addressing the technical debt is the proper decision; however, it will likely delay the original project completion date. A likely better path is to be vigilant about addressing technical debt whenever it is pragmatic.

Refactoring to New Ideas
Sometimes a new framework is released or a new technique is found that may replace a portion of your application. Developers are often eager to both remove existing pain points and experiment with new solutions. Refactoring to New Ideas also needs ROI consideration; however, there is often hidden ROI. For example, replacing a section of your code with a framework means there is less code for the existing team and new members to understand. Of course, this must be weighed against the fact that the framework likely isn't bulletproof. However, when using a framework you can not only utilize your team to diagnose problems, you can also utilize the community that uses the framework. Another hidden ROI for utilizing new frameworks or ideas is that you may fail when attempting to put it in your codebase; however, failure is often as important as success. If you never try the framework (or technique) you will never know where it applies and where it doesn't. Today's failure may result in a deeper understanding of the framework that may lead to a great gain in the future when it is utilized in a successful way.

Refactoring for Academic Purposes
Refactoring for Academic Purposes is in direct conflict with delivering working software. In your career you will likely find many lines of code that you do not agree with; however, disagreeing with implementation is not a good enough reason to refactor code. If the code currently hinders your ability to deliver software (or will in the future), you can refactor, but changing code because you philosophically disagree is simply wrong. For example, if you believe that state based testing is the only way to test, that isn't a good enough reason to alter the existing tests that utilize mocks. If those tests become a maintenance problem, that's another issue, but simply disliking mocks does not give you the right to remove them. Creating a beautiful codebase should always be a priority; however, creating working software is the number one priority. To make matters worse, "too much" refactoring generally upsets the business sponsors and project managers. Refactoring is a good thing and everyone should be on board with it. If you can't prove to the business and the project manager that a refactoring is worth doing, you might be Refactoring for Academic Purposes.

* Agile velocity is the rate at which the team has accomplished work in the past, which is about the only thing you can use (except for prayer) to estimate the rate at which they will accomplish work in the future. But it's an estimate only, not a promise. -- JerryWeinberg 2005.06.07

Ruby: arbs 0.1.0 released

Rails 2.0 introduces new syntax for migrations. Since the schema.rb file is generated using the new syntax and arbs reads the schema file to create the AR::B ducks, arbs needed to be updated.

The arbs 0.1.0 version provides support for the new migrations and should be a drop in replacement for previous versions of arbs.

Tuesday, December 25, 2007

Ruby: expectations gem

In February I wrote about removing test noise. Ten months later, I finally took the time to write the unit testing framework I've been wanting for the past year: expectations.

expectations is a lightweight unit testing framework. Tests (expectations) can be written as follows:
expect 2 do
  1 + 1
end

expect NoMethodError do
  Object.invalid_method_call
end
expectations is designed to encourage unit testing best practices such as
  • discourage setting more than one expectation at a time
  • promote maintainability by not providing a setup or teardown method
  • provide one syntax for setting up state based or behavior based expectations
  • focus on readability by providing no mechanism for describing an expectation other than the code in the expectation. Since there is no description, hopefully the programmer will be encouraged to write the most readable code possible.
A few things probably come to mind right away: sometimes setup is good, sometimes a description is a good thing, sometimes creating an object is so painful that I need to be able to add multiple assertions. All of those things are true, but generally I don't run into those situations while unit testing. Those situations generally pop up with functional testing, and while functional testing you are better off using RSpec or Test::Unit.

The more I write tests the more I believe that there doesn't need to be a silver bullet testing framework. I think expectations is a good solution for unit testing, and I think RSpec or Test::Unit are good solutions for functional testing, and I plan to use both expectations and RSpec on all my projects moving forward.

I'm currently testing several of my projects using expectations, so I do believe it's in decent shape for use, but it is still very new. As always, patches are welcome.

Mocking is done using Mocha.

Here are a few more examples of how easy it is to test with expectations:

Expectations do

  # State based expectation where a value equals another value
  expect 2 do
    1 + 1
  end

  # State based expectation where an exception is expected. Simply expect the Class of the intended exception
  expect NoMethodError do
    Object.no_method
  end

  # Behavior based test using a traditional mock
  expect mock.to.receive(:dial).with("2125551212").times(2) do |phone|
    phone.dial("2125551212")
    phone.dial("2125551212")
  end

  # Behavior based test on a concrete mock
  expect Object.to.receive(:deal).with(1) do
    Object.deal(1)
  end

end

Thursday, December 20, 2007

Avoiding costly typos

Typos are generally unfortunate, but greatly upsetting when they cost you a few hours of your life. I have a few rules I try to follow in an attempt to conserve those valuable hours in the future.

Ruby has symbols. I love symbols. But, using symbols for comparison is an easy way for a typo to cost you time.

# example 1
name = :shane
name == :shane # => true
name == :chane # => false

# example 2
class Name
  def self.shane
    :shane
  end
end

name = Name.shane
name == Name.shane # => true
name == Name.chane # => undefined method `chane' for Name:Class (NoMethodError)

Above, in example one, a typo simply returns false. However, in example two the typo gives me the immediate feedback that I made a mistake. You could argue that I should simply use a constant. Elephant case (All upper case) words bother me for some reason, but if you prefer constants that's cool too, you'll get the same benefit.
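For anyone who does prefer constants, here is a minimal sketch of the same idea. The Names module and its SHANE constant are hypothetical names chosen to mirror example two; the point is only that a misspelled constant fails loudly instead of quietly returning false.

```ruby
# Names and SHANE are hypothetical, mirroring example 2 above.
module Names
  SHANE = :shane
end

name = Names::SHANE
name == Names::SHANE # => true

begin
  name == Names::CHANE # misspelled constant
rescue NameError => error
  error.class # => NameError
end
```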

The method_missing method is dynamite. Used appropriately it's a powerful tool. However, there are often times when you simply don't need dynamite.

# example 3
class State < Struct.new(:state)
  def method_missing(sym, *args)
    sym.to_s.delete("?") == self.state
  end
end

State.new("ready").ready? # => true
State.new("ready").reddy? # => false

# example 4
class State < Struct.new(:state)
  def method_missing(sym, *args)
    sym.to_s.delete("?") == self.stat
  end
end

State.new("ready").ready? # ~> -:12:in `method_missing': stack level too deep (SystemStackError)

# example 5
class State < Struct.new(:state)
  [:ready, :running, :finished].each do |element|
    define_method :"#{element}?" do
      self.state == element.to_s
    end
  end
end

State.new("ready").ready? # => true
State.new("ready").reddy? # ~> -:10: undefined method `reddy?' for #<struct State state="ready"> (NoMethodError)


Examples three and four illustrate the two common typos that can cost you time while utilizing method_missing. Example five shows an alternative that requires slightly more code, but is significantly better at letting you know when you've made a mistake.

If right now you are thinking "that's nice, but I write tests so I'll catch it there" then you get points for writing tests, but you missed one important note: If the typo is in your test you could be getting a false positive. I once found a bug where the same typo existed in a class and the test for the class. The result was broken production code and a green test suite.

The last tip builds on the first two: Don't use strings for comparison. As an alternative to using constants or class methods, you could define methods on a string (or a symbol) to query for the value.

RAILS_ENV = "development"
class << RAILS_ENV
  ["development", "test", "production"].each do |environment|
    define_method :"#{environment}?" do
      self == environment
    end
  end
end

RAILS_ENV.test? # => false
RAILS_ENV.development? # => true
RAILS_ENV.developmant? # ~> -:12: undefined method `developmant?' for "development":String (NoMethodError)

The last example is nice because it allows you to type less when doing a comparison and provides you better feedback if you do make a typo. (drop a +1 on this ticket if you want this feature in Rails core: http://dev.rubyonrails.org/ticket/10583)

Wednesday, December 19, 2007

Using patch as a subversion stash

Being a consultant, I'm generally at the mercy of whatever source control system my current client is using. Luckily, Subversion has been the version control choice for every client I've worked for in the past 3 years.

A friend of mine, Kurt Schrader, recently posted The Power of Git: git-stash. I haven't gotten a chance to use git, but I am planning to give it a shot in the near future. However, I don't expect to be using git for daily development any time soon.

I've definitely needed a git-stash in the past. It's helpful for fixing a bug without committing your current changes, but it's also helpful if I ever get to work and I end up on a pairing station that has someone else's uncommitted changes.

In those cases I create a patch, revert the changes and move on.

In case you're unfamiliar with this type of thing, here's all you'll need to do (assuming you have patch available).

Creating the patch: svn diff > patch_name.patch

note: You'll want to add any new files before creating the patch, if you want them included in the patch.

Once you've created the patch you can revert everything and start fresh (svn revert -R . will recursively revert). You may also need to delete any new files that were created as part of the uncommitted changes (Paul Gross has a one liner for removing uncommitted files).

When you are ready to get your changes back you'll need to apply the patch that was previously created.

Applying a patch: patch -p0 < patch_name.patch

That's basically it for decent stash capabilities with Subversion, but there is one gotcha: patch will not capture Subversion metadata changes. Usually this isn't a problem, but it's always a good idea to look out for this situation when you create a patch.

Tuesday, December 11, 2007

Rails: Route Globbing

Today I was working with some code that utilized the same array on various pages. After evaluating a few options I decided that Rails Route Globbing was probably the best solution for passing the array of integers from page to page.

I originally learned about Route Globbing from David Black's Rails Routing. Route Globbing works by grabbing everything past a certain point in a url and storing the elements in an object that behaves like an Array.

For example, the following route sets the user_id and feed_ids parameters based on the url.

map.specific_feeds '/users/:user_id/feeds/*feed_ids', :controller => 'feeds', :action => 'index'

# navigating to http://a.domain.com/users/23/feeds/2/24/55/89 will result in
# params[:user_id].inspect # => "23"
# params[:feed_ids].inspect # => ["2", "24", "55", "89"]
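To make the splitting behavior concrete, here is a rough plain-Ruby sketch of what the glob does to the url. It is a stand-in for illustration, not how Rails implements routing, and glob_params is a hypothetical name.

```ruby
# Hypothetical stand-in for the route '/users/:user_id/feeds/*feed_ids'.
# This is plain Ruby for illustration, not how Rails implements routing.
def glob_params(path)
  match = %r{\A/users/([^/]+)/feeds/(.*)\z}.match(path)
  return nil unless match
  { :user_id => match[1], :feed_ids => match[2].split("/") }
end

params = glob_params("/users/23/feeds/2/24/55/89")
params[:user_id]  # => "23"
params[:feed_ids] # => ["2", "24", "55", "89"]

# The glob values arrive as strings; convert them if integers are needed.
params[:feed_ids].map { |id| id.to_i } # => [2, 24, 55, 89]
```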

The above behavior is nice for grabbing values, but it's also useful for passing on the values of an array when creating links and forms.

<!-- in a view, given the above route you can create a link with -->
<%= link_to "next", specific_feeds_url(params[:user_id], params[:feed_ids]) %>

<!-- or you can create a form with -->
<% form_tag specific_feeds_url(params[:user_id], params[:feed_ids]) do %>
<% end %>

Thankfully, Route Globbing usually just works, but there is one gotcha for Rails <2.0: The glob must appear at the end of the route. Once an asterisk is used everything after must be part of the glob.

# No Good
map.specific_feeds '/feeds/*feed_ids/users/:user_id', :controller => 'feeds', :action => 'index'

# Good
map.specific_feeds '/users/:user_id/feeds/*feed_ids', :controller => 'feeds', :action => 'index'


As Brandon points out in the comments, Rails 2.0 allows you to put the glob anywhere in the route.

Sunday, December 09, 2007

Is being a niche language developer good for your career?

Paul Graham originally introduced me to the idea that programming in a niche language was a good idea. He makes great arguments for organizations adopting more powerful languages and hiring smaller teams of highly effective programmers that can harness the power of niche languages. Paul Graham made me feel good about working exclusively on Ruby projects.

Later I read (the fantastic) My Job Went to India by Chad Fowler. Chad also talks about how being proficient in a niche language can make you a more valuable asset. Again, I felt good about focusing so heavily on Ruby.

February marks 2 years of nothing but Ruby projects. I'm still happy with the decision, but choosing Ruby has certainly brought unexpected consequences. I've gained an enormous amount from my recent experience; however, this entry serves to highlight the sacrifices that are required when focusing on a niche language.

One undeniable drawback to focusing on a niche language is that your employer choice becomes severely limited. For example, there are Ruby jobs available in New York City; however, I only know of one organization in the entire city that would be a good fit for me personally if I decided to leave ThoughtWorks*. NYC isn't exactly small either; if I moved to a smaller town, the chances are even higher that I would have a hard time finding an employer with values similar to mine while still making a living as a Ruby programmer.

A great lesson from My Job Went to India is that you should strive to be the worst member of a band. If you haven't read the book (read it soon!), the lesson is essentially that you perform at a higher level when you work with others who are performing at a higher level. I agree with Chad's advice; however, it's harder to be the worst member of a band if there are limited numbers of musicians you can work with. Some of the greatest programmers I know are getting paid to deliver Ruby applications, but I still know significantly more great developers who are not. Sadly, when focusing on a niche language there are simply fewer mentors available to learn from.

Another constant pain point for niche languages is the lack of respect that they are given. In the enterprise world, Ruby is largely seen as a toy. As a result, Ruby programmers are seen as "hackers, script kiddies, immature and uneducated." I do believe this to be far from the truth, but that doesn't change the fact that it is the general attitude. This leads to unfriendly treatment at conferences, and worse it can also cause an interviewer to unfairly discredit a qualified candidate.

Niche languages also suffer from greater prejudice. No matter how many success stories emerge, it's still very common to hear that "Ruby doesn't scale, just look at Twitter". There's very little truth in the statement, and yet it still follows Ruby everywhere Ruby goes. Niche languages simply have more detractors and thus the propaganda emitted by the detractors reaches larger audiences.

Adoption leads to patterns and best practices. Obviously, greater adoption can lead to a larger number of patterns and best practices, while languages with lower adoption rates have less to offer in the way of established ideas. While developing an application with a niche language, it's not uncommon to be the first person to tackle a new problem. Blazing a new trail is often exciting, but it's rarely as efficient as following a proven path.

Vendor support is also often limited for niche languages. My first 3 Ruby projects were backed by an Oracle database. At the time, the Oracle driver for Ruby on Rails wasn't exactly what I'd consider to be stable. The driver has improved, but using Oracle is still a pain to this day. Currently, TextMate is the #1 choice for developers writing Ruby on Rails applications. TextMate runs exclusively on the Mac. Oracle not only does not run on (Intel) Macs, but doesn't even offer a (reasonable) way to connect to an Oracle database from a Mac. Therefore, it's (basically) impossible to run TextMate and connect to an Oracle database from the same computer.

To be clear, I'm very, very happy with my decision and the past 2 years of projects. I won't bother with writing up the merits of adopting a niche language; see Chad Fowler's and Paul Graham's work for that. However, I do think it's important to note that adopting a niche language can carry some unfortunate consequences.

*I'm not leaving ThoughtWorks. I'm very happy at ThoughtWorks.

Saturday, December 08, 2007

Advanced Rails Recipes

I loved Rails Recipes, so when Mike Clark and Chad Fowler asked me to contribute to Advanced Rails Recipes I was quite excited about the opportunity. My contribution to the book was a write up of the Presenter Pattern. Advanced Rails Recipes is now available as a beta book at pragprog.com.

Friday, December 07, 2007

When not to pair

Last night, while talking with Mike Roberts and Martin Fowler, the topic of Pair Programming came up. We are all proponents of Pair Programming, but inevitably the question came up concerning when Pair Programming isn't recommended. I've always liked the suggestion that any activity that requires an artist can be done more accurately/effectively while pairing.

It's not uncommon to hear that developing software is an art. This is a general statement that is often true, but it's the tasks that don't require an artist that I believe can be done without pairing. What tasks don't require an artist? I would say any task that has only one right answer. For example, the task "Table Users needs an index on the Login column" likely only has one correct implementation. Adding an index can be done a few different ways, but I expect a project has a convention on altering tables and adding the index in that way is the correct execution of the task.

I do believe tasks that have only one correct answer can be done without a pair; the tricky bit is finding tasks that have only one correct answer.

Wednesday, December 05, 2007

Utilizing Ruby and Linux

I've recently been working on a site that aggregates a large amount of data. The data is made available at arbitrary times and delivered via xml feeds over the web. I needed to check for new content at timed intervals and I needed a mechanism for delivering a large amount of data quickly.

The first problem was easily solved by writing a Ruby script that checks all the sources for new content and setting up cron to run the script every 15 minutes.

Making large amounts of data quickly available was solved by using Rails page caching, but the first request was still taking about a minute to serve. That issue was also easily solved by sending a curl request after each time the cache is swept.
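A minimal sketch of that cache-warming step, written in Ruby rather than as a curl call, might look like the following. It assumes the pages to warm are known up front; warm_cache, DEFAULT_FETCHER and the /feeds path are hypothetical names, and Net::HTTP is swapped in for curl purely for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: requests each url so the page cache is repopulated.
# The default fetcher uses Net::HTTP in place of curl; pass a different
# callable to swap in another client.
DEFAULT_FETCHER = lambda { |url| Net::HTTP.get_response(URI.parse(url)).code }

def warm_cache(urls, fetcher = DEFAULT_FETCHER)
  urls.map { |url| [url, fetcher.call(url)] }
end

# warm_cache(["http://a.domain.com/feeds"]) would issue one GET per url
# and return each url paired with its HTTP status code.
```

Calling something like this right after the sweep means the slow first request is paid by the script instead of by a visitor.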

One of the reasons that I love working with Ruby/Rails* is that not only do I have all the tools Ruby provides, but I also have easy access to the tools available on Linux.

Next time someone asks me why I like Ruby more than other languages, I'll have to remember to add this to the list of reasons.

*Most languages give access to the underlying OS, but I find Ruby's access to Linux to be more pleasurable than other experiences in my past.

Monday, December 03, 2007

ThoughtWorks DSL Podcast

I recently participated in a podcast with Martin Fowler, Rebecca Parsons and Neal Ford on the topic of Domain Specific Languages. The podcast is being released as part of the ThoughtWorks IT Matters series and is available at ThoughtWorks: What we say.

Feedback welcome.

Tuesday, November 13, 2007

Ruby: Time::is

Mocha is fantastic for unit testing, but I usually try to avoid requiring it while functional testing. In general this works, but Time.now is something that I occasionally like to fix even while functional testing. To solve the problem, within a functional test helper I load a time_extensions.rb file that defines a Time::is method. The Time::is method is useful for freezing time at a certain point and executing a block of code. When the block of code finishes, the Time.now method is restored to its original implementation.

The example below is how I usually solve the described problem.

require 'time'

class Time
  def self.metaclass
    class << self; self; end
  end

  def self.is(point_in_time)
    new_time = case point_in_time
      when String then Time.parse(point_in_time)
      when Time then point_in_time
      else raise ArgumentError.new("argument should be a string or time instance")
    end
    # keep the real implementation around so it can be restored
    class << self
      alias old_now now
    end
    metaclass.class_eval do
      define_method :now do
        new_time
      end
    end
    begin
      yield
    ensure
      # restore the original Time.now even if the block raises
      class << self
        alias now old_now
        undef old_now
      end
    end
  end
end

Time.is(Time.now) do
  Time.now # => Tue Nov 13 19:31:46 -0500 2007
  sleep 2
  Time.now # => Tue Nov 13 19:31:46 -0500 2007
end

Time.is("10/05/2006") do
  Time.now # => Thu Oct 05 00:00:00 -0400 2006
  sleep 2
  Time.now # => Thu Oct 05 00:00:00 -0400 2006
end

Rails: Enumerable#sum

Documentation
Calculates a sum from the elements. Examples:

payments.sum { |p| p.price * p.tax_rate }
payments.sum(&:price)

This is instead of payments.inject(0) { |sum, p| sum + p.price }

Also calculates sums without the use of a block:

[5, 15, 10].sum # => 30

The default identity (sum of an empty list) is zero. However, you can override this default:

[].sum(Payment.new(0)) { |i| i.amount } # => Payment.new(0)
Usage
The Enumerable#sum method does exactly what you would expect: Sum the elements of the array.

Test

require 'rubygems'
require 'active_support'
require 'test/unit'
require 'dust'

unit_tests do
  test "sum the numbers from the array" do
    grades = [50, 55, 67, 62, 71, 89, 84, 85, 99]
    assert_equal 662, grades.sum
  end
end
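Here is a small sketch of the block form next to the inject call it replaces. It assumes either ActiveSupport is loaded or a Ruby version that ships Enumerable#sum itself (2.4 or later); Payment is a hypothetical struct used only for illustration.

```ruby
# Payment is a hypothetical struct used only for illustration.
Payment = Struct.new(:price)
payments = [Payment.new(10), Payment.new(25), Payment.new(5)]

# Block form of sum
payments.sum { |payment| payment.price } # => 40

# The equivalent inject call needs an explicit starting value of 0
payments.inject(0) { |total, payment| total + payment.price } # => 40

# The sum of an empty list defaults to zero
[].sum # => 0
```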

Monday, November 12, 2007

Rails: Enumerable#group_by

Documentation
Collect an enumerable into sets, grouped by the result of a block. Useful, for example, for grouping records by date.

e.g.
latest_transcripts.group_by(&:day).each do |day, transcripts|
  p "#{day} -> #{transcripts.map(&:class) * ', '}"
end
"2006-03-01 -> Transcript"
"2006-02-28 -> Transcript"
"2006-02-27 -> Transcript, Transcript"
"2006-02-26 -> Transcript, Transcript"
"2006-02-25 -> Transcript"
"2006-02-24 -> Transcript, Transcript"
"2006-02-23 -> Transcript"
Usage
The Enumerable#group_by method is helpful for grouping elements of an Enumerable by an attribute or an arbitrary grouping. The documentation provides a good example of how to group by an attribute; however, the group_by method can be used to logically group by anything returned from the block given to group_by.

Test

require 'rubygems'
require 'active_support'
require 'test/unit'
require 'dust'

unit_tests do
  test "group by grades" do
    grades = [50, 55, 60, 62, 71, 83, 84, 85, 99]
    expected = {"A"=>[99], "B"=>[83, 84, 85], "C"=>[71], "D"=>[60, 62], "F"=>[50, 55]}
    actual = grades.group_by do |grade|
      case
      when grade < 60 then "F"
      when grade < 70 then "D"
      when grade < 80 then "C"
      when grade < 90 then "B"
      else "A"
      end
    end
    assert_equal expected, actual
  end
end
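
As one more quick sketch of grouping by an arbitrary value, the block can return anything, such as the length of a word. The word list is made up for illustration, and this assumes group_by is available either from ActiveSupport or from Ruby itself (1.8.7 and later).

```ruby
# Group a made-up list of words by a computed value (their length)
words = %w[apple kiwi plum cherry fig]
by_length = words.group_by { |word| word.length }

by_length[4]        # => ["kiwi", "plum"]
by_length[5]        # => ["apple"]
by_length.keys.sort # => [3, 4, 5, 6]
```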