Tuesday, September 11, 2007

Ruby: Array#chunk

Yesterday I was working on some code that needed to split an array into 3 different chunks. My pair, Philippe Hanrigou, wrote tests similar to the following.

unit_tests do
test "chunk evenly" do
assert_equal [[1], [2], [3]], [1, 2, 3].chunk(3)
end

test "divide evenly" do
assert_equal [[1], [2], [3]], [1, 2, 3] / 3
end

test "chunk unevenly" do
assert_equal [[1, 3], [2]], [1, 2, 3].chunk(2)
end

test "chunk creates empty array when there aren't enough elements in the array" do
assert_equal [[1], [], []], [1].chunk(3)
end
end

To make the above tests pass, I wrote the following code.

class Array
def chunk(number_of_chunks)
chunks = (1..number_of_chunks).collect { [] }
while self.any?
chunks.each do |a_chunk|
a_chunk << self.shift if self.any?
end
end
chunks
end
alias / chunk
end

The above code works, but I'm curious to see if there is a better solution. How would you make the tests pass?

Here's the code in it's entirety.

require 'rubygems'
require 'test/unit'
require 'dust'

class Array
def chunk(number_of_chunks)
chunks = (1..number_of_chunks).collect { [] }
while self.any?
chunks.each do |a_chunk|
a_chunk << self.shift if self.any?
end
end
chunks
end
alias / chunk
end

unit_tests do
test "chunk evenly" do
assert_equal [[1], [2], [3]], [1, 2, 3].chunk(3)
end

test "divide evenly" do
assert_equal [[1], [2], [3]], [1, 2, 3] / 3
end

test "chunk unevenly" do
assert_equal [[1, 3], [2]], [1, 2, 3].chunk(2)
end

test "chunk creates empty array when there aren't enough elements in the array" do
assert_equal [[1], [], []], [1].chunk(3)
end
end

18 comments:

  1. Anonymous9:06 AM

    Your sidebar goes strange when the browser window is narrow (less than about 930 pixels) and your source view has descenders cut off the letters. Tested in Firefox 2.0.0.6 and IE7, both on XP.

    ReplyDelete
  2. I would have used something like this:
    size_of_chunks = self.size/number_of_chunks
    self.enum_for(:each_slice, size_of_chunks).to_a

    That doesn't really follow the semantics you outlined, of course, but it's still pretty nice.

    ReplyDelete
  3. Anonymous9:40 AM

    Instead of:

      number_of_chunks = (1..number_of_chunks).collect { [] }

    I would have written:

      chunks = Array.new(number_of_chunks, [])

    Secondly, when you go through your array using "shift" you end up removing all of its contents; is that what you really meant to do? If so, I think you should have called your method "chunk!" instead of "chunk", and your tests should be specifying that that is the desired behaviour.

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. Nice exercise. I learned a little Ruby. Here's my entry.


    class Array
      def chunk(number_of_chunks)
        chunks = (1..number_of_chunks).collect { [] }
        self.each_with_index do |item, index|
          chunks[index % number_of_chunks] << item
        end
      chunks
      end

      alias / chunk

    end


    Like Wincent, I noticed that the chunk method modifies the array, so it should be named chunk!. This version doesn't modify the array.

    Wincent, you can't use:

      chunks = Array.new(number_of_chunks, [])

    ...because the array with have number_of_chunks references to the same instance of the array. Instead, you could use:

      chunks = Array.new(number_of_chunks) {|i| []}

    ReplyDelete
  6. Greetings,
    I got mentally stuck on non-code-related issues.

    The 'divided by' semantics are different than I'd have intuitively guessed. I'd think that an array of length 10, when divided by 2, would yield an array of 5 subarrays. (I.e. the denominator is the maximum subarray size you want to split the larger array into.) It doesn't help that ([1,2,3] / 2) yields the same result in both. I imagine that the chunking semantics you define were necessary, it just took me a few moments to realize that wasn't what you were doing.

    There's also a syntax error in the post; the while has an extraneous do, and you fubar the parameter, and return number_of_chunks which doesn't show your intention very well.

    Your tests indicate [[1, 3], [2]] is an expected result of [1, 2, 3].chunk 2, which is pretty weird to me also...

    Ignoring all that, I think Jeremy's is the cleanest answer.

    I wrote a cutesy (meaning it also isn't clear on its intentions) approach using mainly ranges.

    class Array
    def chunk(chunk_count)
    chunks = (0...chunk_count).collect {[]}
    (0...length).each {|i| chunks[i % chunk_count][i / chunk_count] = self[i] }
    chunks
    end

    alias / chunk
    end

    -- Morgan

    ReplyDelete
  7. Greetings,
    I note that [1, 2, 3] / 2 yields the same result in my mental model and yours, but that was before I caught the [[1, 3], [2]] result, which as I said was also not intuitive to me...

    I also don't know how to format code here, as I'm not sure the code tag will work.

    -- Morgan

    ReplyDelete
  8. Here's another approach. It passes the tests, and doesn't alter the original array.

    require 'enumerator'

    class Array

    def chunk(number_of_chunks)

    enum_for(:each_slice, number_of_chunks).map { |sl|

    while sl.length < number_of_chunks

    sl << place_holder

    end

    sl

    }.transpose.map {|ch| ch.reject{|ea| ea == place_holder}}

    end

    alias / chunk

    end


    It's a little more than what's needed just to get the tests to pass. I could have used nil instead of the place_holder object and compact instead of the last reject, but this would fail if the original array contained nils which were to be preserved.

    Not sure I'd actually USE this but...

    ReplyDelete
  9. Anonymous2:11 PM

    Here is a more functional approach (but not efficient by any means):

    require "enumerator"

    class Array
    def chunk(number_of_chunks)
    (0...number_of_chunks).map do |chunk|
    self.enum_with_index.select do |_,index|
    index % number_of_chunks == chunk
    end.map { |e| e.first }
    end
    end
    alias / chunk
    end

    ReplyDelete
  10. Anonymous2:15 PM

    Morgan,

    I agree that assigning to the number_of_chunks variable is a bad idea. That was a mistake. However, the 'do' after the 'while' isn't required, but it also isn't a typo. I've updated the example.

    Cheers, Jay

    ReplyDelete
  11. one more:

    require "enumerator"

    class Array
    def chunk(number_of_chunks)
    self.enum_with_index.inject(Array.new(number_of_chunks){[]}) do |mem, var|
    mem[var[1] % number_of_chunks] << var[0]
    mem
    end
    end
    alias / chunk
    end

    ReplyDelete
  12. Greetings,
    Jay: That's *really* weird. Including the 'do' gives a syntax error in IRB (0.9.5, with Ruby 1.8.[56]), but putting it into a file works just fine. :(

    That's the first time I've seen something like that...

    -- Morgan

    ReplyDelete
  13. http://pastie.caboo.se/96316

    ReplyDelete
  14. def chunk(number_of_chunks)
    require "active_support"
    groups = in_groups_of((size.to_f / number_of_chunks).ceil, false)
    groups << [] while groups.size < number_of_chunks
    groups
    end

    I had to change the ordering in the test "chunk unevenly"

    ReplyDelete
  15. I added a chunk_into method and an alias for % (mod).

    Here's the pastie: http://pastie.caboo.se/134469

    ReplyDelete
  16. I had the same need for Array.chunk but didn't want it to chunk "evenly". I want sequential chunks.

    Here is my own submission:

    http://pastie.org/269020

    This will do:

    >> x = [1,2,3,4,5,6,7,8]
    >> x.chunk(4)
    => [[1, 2], [3, 4], [5, 6], [7, 8]]
    >> x.chunk(3)
    => [[1, 2, 3], [4, 5, 6], [7, 8]]

    I suppose an option to add the "leftover pieces" to the end rather than the beginning would be nice.

    ReplyDelete
  17. In newer version of activesupport, this come built in with method in_groups (http://api.rubyonrails.org/classes/ActiveSupport/CoreExtensions/Array/Grouping.html)

    ReplyDelete
  18. Mark S12:51 AM

    class Array
    def chunks(num)
    chunks! num, self.dup
    end

    def chunks!(num, array = nil)
    old_array = array || self
    min_size = old_array.size / num
    rem = old_array.size.remainder(num)
    new_array = []
    num.times do new_array << old_array.shift(min_size + ((rem = rem - 1) > -1 ? 1 : 0)) end
    new_array
    end
    end

    ReplyDelete

Note: Only a member of this blog may post a comment.