Sunday, August 26, 2007

Ruby: Calling methods of a specific ancestor

Following the recent post, State pattern using Modules and Facets, Aman King asked:
what happens when two modules are included in a class[?] ... [will included modules overwrite] any methods that were included already by an earlier included module[?] (for the full comment please see the referenced post)
In his comment, Aman also points out that Programming Ruby provides the following information:
If a module is included within a class definition, the module's constants, class variables, and instance methods are effectively bundled into an anonymous (and inaccessible) superclass for that class. In particular, objects of the class will respond to messages sent to the module's instance methods.
Part of the answer to Aman's question is in the statement from Programming Ruby. The way I think of it, each class can have zero or one superclass; however, each class may also have zero or many ancestors that are proxies to modules. You could simplify the previous statement and think of the modules themselves being an ancestor, but it can be important to note the difference because a change to a module will be reflected by all classes that include that module (even classes that included the module before the new behavior was added to the module).
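The point about later changes to a module can be checked with a small sketch. The names Greeter, Person, and greet are hypothetical and exist only for this illustration.

```ruby
module Greeter
end

class Person
  include Greeter
end

person = Person.new
person.respond_to?(:greet) # => false

# Reopen the module after it has already been included.
module Greeter
  def greet
    "hello"
  end
end

# The existing instance picks up the new method without including again.
person.greet # => "hello"
```

Nothing is copied into Person at include time; the lookup goes through the module each time, which is why the later definition is visible.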

Let's look at an example of a class's ancestors.

module James
end

module Lynn
end

class FamilyMember
  include James
  include Lynn
end

FamilyMember.ancestors # => [FamilyMember, Lynn, James, Object, Kernel]
FamilyMember.superclass # => Object

The ancestors collection includes the class itself [FamilyMember], all included modules [Lynn, James, Kernel], and the superclass [Object]. The order of the ancestors collection is also important: it is the order in which Ruby looks up a method when the object receives a message. Therefore, for any message sent to a FamilyMember instance, Ruby will first look in the methods of FamilyMember, then in Lynn, then James, etc.

If Lynn and James were both to define the same method, each definition would live on its own proxy, not on the FamilyMember class. Since the methods live on the proxies, including more modules will not overwrite a previous method definition; however, the last included module to define a method will be the first consulted when that message is sent. That module's definition will execute and return, and any other definitions of that method (found on other ancestors) will not be executed unless the executing definition calls super.

module James
  def name
    "James"
  end
end

module Lynn
  def name
    "Lynn"
  end
end

class FamilyMember
  include James
  include Lynn
end

FamilyMember.ancestors # => [FamilyMember, Lynn, James, Object, Kernel]
FamilyMember.new.name # => "Lynn"
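One way to confirm that neither definition was overwritten is to ask each module which method it owns. The sketch below uses only core Ruby. The Shouting module and the LoudFamilyMember subclass are hypothetical additions, included to show that a definition earlier in the lookup order can still reach the next one through super.

```ruby
module James
  def name
    "James"
  end
end

module Lynn
  def name
    "Lynn"
  end
end

class FamilyMember
  include James
  include Lynn
end

# Both definitions still exist, each owned by its own module.
James.instance_method(:name).owner        # => James
Lynn.instance_method(:name).owner         # => Lynn

# Lookup through the class finds Lynn's definition first.
FamilyMember.instance_method(:name).owner # => Lynn

# Hypothetical module whose definition continues the lookup with super.
module Shouting
  def name
    super.upcase
  end
end

# Hypothetical subclass, so FamilyMember itself stays unchanged.
class LoudFamilyMember < FamilyMember
  include Shouting
end

LoudFamilyMember.new.name # => "LYNN"
```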

So, given the above, how does Kernel#as allow me to call the methods of James even though Lynn clearly has precedence? Let's look at the implementation:

# comments removed for the example, check out the source of Ruby Facets to see the comments

module Kernel
  def as(ancestor, &blk)
    @__as ||= {}
    unless r = @__as[ancestor]
      r = (@__as[ancestor] = As.new(self, ancestor))
    end
    r.instance_eval(&blk) if block_given?
    r
  end
end

class As #:nodoc:
  private *instance_methods.select { |m| m !~ /(^__|^\W|^binding$)/ }

  def initialize(subject, ancestor)
    @subject = subject
    @ancestor = ancestor
  end

  def method_missing(sym, *args, &blk)
    @ancestor.instance_method(sym).bind(@subject).call(*args, &blk)
  end
end

For performance reasons (I assume), Kernel#as stores the As instance in a hash held in an instance variable of the receiver; however, for the purposes of our example the only thing worth noting is that Kernel#as returns an instance of the As class initialized with self and the ancestor. Generally, the As instance is returned and a method is immediately called on it. If the As instance doesn't respond to the message it is sent, the method_missing method is invoked.

def method_missing(sym, *args, &blk)
  @ancestor.instance_method(sym).bind(@subject).call(*args, &blk)
end

The above method_missing definition is what allows you to call a method on any ancestor. Let's start with an example and then walk through the method_missing definition to see how it works.

require 'rubygems'
require 'facets'

module James
  def name
    "James"
  end
end

module Lynn
  def name
    "Lynn"
  end
end

class FamilyMember
  include James
  include Lynn
end

FamilyMember.ancestors # => [FamilyMember, Lynn, James, Object, Kernel]
member = FamilyMember.new
member.name # => "Lynn"
member.as(James).name # => "James"

In the above example the member instance receives the as message, which returns an instance of As initialized with the member instance (as the subject) and the module James (as the ancestor). The As instance then receives the name message. Since the As instance doesn't define name, method_missing is called with :name as the first argument (sym). Within method_missing, the ancestor (James) receives the message instance_method with the sym (:name) as the argument. The instance_method method returns the unbound method name from the ancestor (James). Next, method_missing binds the unbound method (name) to the subject (the member instance) and sends it the call message (with the arguments, which are empty in our example). When the method executes bound to the subject, it can access any state or behavior of the subject. In our example, the method merely returns "James"; however, the example from State pattern using Modules and Facets verifies that a method of the subject may be called from the formerly unbound method once it is bound to the subject.
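The same steps can be walked through by hand, without Facets, using only instance_method, bind, and call from core Ruby. This is a sketch of the mechanism rather than the Facets implementation; the introduce and nickname methods are hypothetical additions that show the bound method reaching back into the subject.

```ruby
module James
  def name
    "James"
  end

  # Hypothetical method that calls back into whatever object it is bound to.
  def introduce
    "#{name} (#{nickname})"
  end
end

module Lynn
  def name
    "Lynn"
  end
end

class FamilyMember
  include James
  include Lynn

  # Hypothetical method defined on the subject's own class.
  def nickname
    "Jim"
  end
end

member = FamilyMember.new

unbound = James.instance_method(:name) # an UnboundMethod taken from James
bound   = unbound.bind(member)         # a Method whose receiver is member
bound.call                             # => "James"

# Inside the bound method, self is member, so normal lookup applies again:
# name resolves to Lynn's definition and nickname to FamilyMember's.
James.instance_method(:introduce).bind(member).call # => "Lynn (Jim)"
```

Note that bind only accepts an object that has the module among its ancestors, which is why the subject here has to be a FamilyMember (or anything else that includes James).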

6 comments:

  1. I came across this post from PlanetRubyOnRails. There are 2 problems with your caching of As instances. First, that leaks memory because any instance that you use #as on will be forever stored in @subject of the cached instance. Secondly, if you use #as(James) on two separate instances of FamilyMember (or two separate class that include James) you unexpectedly get an As with the @subject being the instance of the first time it was used.

    What I have used in these situation is a call_as method that I implement as such:

    module Kernel
      def call_as(ancestor, name, *args, &block)
        ancestor.instance_method(name).bind(self).call(*args, &block)
      end
    end

    You don't get an object instance that behaves like an instance of ancestor, but with As you don't either so I don't know if anything is lost.

    Anyway, just my thoughts. It is a good problem to address.

  2. Along those same lines, the 'use' library may interest you:

    http://rubyforge.org/docman/view.php/735/309/readme.html

    It allows you to selectively mixin methods from a module (instead of all of them), and alias them on the fly to avoid conflicts.

  3. Ok that URL should be:

    http://rubyforge.org/docman/view.php/
    735/309/readme.html

  4. Yes, the as cache was put there to imporve performace. But Jeremy points out some concerns. I'm not sure one can ssay it leaks memory. Yes it takes up memory --any instance that uses #as will have an As sahdow, so to speak. That really the hold deal of cahcing, safricifin memory space for performance. However, the question of which is more important is debatable.

    As for the second point. I'm not sure that's right. If their are two separate instances there will be two separate instances of As also.

    Please correct me if I'm wrong though. I want Facets to be as perfect as possible.

    Note, Facets also has "call_as" method, but it's called send_as.

  5. Sorry for all the typos --I'm in a hurry. But you get my meaning (I hope)

  6. @jeremy Those are not class variables being accessed in Kernel#as, they are instance variables. Meaning both of your concerns are unfounded. Remember, Kernel is an ancestor of every Object, so #as is called on the instance of your object.

    Memory will not be leaked because instance variables go away when your instance does, and if you use #as(James) on two separate instances of FamilyMember, you will access different @__as variables, so there is no risk of contamination.

