Thursday, August 2, 2012

Correctly Dispatching on Generic Methods in `*apply()` in R

I've been hacking in R recently for work, and it is a bizarrely amazing
language. I just ran into a situation in which I needed to apply an S3 generic
function to a list (actually, a row-wise iterator on a data.frame)
and hit an annoying rough patch.

For those not as well-versed in R (I've been learning it for about two weeks at
this point), "method dispatch" is done in a quirky yet effective way. You define
a generic function, say foo, like this:
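A minimal sketch of such a generic. The parameter name bar comes from the explanation that follows; the exact body is an assumption:

```r
# Generic function: UseMethod() looks for a method named
# foo.<class>, based on the class of the first argument (bar).
foo <- function(bar) {
  UseMethod("foo")
}
```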



This means "when foo() is called, dispatch to the function
matching the class of bar". Setting the class and calling the
generic function is simply:
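Roughly, and assuming a hypothetical foo.myClass method that just returns a string so the dispatch is visible:

```r
foo <- function(bar) UseMethod("foo")

# Hypothetical method for objects whose class vector contains "myClass".
foo.myClass <- function(bar) {
  "foo.myClass called"
}

# Any R object can carry a class attribute; here it is a plain list.
bar <- list(value = 1)
class(bar) <- c("myClass", "otherClass")

result <- foo(bar)  # dispatches to foo.myClass
```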



Calling foo(bar) causes R to look for a function named
foo.myClass and call it. If foo.myClass is not
defined, it will try the remaining classes in bar's class vector in order
(foo.otherClass here), then foo.default; if none of those functions exist,
it will throw an error.
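The fallback order can be sketched like this. The method bodies are placeholders, and foo.myClass is deliberately left undefined:

```r
foo <- function(bar) UseMethod("foo")

# Only the second class in the vector has a method.
foo.otherClass <- function(bar) "otherClass method"

bar <- structure(list(), class = c("myClass", "otherClass"))
via_other <- foo(bar)   # no foo.myClass, so foo.otherClass runs

# A plain number has no matching method and there is no foo.default yet,
# so dispatch fails with an error.
no_method <- tryCatch(foo(42), error = function(e) "error")

# Once foo.default exists, it catches everything else.
foo.default <- function(bar) "default method"
via_default <- foo(42)
```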

You might have noticed that UseMethod("foo") doesn't pass along the
arguments to foo. R passes the arguments along automagically: the method is
called with the same arguments the generic received.
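A small sketch of that forwarding, using a hypothetical second parameter n that the generic never mentions in its body:

```r
foo <- function(bar, n) UseMethod("foo")

# The method receives both bar and n exactly as they were passed to foo().
foo.myClass <- function(bar, n) {
  n * 2
}

bar <- structure(list(), class = "myClass")
doubled <- foo(bar, 21)
```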

If you want to apply a function to each element of a list, R makes it easy:
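For instance, with an arbitrary example list (the values are made up for illustration):

```r
lst <- list(1, "two", TRUE)

# lapply() calls the function on each element and returns a list of results.
same <- lapply(lst, identity)

# identity() returns its argument unchanged, so the result matches the input.
unchanged <- identical(same, lst)
```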



identity() and identical() do what you would expect.
There are a bunch of variants of apply; RTFM for the details.

Now let's say we want to call foo() within lapply()
rather than identity. There are some complications:
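A sketch of the failing case, assuming foo takes the object to dispatch on (quux) first and the list element (x) second. The error is caught with tryCatch() here only so the snippet runs to completion:

```r
foo <- function(quux, x) UseMethod("foo")
foo.myClass <- function(quux, x) paste("myClass got", x)

quux <- structure(list(), class = "myClass")
items <- list("a", "b", "c")

# lapply() calls foo("a", quux), foo("b", quux), ...
# so the list element lands in the first parameter.
outcome <- tryCatch(
  lapply(items, foo, quux),
  error = function(e) conditionMessage(e)
)
```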



This throws an error because UseMethod() tries to dispatch on the
class of foo()'s first argument, which is now the list element, a string
("character", in R parlance). One way to make this work is to
reverse the order of the arguments so the iteration variable comes first and
force UseMethod() to dispatch on a different argument:
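Sketched with the same hypothetical names, where the second argument to UseMethod() names the object to dispatch on:

```r
# The iteration variable x comes first; dispatch is forced onto quux.
foo <- function(x, quux) UseMethod("foo", quux)
foo.myClass <- function(x, quux) paste("myClass got", x)

quux <- structure(list(), class = "myClass")
res <- lapply(list("a", "b"), foo, quux)
```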



Yuck! Wouldn't it be great if there were an easier way? Preferably one which
doesn't involve hacking R's method dispatch system? Well, luckily, there is. We
take advantage of the fact that R can use named arguments:
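A sketch of the named-argument version, keeping the generic in its natural order (quux first) and using the same hypothetical method as before:

```r
foo <- function(quux, x) UseMethod("foo")
foo.myClass <- function(quux, x) paste("myClass got", x)

quux <- structure(list(), class = "myClass")

# quux is supplied by name, so each list element fills the next
# unmatched parameter, x. Dispatch still happens on quux.
res <- lapply(list("a", "b"), foo, quux = quux)
```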



It's a bit wonky when coming from normal languages, which wouldn't appreciate a
named argument filling a parameter that sits earlier in the parameter list than
the positional ones. Again, R is amazingly bizarre and bizarrely amazing. The
explicit naming of quux causes it to be "used up" from the parameter list, so that
any remaining arguments are assigned to the "unclaimed" parameters in order. Taking
advantage of this feature of R lets you apply a generic function to a list (or
other iterable R objects) without having to mess around with its
method-dispatching system.
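The matching rule is not specific to generics. A plain function (show_args is a made-up name for illustration) shows the same behaviour:

```r
show_args <- function(first, second) {
  paste("first =", first, "| second =", second)
}

# The named argument claims `first`; the positional one falls through to `second`.
out <- show_args("positional", first = "named")
```

Here out ends up as "first = named | second = positional".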

Hopefully this spares a fellow hacker the pain of digging through R's
documentation on method dispatch.
