Tuesday, May 15, 2012

Caching and The Way of the Object

Lately, we've had an internal debate about how to make some places where we do sorting go faster. Of course, there's always the caveat: Make it Work, Make it Right, and if you need to, Make it Fast. Learning when you need to care about that last step, is always a sort of art based on experience. Often the answer is simply "all other things considered, it's fast enough, I've got bigger problems to solve elsewhere."

Let's take an example though. Take a collection of Class objects (we'll leave the MetaClasses out), and sort them by their response to toolListDisplayString:

| classes |
classes := Object withAllSubclasses reject: #isMeta.
classes sorted: #toolListDisplayString ascending

In our example, we're using VisualWorks' ability to substitute simple Symbols as replacements for BlockClosures that send a single unary message to their arguments. It is equivalent in behavior to the more traditional:

| classes |
classes := Object withAllSubclasses reject: [:each | each isMeta].
classes sorted: [:each | each toolListDisplayString] ascending

Sending ascending to a BlockClosure before using it was first developed back in this post and this following post. And that was then integrated into VisualWorks 7.8 (or was it 7.8.1?).

The problem with our example, is that the method toolListDisplayString is not cheap. It's more than just concatenating strings for class and namespace names together. It looks at how much context needs to be added to the class name by itself to make it unique. Or put another way, since there are multiple classes in the system with the name Text, it determines it must add some info about the containing namespace, while the class PostgreSQLEXDIBLOBManipulationOutsideTransaction probably only has one instance and doesn't need any namespace info to contextualize it.

The core default sort algorithm in VisualWorks is hybridization of quicksort and a insertion sort. The implications for this, is that this somewhat expensive toolListDisplayString method may be called repeatedly for some objects. That means redundant CPU cycles.

A common solution to this kind of problem is memoization. Memoization basically is a fancy word which means "cache the results of your computation function, so you only evaluate the function once for each unique input and just look up the cached result for subsequent calls."

How to go about doing memoization around sorting sites can be accomplished a number of different ways.

In Place

The first and simplest way is to simply do it right at the sort site. We could rewrite our example to read:

| classes |
memory := IdentityDictionary new.
classes := Core.Object withAllSubclasses reject: [:each | each isMeta].
classes sorted: [:each | memory at: each ifAbsentPut: [each toolListDisplayString]] ascending

This is the simplest thing that could possibly work. That's its single advantage. The disadvantages is that adds a bit code bloat for every place we decide this is worth doing. It intermingles with what was otherwise pretty simple and easy to read. To flip back and forth between memoized and non-memoized is a pain. And it gets done again and again and again at each call site, so there's no real reuse involved. The risk of implementing it wrong is retaken at each implementation.

The desire to be able to easily flip back and forth between memoizing and not, shouldn't be underrated. Memoization is not free. It costs cycles. It is usually trial and error under conditions that the programmer knows to be common for his code, that determine if the overhead of memoizing is less than the cost of the original redundant function.

This technique is best for those that like to write more code. If you like to brag about how much code you've written, how many lines, classes, or methods, this might be for you. It's simple, and you can demonstrate your superior typing speeds.

More Sort Methods

Another approach is to add a new sort method. VisualWorks already has sort, sort:, sorted, sorted:, sortWith:, and probably some I've missed. Application developers tend to add one or two of their own. A common one in the past has been sortBy: which supports using a single arg block. So you figure out how many of these APIs you want to replicate as memoized alternatives and implement them, for example: memoizedSortBy:, etc. This is if you're a good citizen. If you're not so kind, you use something that looks like just another general purpose sorting API (e.g. sorting: aOneArgBlock).

Implementing memoizedSortBy: gives you the advantage of optimizing things a little differently. You can choose to build a parallel vector of objects collect:ing for the function, retaining index information, sort those, and then basically apply those indices to the original input set. Or you can just go with the Dictionary and at:ifAbsent: approach.

Now the only change we need to make to our call site is to change it to:

| classes |
memory := IdentityDictionary new.
classes := Core.Object withAllSubclasses reject: [:each | each isMeta].
classes memoizedSortBy: [:each | each toolListDisplayString]

You'll note that we don't have ascending anymore in there. The SortFunctions stuff is basically incompatible with this approach. Since this API wants to work with single arg blocks, which it's memoizing the results for, it has hard coded the sort direction inside of it.

I consider this the C Programmer's (or procedural) Approach. If at first you don't find a function, try, try, another one. That it is in this simplistic form incompatible with the SortFunctions thing, is personally aggrieving to me (we lose the elegance of setting the direction, as well as chaining functions, or deriving our own rocketship sorts). Another disappointment is that it's one more API I have to figure out which one I should use. I see a family of sort methods, and I've got to figure out (or recall) what the different nuances of each are (this one takes one arg, this one takes none, this one takes two, each has different trade offs, etc).

Finally, it limits the technique of memoization to sorting. What if I want to use memoization for collect:ing over a collection that I know has redundant elements. In that case, I have to go back to the In Place approach.

The Way of the Object

I'd rather take a page from the SortFunction technique. BlockClosures (or more generally, objects which respond to the message value: and fill the roles of functions) are real Objects too. And I'm declaring that they too have a right to be included in the General Love Fest of Polymorphism. The idea here, is that we add a new memoizing method to BlockClosure (and Symbol too so they can continue to stand double as simple BlockClosures). Sending memoizing to a BlockClosure returns a MemoizedFunction object which can do value: just like a BlockClosure. But it keeps a memory of evaluations and uses those when found. My first cut implementation is published as TAG-MemoizedFunctions in the Open Repository.

Now our example just turns in to:

| classes |
classes := Object withAllSubclasses reject: #isMeta.
classes sorted: #toolListDisplayString memoizing ascending

For this simplistic example, slapping memoizing in there is a 10x speed boost.

What do I like about this approach? First of all, it was fun. This kind of thing, to me, is where the zen of Object Oriented dispatch is at (I don't pretend to be brilliant about this at all, Wilf LaLonde probably wrote an article demonstrating this 20 years ago). I like that it is terse. I like that it localizes the decision about whether to memoize around the function itself rather than the API using it. This is the fastest/easiest way to toggle memoization on and off to profile the differences. I like that I can use it with collect:, or detect:, or allSatisfy:, or any method that makes use of first class function objects. And I like that it only took 10 methods and one class to do. Because Less is More.

Happy Memoizing!

(Why does Apple insist on constantly changing "memoizing" to read "memorizing"? Grumble...)


  1. Thanks, this is exactly the generic way to fix http://code.google.com/p/pharo/issues/detail?id=5880

  2. Neat, but no code! Here's my take: https://gist.github.com/2705526

  3. J, the code is published in the OR. I almost went and implemented yours. It's even simpler than mine. I like it. The only thing I don't like, and this is possible YAGNI, is that without a Real Object to represent the memorization you can't do real things with it. For example, you can't fetch a history of arguments from it. With a MemoizedFunction object you can have an argumentHistory that returns memory keys. And you can't further configure it. I could add configuration methods called byEquality and byIdentity to it, that tune the type of caching. Or you could put limits on how many entries to cache. If you ball it all up in they method like that, you lose that ability. But the Less is More sweetness of it sure is appealing.