As promised, here's one of those "tying up loose ends" things.
A couple months ago, I posted a prototype of some code inspired by JQuery like behavior for VisualWorks view trees. It got a variety of feedback. A lot positive, some skeptical, some mixed.
Recently, I've been working on modifying the newer VisualWorks comparison tool that presents changes in a disclosure/rollup navigation style. I've been working to put filters in it, so you can filter out different kinds of changes (e.g. hide all the category changes, or show just the additions).
I found myself wanting that JQuery like behavior to have lightweight communication between view tree objects again. I did not want to fabricate a model with dependencies just to facilitate some cross talk. So I took a "go around #2" at the idea. This time, no funky syntax. More based on real world needs. Definitely lighter weight.
It's been published as QueryTwo to the Open Repository. What will come of it? I don't know at this point. Maybe it'll get integrated, maybe not, that's up to others to decide now.
Here is an example of me using it in real life (instead of hypothetical examples).
<pre><code>
propogateChanges
(self query)
top;
type: AbstractComparisonRollupView;
do: [:view | view hideTypes: hiddenTypes].
self updateCellFills
</code></pre>
It reminds me a lot of writing Glorp queries. Similar patterns, you create one, send messages to configure it, and then enumerate it. Or kind of like Seaside html writer pattern too. Make one, configure it, execute a block for it.
What follows is a portion of the class comment that describes usage:
The most common case is to ask a VisualPart or Window to create one for you using
self query
This will return a query that has the receiver as the root of its query path. One can send top to the query to shift the root of the query to the top most element of the view tree (e.g. the window at the top).
You can also ask an ApplicationModel to create one
myApplicationModel viewQuery
The pattern is that after creating a query, one sends configurations messages to it, and then invokes various collection methods (e.g. do:). The enumeration methods should be sent after configuration methods. There are couple of different methods that govern which objects in the view tree are traversed, they come in pairs:
Traversal Configuration Messages
up - causes the query to proceed upwards through the parent path from the root object
down - causes the query to proceed downwards through the children of the root object (this is the defaut if neither up nor down is sent to configure the query)
withRoot - causes the query to include the root in its traversal
withoutRoot - causes only parents or children (as configured by up/down) to be traversed (this is the default if neither withRoot or withoutRoot is sent to configure the query)
one - causes the query to cease traversal after the first match is found and enumerated
many - causes the query to traverse all elements, matching as many as encountered that match (this is the default if neither one or many or sent to configure the query)
Queries
Adding queries controls which elements of the traversal are "matched" and thus show up in things like do: enumerations. By default the query will match everything. Methods found in the queries method category provide utility methods for setting up some common queries. Ultimately, they all pass through the addQuery: method. The argument to the this method is a block, which is cull:ed for every element in the traversal, and for those that answer true to this block, they will be enumerated. Repeated query configuration messages, will AND the queries together. The method reset will return the query to its default [true] state.
Examples
(self query id: #foobar) any
"will return the first element with id of #foobar"
(self query top property: #frame satisfies: [:rect | rect area isZero]) not invalidate
"invalidate all widgets in my window that have real area to them"
(self query up withRoot) do: [:each | each flash]
"flash me and all my parents"
(self query top hasProperty: #UpdateGroup) do: #update
"send #update to all elements in my window that are marked with the #UpdateGroup property"
Tuesday, May 29, 2012
Stepping Out of the Balloon
Many many years ago, I returned from an LDS Mission in the lovely country of Norway. It was Christmas of 1991. I took a job at what was then Siemens Nuclear Power. In the months that followed, I was introduced to this novel computer programming language called Smalltalk. I took to it, and I like to think it took to me. For the last twenty years, I've done quite a few things with it. From writing nuclear fuel design and assembly software (which are still running today) to making sure that the french fries, green beans, and much of the rest of the world's food is a little cleaner and better. From numeric modeling to implementing frivolous things like roman numeral message selectors and goto. And quite a bit of toolsmithing. To say it's "served me well" is an understatement.
The wonderful world of Smalltalk technology and philosophy, wouldn't have been as enriching for me, if not for the wonderful community of people I have rubbed shoulders with over the years. I remember my first post via bitdearn to comp.lang.smalltalk back in 1994. Meeting people at conferences such as OOPSLA, ESUG, STIC, and others. I have made a ton of friends and come to admire the work and enthusiasm of so many people.
Working at Cincom, the "original" commercial Smalltalk vendor, has always been a sort of pinnacle in my Smalltalk pilgrimage. A chance to be at a hub of where Smalltalk was happening at.
But all journeys must come to an end. And the time for this journey, for me, for now, has come to an end. On June 4th, I will begin work at Nelson Irrigation, doing embedded automation work, sprinkled (that's a pun) with a variety of end user application work. I am super excited. It's a neat project, a neat company, and an indescribably neat culture.
But it means I'll be dropping out of that central involvement in the Smalltalk community. I may still do some Smalltalking for sure, and the ethos that is Smalltalk will permeate all the work I do, but it's unlikely I'll show up at a Smalltalk conference in the near future or be active in the mailing lists as a heavy contributor.
And so it's in some ways, a probable good bye for me. And that makes me sad. And yet happy, because it's better to feel sad about what I'm losing with the community, than thrilled to be shot of it all.
I also want to point out something my departure from Cincom does NOT mean. There have been some others prominent names that have left Cincom recently, and one might assume there was a sinking ship meme going on. Such is simply not the case with me. The timing of this opportunity to learn and be involved in some new and different things, was out of my hands. When it surfaced, unfortunate timing aside, I felt I could not pass the opportunity up. So please don't read any sort of ill boded fate for Cincom or VisualWorks into my departure. I have faith in the people that remain, and in the people that will replace me, I'm sure they'll take the balloon farther and better heights than I was capable of. Any age or oddities aside, it remains some of the best tech that is out there.
As for this blog, I'm not sure what will happen. The purpose of this blog was always meant to be about Smalltalk, and in particular the live "biological" nature of the Smalltalk program philosophy. There are one or two things of the normal ilk that I'd like to write about based on some work I've been doing of late, and then, it'll likely take a hiatus, possibly permanent.
If our paths don't cross in the future, in the immortal (and skewed) words of Spock, may you "Learn Long and Prosper." And remember, "Dead men never wish they'd spent more time at the office."
And so it's in some ways, a probable good bye for me. And that makes me sad. And yet happy, because it's better to feel sad about what I'm losing with the community, than thrilled to be shot of it all.
I also want to point out something my departure from Cincom does NOT mean. There have been some others prominent names that have left Cincom recently, and one might assume there was a sinking ship meme going on. Such is simply not the case with me. The timing of this opportunity to learn and be involved in some new and different things, was out of my hands. When it surfaced, unfortunate timing aside, I felt I could not pass the opportunity up. So please don't read any sort of ill boded fate for Cincom or VisualWorks into my departure. I have faith in the people that remain, and in the people that will replace me, I'm sure they'll take the balloon farther and better heights than I was capable of. Any age or oddities aside, it remains some of the best tech that is out there.
As for this blog, I'm not sure what will happen. The purpose of this blog was always meant to be about Smalltalk, and in particular the live "biological" nature of the Smalltalk program philosophy. There are one or two things of the normal ilk that I'd like to write about based on some work I've been doing of late, and then, it'll likely take a hiatus, possibly permanent.
If our paths don't cross in the future, in the immortal (and skewed) words of Spock, may you "Learn Long and Prosper." And remember, "Dead men never wish they'd spent more time at the office."
Tuesday, May 15, 2012
Caching and The Way of the Object
Lately, we've had an internal debate about how to make some places where we do sorting go faster. Of course, there's always the caveat: Make it Work, Make it Right, and if you need to, Make it Fast. Learning when you need to care about that last step, is always a sort of art based on experience. Often the answer is simply "all other things considered, it's fast enough, I've got bigger problems to solve elsewhere."
Let's take an example though. Take a collection of Class objects (we'll leave the MetaClasses out), and sort them by their response to toolListDisplayString:
In our example, we're using VisualWorks' ability to substitute simple Symbols as replacements for BlockClosures that send a single unary message to their arguments. It is equivalent in behavior to the more traditional:
Sending ascending to a BlockClosure before using it was first developed back in this post and this following post. And that was then integrated into VisualWorks 7.8 (or was it 7.8.1?).
The problem with our example, is that the method toolListDisplayString is not cheap. It's more than just concatenating strings for class and namespace names together. It looks at how much context needs to be added to the class name by itself to make it unique. Or put another way, since there are multiple classes in the system with the name Text, it determines it must add some info about the containing namespace, while the class PostgreSQLEXDIBLOBManipulationOutsideTransaction probably only has one instance and doesn't need any namespace info to contextualize it.
The core default sort algorithm in VisualWorks is hybridization of quicksort and a insertion sort. The implications for this, is that this somewhat expensive toolListDisplayString method may be called repeatedly for some objects. That means redundant CPU cycles.
A common solution to this kind of problem is memoization. Memoization basically is a fancy word which means "cache the results of your computation function, so you only evaluate the function once for each unique input and just look up the cached result for subsequent calls."
How to go about doing memoization around sorting sites can be accomplished a number of different ways.
In Place
The first and simplest way is to simply do it right at the sort site. We could rewrite our example to read:
This is the simplest thing that could possibly work. That's its single advantage. The disadvantages is that adds a bit code bloat for every place we decide this is worth doing. It intermingles with what was otherwise pretty simple and easy to read. To flip back and forth between memoized and non-memoized is a pain. And it gets done again and again and again at each call site, so there's no real reuse involved. The risk of implementing it wrong is retaken at each implementation.
The desire to be able to easily flip back and forth between memoizing and not, shouldn't be underrated. Memoization is not free. It costs cycles. It is usually trial and error under conditions that the programmer knows to be common for his code, that determine if the overhead of memoizing is less than the cost of the original redundant function.
This technique is best for those that like to write more code. If you like to brag about how much code you've written, how many lines, classes, or methods, this might be for you. It's simple, and you can demonstrate your superior typing speeds.
More Sort Methods
Another approach is to add a new sort method. VisualWorks already has sort, sort:, sorted, sorted:, sortWith:, and probably some I've missed. Application developers tend to add one or two of their own. A common one in the past has been sortBy: which supports using a single arg block. So you figure out how many of these APIs you want to replicate as memoized alternatives and implement them, for example: memoizedSortBy:, etc. This is if you're a good citizen. If you're not so kind, you use something that looks like just another general purpose sorting API (e.g. sorting: aOneArgBlock).
Implementing memoizedSortBy: gives you the advantage of optimizing things a little differently. You can choose to build a parallel vector of objects collect:ing for the function, retaining index information, sort those, and then basically apply those indices to the original input set. Or you can just go with the Dictionary and at:ifAbsent: approach.
Now the only change we need to make to our call site is to change it to:
You'll note that we don't have ascending anymore in there. The SortFunctions stuff is basically incompatible with this approach. Since this API wants to work with single arg blocks, which it's memoizing the results for, it has hard coded the sort direction inside of it.
I consider this the C Programmer's (or procedural) Approach. If at first you don't find a function, try, try, another one. That it is in this simplistic form incompatible with the SortFunctions thing, is personally aggrieving to me (we lose the elegance of setting the direction, as well as chaining functions, or deriving our own rocketship sorts). Another disappointment is that it's one more API I have to figure out which one I should use. I see a family of sort methods, and I've got to figure out (or recall) what the different nuances of each are (this one takes one arg, this one takes none, this one takes two, each has different trade offs, etc).
Finally, it limits the technique of memoization to sorting. What if I want to use memoization for collect:ing over a collection that I know has redundant elements. In that case, I have to go back to the In Place approach.
The Way of the Object
I'd rather take a page from the SortFunction technique. BlockClosures (or more generally, objects which respond to the message value: and fill the roles of functions) are real Objects too. And I'm declaring that they too have a right to be included in the General Love Fest of Polymorphism. The idea here, is that we add a new memoizing method to BlockClosure (and Symbol too so they can continue to stand double as simple BlockClosures). Sending memoizing to a BlockClosure returns a MemoizedFunction object which can do value: just like a BlockClosure. But it keeps a memory of evaluations and uses those when found. My first cut implementation is published as TAG-MemoizedFunctions in the Open Repository.
Now our example just turns in to:
For this simplistic example, slapping memoizing in there is a 10x speed boost.
What do I like about this approach? First of all, it was fun. This kind of thing, to me, is where the zen of Object Oriented dispatch is at (I don't pretend to be brilliant about this at all, Wilf LaLonde probably wrote an article demonstrating this 20 years ago). I like that it is terse. I like that it localizes the decision about whether to memoize around the function itself rather than the API using it. This is the fastest/easiest way to toggle memoization on and off to profile the differences. I like that I can use it with collect:, or detect:, or allSatisfy:, or any method that makes use of first class function objects. And I like that it only took 10 methods and one class to do. Because Less is More.
Happy Memoizing!
(Why does Apple insist on constantly changing "memoizing" to read "memorizing"? Grumble...)
Let's take an example though. Take a collection of Class objects (we'll leave the MetaClasses out), and sort them by their response to toolListDisplayString:
| classes |
classes := Object withAllSubclasses reject: #isMeta.
classes sorted: #toolListDisplayString ascending
In our example, we're using VisualWorks' ability to substitute simple Symbols as replacements for BlockClosures that send a single unary message to their arguments. It is equivalent in behavior to the more traditional:
| classes |
classes := Object withAllSubclasses reject: [:each | each isMeta].
classes sorted: [:each | each toolListDisplayString] ascending
Sending ascending to a BlockClosure before using it was first developed back in this post and this following post. And that was then integrated into VisualWorks 7.8 (or was it 7.8.1?).
The problem with our example, is that the method toolListDisplayString is not cheap. It's more than just concatenating strings for class and namespace names together. It looks at how much context needs to be added to the class name by itself to make it unique. Or put another way, since there are multiple classes in the system with the name Text, it determines it must add some info about the containing namespace, while the class PostgreSQLEXDIBLOBManipulationOutsideTransaction probably only has one instance and doesn't need any namespace info to contextualize it.
The core default sort algorithm in VisualWorks is hybridization of quicksort and a insertion sort. The implications for this, is that this somewhat expensive toolListDisplayString method may be called repeatedly for some objects. That means redundant CPU cycles.
A common solution to this kind of problem is memoization. Memoization basically is a fancy word which means "cache the results of your computation function, so you only evaluate the function once for each unique input and just look up the cached result for subsequent calls."
How to go about doing memoization around sorting sites can be accomplished a number of different ways.
In Place
The first and simplest way is to simply do it right at the sort site. We could rewrite our example to read:
| classes |
memory := IdentityDictionary new.
classes := Core.Object withAllSubclasses reject: [:each | each isMeta].
classes sorted: [:each | memory at: each ifAbsentPut: [each toolListDisplayString]] ascending
This is the simplest thing that could possibly work. That's its single advantage. The disadvantages is that adds a bit code bloat for every place we decide this is worth doing. It intermingles with what was otherwise pretty simple and easy to read. To flip back and forth between memoized and non-memoized is a pain. And it gets done again and again and again at each call site, so there's no real reuse involved. The risk of implementing it wrong is retaken at each implementation.
The desire to be able to easily flip back and forth between memoizing and not, shouldn't be underrated. Memoization is not free. It costs cycles. It is usually trial and error under conditions that the programmer knows to be common for his code, that determine if the overhead of memoizing is less than the cost of the original redundant function.
This technique is best for those that like to write more code. If you like to brag about how much code you've written, how many lines, classes, or methods, this might be for you. It's simple, and you can demonstrate your superior typing speeds.
More Sort Methods
Another approach is to add a new sort method. VisualWorks already has sort, sort:, sorted, sorted:, sortWith:, and probably some I've missed. Application developers tend to add one or two of their own. A common one in the past has been sortBy: which supports using a single arg block. So you figure out how many of these APIs you want to replicate as memoized alternatives and implement them, for example: memoizedSortBy:, etc. This is if you're a good citizen. If you're not so kind, you use something that looks like just another general purpose sorting API (e.g. sorting: aOneArgBlock).
Implementing memoizedSortBy: gives you the advantage of optimizing things a little differently. You can choose to build a parallel vector of objects collect:ing for the function, retaining index information, sort those, and then basically apply those indices to the original input set. Or you can just go with the Dictionary and at:ifAbsent: approach.
Now the only change we need to make to our call site is to change it to:
| classes |
memory := IdentityDictionary new.
classes := Core.Object withAllSubclasses reject: [:each | each isMeta].
classes memoizedSortBy: [:each | each toolListDisplayString]
You'll note that we don't have ascending anymore in there. The SortFunctions stuff is basically incompatible with this approach. Since this API wants to work with single arg blocks, which it's memoizing the results for, it has hard coded the sort direction inside of it.
I consider this the C Programmer's (or procedural) Approach. If at first you don't find a function, try, try, another one. That it is in this simplistic form incompatible with the SortFunctions thing, is personally aggrieving to me (we lose the elegance of setting the direction, as well as chaining functions, or deriving our own rocketship sorts). Another disappointment is that it's one more API I have to figure out which one I should use. I see a family of sort methods, and I've got to figure out (or recall) what the different nuances of each are (this one takes one arg, this one takes none, this one takes two, each has different trade offs, etc).
Finally, it limits the technique of memoization to sorting. What if I want to use memoization for collect:ing over a collection that I know has redundant elements. In that case, I have to go back to the In Place approach.
The Way of the Object
I'd rather take a page from the SortFunction technique. BlockClosures (or more generally, objects which respond to the message value: and fill the roles of functions) are real Objects too. And I'm declaring that they too have a right to be included in the General Love Fest of Polymorphism. The idea here, is that we add a new memoizing method to BlockClosure (and Symbol too so they can continue to stand double as simple BlockClosures). Sending memoizing to a BlockClosure returns a MemoizedFunction object which can do value: just like a BlockClosure. But it keeps a memory of evaluations and uses those when found. My first cut implementation is published as TAG-MemoizedFunctions in the Open Repository.
Now our example just turns in to:
| classes |
classes := Object withAllSubclasses reject: #isMeta.
classes sorted: #toolListDisplayString memoizing ascending
For this simplistic example, slapping memoizing in there is a 10x speed boost.
What do I like about this approach? First of all, it was fun. This kind of thing, to me, is where the zen of Object Oriented dispatch is at (I don't pretend to be brilliant about this at all, Wilf LaLonde probably wrote an article demonstrating this 20 years ago). I like that it is terse. I like that it localizes the decision about whether to memoize around the function itself rather than the API using it. This is the fastest/easiest way to toggle memoization on and off to profile the differences. I like that I can use it with collect:, or detect:, or allSatisfy:, or any method that makes use of first class function objects. And I like that it only took 10 methods and one class to do. Because Less is More.
Happy Memoizing!
(Why does Apple insist on constantly changing "memoizing" to read "memorizing"? Grumble...)
Subscribe to:
Posts (Atom)