Friday, September 23, 2011

One of the Best Bits of Programming Advice I ever Got

Years ago (early 1992), I attached myself to this crazy skunkworks project that was using this weird language called Smalltalk. "Object Oriented" was in its infancy as a "hot" item. Highly paid consultants. Lots of people laying claim to what this new object religion was all about. This was 5 years before Alan Kay would make the statement "I invented the term 'Object Oriented Programming' and this {Java and C++} is not what I had in mind."

Shortly after hooking up with this whacky group with the whacky language, still confused about what the difference was between an instance variable, a class variable, and a class instance variable, I found myself in a training course taught by Russ Pencin, of ParcPlace. Russ would say something that I didn't really appreciate at the time. Despite not understanding the point behind this sage advice, I endeavored to follow it. It would take years of experience and exposure to appreciate its value. The advice?

Don't make objects that end with 'er'.

That's it. The OOP paradigm sprang to life amidst a culture of what we called "procedural programming." Nowadays we don't talk so much about the comparison between the two paradigms, probably in part because Object Oriented languages are now a dime a dozen. The OOP religion, in a multitude of flavors, won out. Sadly, I often find myself echoing words I heard Adele Goldberg say around 2000: "Nowadays we have lots of Object Oriented Programming, but not so many Object Oriented Programmers". If there was one piece of advice I would pass on to the hordes of would-be Object Oriented Programmers, it would be the sage advice offered by Russ: "Don't make objects that end with 'er'."

What's in a name anyway? Why is this worth getting excited about? What I've discovered over the years is that the gist of OOP is that we bind behavior to data. As long as you haven't joined the Functional Monks in their Monasteries of Statelessness, programs are made of behavior and data. In classic structured programming, we concentrate on behavior (verbs), and then figure out what data (nouns) we need to make it all work. In other words, we bind data to behavior. But in OOP, we make the locus of programs be the nouns, the data, and then we figure out what kind of behavior we can bind to them, and hope that solutions to the problems we want to solve gel out of the emergent behaviors.

I recently posited to a colleague that in nearly every "er" object case, there was a better name for it. And that giving it a better name would tend to make the design more encapsulated, less spaghetti code, in short more object oriented. It's not a hard and fast rule, but there are a lot of cases where it can improve things.

Take some sort of "Loader" for example. The focus here is on the unit of work it does. It'll have lots of instance variables, lots of arguments, and pass lots of data around probably. Now instead replace that with a LoadRecord and a LoadStream. I'm reasonably confident you'll end up with something that is more akin to what the original Founding Fathers of OOP had in mind. We want to create objects that describe what they are, and then bind behavior to them, rather than focus on what they do, and then figure out what data they'll need to do that.
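
To make the contrast concrete, here is a minimal sketch in Python (chosen only for brevity; the setting of this post is Smalltalk). Only the names LoadRecord and LoadStream come from the paragraph above. The "key=value" line format, the field names, and the `apply_to` method are all assumptions invented for illustration.

```python
import io
from dataclasses import dataclass


@dataclass
class LoadRecord:
    """One parsed line of input: it knows what it is and what it can do."""
    key: str
    value: str

    @classmethod
    def from_line(cls, line):
        # Assumed format for this sketch: "key=value", one record per line.
        key, _, value = line.strip().partition("=")
        return cls(key, value)

    def apply_to(self, settings):
        # Behavior bound to the data: the record installs itself.
        settings[self.key] = self.value


class LoadStream:
    """A stream of LoadRecords read from any text file-like object."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        for line in self.source:
            if line.strip():  # skip blank lines
                yield LoadRecord.from_line(line)


settings = {}
for record in LoadStream(io.StringIO("host=example.org\nport=8080\n")):
    record.apply_to(settings)
```

Nothing here is a "Loader": the stream describes where records come from, each record describes itself, and loading is simply what happens when the two meet.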

Some er's that I've learned to avoid over the years:

  • Managers - Every time I see one of these, I cringe. People will usually tell me what it does, long before they can tell me what it is. Is it a registry? Fine, call it a registry. Is it a history or a log? Call it that. Is it a factory? Call it that.
  • Controllers - Only good controller object I've made in the last 20 years was an interface to a BallastVoltageController that represented a real world object. The fact that every single MVC implementation in the world has had a different role for Controller ought to tell us something about how well that idea fit.
  • Organizer (and many like them) - Focus is on what it does. This is a great example of how easy it is to turn many of these 'ers' into nouns. Call it an Organization. Now we're focusing on what it is.
  • Analyzer/Renderer/etc - Definitely examples of "worker" objects. What if they had been Analysis/Rendering/etc.?
  • Builder/Loader/Reader/Writer/etc - These remove the focus from the objects being manipulated, and tend to assume too much responsibility themselves.

There are lots of exceptions to such a rule, of course.

  • There are lots of noun words that end in 'er'. Register. Border. Character. Number. If it's really a noun, fine.
  • There are many 'er' words that despite their focus on what they do, have become so commonplace, that we're best to just stick with them, at least in part. Parser. Compiler. Browser.
  • When you are trying to model a domain object that ends in 'er'. I'm fine with a Manager subclass of Personnel, which is there to refine a type of personnel that has management behavior to it.

Your mileage may vary, and I'm sure there are those who disagree with this. Until you apply the mindset for a while though, you'll never really know. Give it a whirl on one of your projects/designs and see what happens.

Wednesday, September 21, 2011

Don't Try This At Home: Stealing from the stack

I think there are days when I want to do things I know I shouldn't as a programmer. Do others experience this? Some have said that Smalltalk is like a gun, that "with great power comes great responsibility." Sometimes, some of the tricks tempt me, and if I know no one's looking (read: I'm not going to be putting this in any production code), I find myself looking around for opportunities to flex a little bit of language super power muscle. Just for the grins. Just because I can.

Messing with thisContext and the stack is one of those things, and I got my chance while implementing the _1:_2:_3:_4:_5: message I was talking about the other day.

The proper and boring way to implement it was like this:
_1: one _2: two _3: three _4: four _5: five

    | arguments |
    arguments := Array new: 5.
    arguments at: 1 put: one.
    arguments at: 2 put: two.
    arguments at: 3 put: three.
    arguments at: 4 put: four.
    arguments at: 5 put: five.
    ^(StringParameterSubstitution default)
        originalString: self;
        args: arguments;

But, I didn't want to do that. That was too much typing I think. I wanted to be clever, so I did this instead:

_1: one _2: two _3: three _4: four _5: five

    ^(StringParameterSubstitution default)
        originalString: self;
        args: (thisContext stack copyFrom: 1 to: 5);

There are no references to any of the method arguments. Knowing that the stack is already an array with the arguments already placed in it, which is exactly what I want, I just grab that instead of making my own array populated with the method arguments.
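
For readers without a Smalltalk image at hand, Python allows a comparable trick through its standard `inspect` module. This is only an analogy, not a translation: `expand5` is a made-up stand-in for the `_1:_2:_3:_4:_5:` method, and it simply returns the collected arguments rather than handing them to StringParameterSubstitution. The same warning about production code applies.

```python
import inspect


def expand5(template, one, two, three, four, five):
    # Like the thisContext version, the body never mentions the five
    # arguments by name; it reads them out of the running frame instead.
    frame = inspect.currentframe()
    try:
        info = inspect.getargvalues(frame)
        # info.args lists the parameter names in declaration order;
        # skip the first one (template).
        return [info.locals[name] for name in info.args[1:]]
    finally:
        del frame  # avoid keeping a reference cycle through the frame


collected = expand5("<1s> <2s>", "a", "b", "c", "d", "e")
```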

Don't do this in production code. It's tricky and evil. But sometimes, it's good to remind yourself, or learn from others, what this great environment really is capable of doing. Who knows: once you are aware of it, there may come a point where playing with thisContext or the stack helps you solve a real problem, in production code or not.

A thought about field identifiers

When I was playing with template field substitution the last couple of days, I was again reminded that the common Smalltalk substitution syntax (John Brant informs me that it is indeed in more than just VisualWorks and Squeak) is really frustrating to use for any sort of HTML/XML generation. The use of the < and > means you have to constantly escape those characters in your templates if you are generating any kind of output that should actually include the alligator brackets.

And a thought occurred to me. As long as I'm not generating HTML or XML, I couldn't think of a better field identifying character to use. There are some others such as { } or [ ] that work visually as well. But for the last 15+ years, we've all been reading more and more and more of the "when Lisp met alligators" syntax. It's been pounded mercilessly into our brains. We've all gotten quite accustomed to parsing, in our heads, the text that shows up between brackets as apart from the rest. Because of that, using them as field identifiers is actually the best thing possible.

It's the meta that's the Achilles heel: when you want to use these same field separators to generate other field separators. In other words, I could posit that if HTML/XML had used { } to enclose tags, we'd find ourselves appreciating it in a substitution syntax the most, and hating it most when we tried to use it to generate more of the same.

Tuesday, September 20, 2011

A Tragedy: When Localization met Interpolation (repost)

This is a repost from my old blog. When I end-of-lifed that blog, I said I might pull some of those posts over here. Lukas's comments on the previous Syntactic Tartness for Macro Expansion reminded me of this one. At the time I wrote that, I failed to give credit to Steve Dahl, with whom I had spent about 2 days kicking ideas back and forth on Skype about this.

Adrian Kuhn commented in a previous post:

"In general, I think that Smalltalk desperately misses String interpolation!"

Yes, I agree. But the devil's running amok in the details. The problem is that I18n translation made it first. There are two ways that I've seen to do translation: 1) statically recompile your program, making a sweep of all (or a subset) of the strings in the program and translating them; 2) do the translation at runtime, by looking up the string in some database. The first is pretty old school. :)

The challenges with String Interpolation are a few:

Far Reaching Changes

You need to modify the compiler. You might be able to try to have a unary message that evaluates a string, compiling expressions on the fly in the context of the sender. In this case, the tools can do nothing to help you get the code right. They get confused, because you create variables that don't appear to be referenced. You can't do it right in the context of clean closures. So a parser/compiler change is definitely in order. And don't forget the usual rant: it's not enough to modify just the base compiler. You've got to look at it in the context of the debugger's ability to put breakpoints as well as step through code, and in the context of the RB parser, which you want for rewrites and formatting, as well as code highlighting maybe. This does not make it impossible. It simply means it's not a quick thing you can whip up over the weekend.

What Is It When?

So let's say you have the expression:
string := '[[customer name]] is [[customer age]] years old'.

If you use it as the argument of a nextPutAll: send, you probably want it to be in expanded form. But, if you want to look it up in an I18n dictionary, then you would rather have it in symbolic form.

You could detect string literals with whatever the expression preamble is ([[ in the examples above) at compile time and build some sort of complex literal that held both blocks as well as a string. But you'd still have to figure out how to resolve the message =. In an assert: form, you'd probably want the expanded form, but at dictionary lookup time for translation you'd want the unexpanded, symbolic form.
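
The same tension is easy to reproduce in any language that has both features. Below is a minimal Python sketch; the `CATALOG` dictionary and the `translate` function are hypothetical stand-ins for an I18n lookup (they are not the gettext API), and the French string is just sample data.

```python
# Hypothetical translation catalog, keyed by the unexpanded (symbolic) template.
CATALOG = {
    "{name} is {age} years old": "{name} a {age} ans",
}


def translate(text):
    # Fall back to the original text when no translation is known.
    return CATALOG.get(text, text)


name, age = "Alice", 30

# Interpolate first: the catalog is asked for the already expanded string,
# which it has never seen, so the lookup misses.
too_early = translate(f"{name} is {age} years old")

# Look up first, expand afterwards: the symbolic form is the key.
in_order = translate("{name} is {age} years old").format(name=name, age=age)
```

Here `too_early` stays in English while `in_order` comes out translated. An interpolating compiler makes the first form the natural one to write, which is exactly the problem.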

What Happened Elsewhere

It's been interesting to spend a little time looking at how Ruby and Python do these. This page makes it clear that the Ruby gettext guys also understood that it is hard to have your cake and eat it too when it comes to mixing interpolation and localization. The answer isn't as definitive with Python, but I think I've convinced myself it's a similar story after looking at docs for a little while.

Monday, September 19, 2011

Syntactic Tartness for Macro Expansion

One thing that is very natural for Smalltalk image based programming, is to programmatically assemble source and install it into the very same running program. I've been using the RB Change framework to do just that with something I'm working on lately.

To piece together the appropriate source, it's desirable to work from a template string and then fill in the variables. VisualWorks and Squeak (maybe some other Smalltalks use this same approach?) have an ability to expand template strings with macro substitution. The template for a setter method might look something like

'<1s>: anObject
<1s> := anObject'

To fill out those fields, you send messages like expandMacrosWith:, expandMacrosWith:with:, expandMacrosWith:with:with:, and expandMacrosWithArguments:.
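
For readers who have not met this syntax, here is a toy model of the expansion in Python. It is a sketch under stated assumptions, not the VisualWorks or Squeak implementation: `expand_macros` is a name made up here, it handles only the two field kinds used in this post, and it approximates `<Ns>` with `str` and `<Np>` (the printString form) with `repr`.

```python
import re

# A field is <N> followed by 's' (string) or 'p' (printed form).
FIELD = re.compile(r"<(\d+)([sp])>")


def expand_macros(template, arguments):
    """Replace <Ns> and <Np> fields with the Nth argument (1-based)."""
    def substitute(match):
        value = arguments[int(match.group(1)) - 1]
        # 's' inserts the value as a string, 'p' inserts its printed form.
        return str(value) if match.group(2) == "s" else repr(value)
    return FIELD.sub(substitute, template)


setter = expand_macros("<1s>: anObject\n<1s> := anObject", ["name"])
```

With the single argument `"name"`, both occurrences of field 1 in the setter template are replaced by `name`.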

I am no fan of this API. First, it is too verbose. When I'm looking at template substitution, I don't want a bunch of other longish selectors. They dilute the information I'm trying to glean as I piece together the template and what's being substituted.

Secondly, I find it doesn't scale well when evolving the code. It's common that I start with a simple template in the first cut of code. Something like
    'Hello <1s>' expandMacrosWith: aName

But as I refactor and discover more needs, the need to add parameters arises. As long as I only add two more parameters, I can just use the variants with the additional with: keywords.
'<1s> ^self <2s> <3s>'
    expandMacrosWith: aVariableName
    with: aBasicAccessingMethod
    with: sizeof + 1

As soon as I go to 4 fields though, I have to change gears and use the expandMacrosWithArguments: API and build the sequence myself. It could be argued that it's best to just always start with this version, but it's the very longest of the selectors. If you're using VisualWorks, and don't have the language syntax for array construction (e.g. {statement. statement. statement.}), then it's even funner, because you can use the Array with:with:with:with: expression, but if you need to move to 5 fields, then you've got to change your code all around again.

Over the years, I've tried a couple of different experiments to make this all something I liked a little better. They've involved interesting binary selectors (e.g. "%") and proxy objects, or at least fun with doesNotUnderstand: messages. I thought I'd try something a little different. I wanted something that correlated well with the numbered fields, but was uber-terse as well.

So I went with the shortest selector that could possibly work: _1:, _1:_2:, _1:_2:_3:, etc. Written in selector shorthand like that, it's pretty ugly. When actually used in code though, it improves some:
'<1s>At: anOffset
^self <2s> anOffset * <3p> + <4p>'
    _1: aVariableName
    _2: aBasicAccessingMethod
    _3: aByteSize
    _4: sizeof + 1

'^Array with: (Tools.Trippy.DerivedAttribute
label: ''<2s>''
valueBlock: [self <2s>])'
    _1: upperVariableName
    _2: aVariableName

'^(0 to: <2p>) collect: [:n | self <3s> <4p> + (n * <5p>)]'
    _1: aVariableName
    _2: anInteger - 1
    _3: aBasicAccessingMethod
    _4: sizeof + 1
    _5: aByteSize

I don't dare call this syntactic sugar. The use of the underscores is too ugly to be sugary. It's a sort of bittersweet thing, thus the "tart" label.

It does solve two problems nicely. It is very terse. You see a minimal amount of "scaffolding" getting the job done, and are free to spend more time looking at the template and the substitutions. One could get that though by using inlined arrays with a shorter selector, something like:
'^(0 to: <2p>) collect: [:n | self <3s> <4p> + (n * <5p>)]' macro: {
    aVariableName.
    anInteger - 1.
    aBasicAccessingMethod.
    sizeof + 1.
    aByteSize}

One thing you lose with this though, is the strong association between each substitution and field. In the former example, when I glance at the template and see field 4, and then want to know what's being substituted, I see it instantly. I just find the 4 in the selector below. Without that direct link, I have to parse the array linearly to find it.
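
The readability point carries over to other languages. In this Python sketch, the `expand` function and its `_1`, `_2`, ... keyword names are invented here to mimic the selectors above, and the sample values (`"width"`, `"longAt:"`, 4, 9) are made up. Each substitution stays labelled with its field number, so field 4 can be found without counting.

```python
import re


def expand(template, **fields):
    """Fill <Ns>/<Np> fields from keyword arguments named _1, _2, ..."""
    def substitute(match):
        value = fields["_" + match.group(1)]
        # 's' inserts the value as a string, 'p' inserts its printed form.
        return str(value) if match.group(2) == "s" else repr(value)
    return re.sub(r"<(\d+)([sp])>", substitute, template)


source = expand(
    "<1s>At: anOffset ^self <2s> anOffset * <3p> + <4p>",
    _1="width",
    _2="longAt:",
    _3=4,
    _4=9,
)
```

Because the labels carry the association, the keyword arguments could be given in any order and the result would be the same, which a positional array cannot offer.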

I haven't used this _syntax enough to decide if its usefulness would overcome its ugliness, but it was interesting to play with it and discover the visual readability aspect. I'll likely use it on some more non-production stuff to get a better feel.