"if" Considered Harmful in OO programming

Note: I'm not saying "don't use if." This node is all about avoiding if to check the type of data you have in OO code, even though there are times that it's unavoidable.

if is a fascinating keyword. Essentially, it says "I don't know what I have, so I'm going to guess." This guessing leads to subtle bugs. Do you see any potential problem in the following?

if ($x and ! $y) {
    ...
}
elsif (!$x and $y) {
    ...
}
else {
    ...
}
[download]

The above code might not have a bug, but it's interesting to note that it does not explicitly deal with the case of both variables being true or both variables being false. Both cases will receive the same treatment. This can raise a maintenance problem if another programmer has to distinguish between these two cases and isn't aware of this snippet.

Most programmers generally do not sit down and draw out truth tables every time they want to code a conditional, but I hardly feel this is a problem. Unexpected cases come up all the time in programming and if is hardly the only culprit. However, when dealing with object oriented code, "if" statements are frequently a maintenance nightmare that's symptomatic of bad OO design.

A classic example lies in older versions of HTML::TokeParser::Simple (I wrote it, so I can pick on it.) In this module, I needed to maintain "drop-in replacement" compatability with my base class, HTML::TokeParser. To do this, I blessed the tokens that the latter module returned into a new token class. Then I had a problem. The array references returned from get_tag() and get_token() methods in HTML::TokeParser had subtly different structures. My methods should not care which method generated the token, so I wrote the following ugly hack:

  1 sub _synch_arrays {
  2     # if the method is called from a token generated by the get_ta
+g() method,
  3     # the returned array reference will be identical to a start or
+ end tag
  4     # token returned by get_token() *except* the first element in 
+the reference
  5     # will not be an identifier like 'S' or 'E'
  6 
  7     my $array_ref = shift;
  8     my $tag_func  = GET_TOKEN;
  9 
 10     unless ( grep { $array_ref->[0] eq $_ } keys %token ) {
 11         # created with get_tag() method, so we need
 12         # to munge the array to match the get_token() array
 13         # After this is called, and before the method returns, you
+ must
 14         # use something like the following:
 15         # shift @$self if $method == GET_TAG;
 16         $tag_func = GET_TAG;
 17         if ( '/' ne substr $array_ref->[0], 0, 1 ) {
 18             unshift @$array_ref, 'S';
 19         }
 20         else {
 21             unshift @$array_ref, 'E';
 22         }
 23     }
 24     return ( $array_ref, $tag_func );
 25 }
[download]

This helper function would synchronize the returned array references to ensure that my methods could operate on one data structure rather than two. Unfortunately, I always had to remember to "unsynchronize" them if the calling function was get_tag(). If I didn't, users who tried to use my code as a drop-in replacement for the parent class would find the get_tag() method returning an array ref with an extra element at the front.

Line 10, a disguised if, is the first indicator that I have a major design flaw. This line checks to see if the first array element exists in my global %tokens hash. There are two problems here. The global variable isn't really that bad because it's effectively a private, read-only class variable (but it should have been uppercased). I should have wrapped that in an accessor, but I didn't. The major problem lies in the fact that I'm trying to determine what type of token I have. Lines 18 and 21 are where I unshift the type of token onto the front of the array. This made the rest of my code simpler, but synchronizing the arrays meant that I sometimes had to unsynchronize them and this made my code very difficult to maintain. I was constantly breaking it while adding new features.

A perfect example is this ugly method:

  1 sub as_is {
  2    my ( $self, $method ) = _synch_arrays( shift );
  3    my $type = $self->[0];
  4    my $text = $self->[ $token{ $type }{ text } ];
  5    shift @$self if $method == GET_TAG;
  6    return $text;
  7 }
[download]

The reason this works is because the array synchronization ensures that the "text" attribute is now always at the same index for a given type (line 4), but more than once I forgot to add line 5 which removed the extra element.

The latest version of this module no longer has this problem. Different token types? They're now subclasses of HTML::TokeParser::Simple::Token and I only test the type to bless the token into its appropriate class. Most of the new classes are very small and only override a tiny but critical portion of the base token class functionality. The data that I used to capture while synchronizing arrays is now captured when the token is instantiated and this has made the code much easier to work with. For example, the as_is() method is in my token subclass:

sub as_is { shift->[-1] }

However, "text" tokens have the raw text in a different spot, but the corresponding subclass overrides that method like this:

sub as_is { shift->[1] }

No longer do I need to synchronize arrays or do any fancy tricks. By allowing polymorphism to handle the logic for me, the code is much cleaner. Further, if you need special handling of particular HTML types, you can simply subclass the appropriate token class.

Interestingly, after I uploaded the latest version of this module to the CPAN, I grepped for if statements in the code and I saw lines like this:

lib/HTML/TokeParser/Simple/Token/Tag.pm: return [] if END_TAG eq $INSTANCE{$self}{type};

Instantly I realized the flaw. I only have one "tag" type. If a programmer subclasses this, she's going to override the behavior of both start tags and end tags. She'll need to test which type it is to prevent bugs and this is the sort of maintenance issue I wanted to avoid. Just seeing this keyword in my code instantly pointed out a design flaw. Now I just need to find the time to fix this :)

Cheers,
Ovid

New address of my CGI Course.

Comment on "if" Considered Harmful in OO programming Select or Download Code

Replies are listed 'Best First'.
Re: "if" Considered Harmful in OO programming by jryan (Vicar) on Sep 20, 2004 at 06:01 UTC
Just a nit: `unless (grep { $array_ref->[0] eq $_ } keys %token) {` [download] is better as: `unless (exists $token{$array_ref->[0]}) {` [download]	[reply] [d/l] [select]
Re^2: "if" Considered Harmful in OO programming by Ovid (Cardinal) on Sep 20, 2004 at 06:35 UTC
That's far more than a nit. I'm not sure what I originally intended there but that's some bad code on my part. Maybe I was lower-casing the tag or there was an assignment to an array; I don't know. In any event, we have a curious example of code evolution because my test suite validated it. That's some interesting food for thought. Cheers, Ovid New address of my CGI Course.	[reply]
Re: "if" Considered Harmful in OO programming by dragonchild (Archbishop) on Sep 20, 2004 at 03:53 UTC
This smells very similar to the issue of writing C-style code in Perl. (Or Java, COBOL, Fortran, etc). You can always tell the language someone is more comfortable in when they're writing their first Perl project. `for (my $i=0;$i<=$#array;$i++){ &foo($array[$i]) }` anyone? Imho, explicitly testing should almost always at the boundaries of any system. When I parse the XML template for PDF::Template, I have a lot of if-statements that treat the class structure as an open book. However, once I've got the data structure translated into a tree of objects, I can traverse that tree able to depend that each class will DWIM. This is because I've defined exactly what the hooks each node can use should be in the rendering process. The actual processing engine (Controller) should always be able to depend on the tree of objects (Model) to do the right thing. Of course, I cheat and use a $context object to maintain state and make intelligent decisions. (View? It's more of the stack/heap, now that I think about it.) But, yeah, I know what you mean about the small classes. Many of the classes in P::T are well under a hundred lines, including comments. Excel::Template averages under 30 per node class, for this very reason. ------ We are the carpenters and bricklayers of the Information Age. Then there are Damian modules.... sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon.* - flyingmoose I shouldn't have to say this, but any code, unless otherwise stated, is untested	[reply] [d/l]
Re: "if" Considered Harmful in OO programming by tilly (Archbishop) on Sep 20, 2004 at 15:57 UTC
Simplifying your point, you should not declare that A "isa" B unless A really is a B. Barbara Liskov would agree. If you find yourself using OOinheritance for polymorphism, and then having to insert if checks to correct that polymorphism, you have a problem. Clearly the polymorphism is not really doing what you want it to do, and as your object hierarchy spreads out, it will need ever more correction. A good general rule is presented in Code Complete 2 (p 149, section 6.3): This section has presented numerous rules for staying out of trouble with inheritance. The underlying message of all these rules is that inheritance tends to work against the primary technical imperative you have as a programmer, which is to manage complexity. For the sake of controlling complexity, you should maintain a heavy bias against inheritance. (Emphasis in original.) Therefore only inherit when it obviously is right to so, and view with extreme doubt anything that increases the complexity of maintaining that inheritance. If it isn't perfect, don't do it. Update: Modified per chromatic's note	[reply]
Re^2: "if" Considered Harmful in OO programming by chromatic (Archbishop) on Sep 20, 2004 at 18:10 UTC
I think you might better write "if you find yourself using inheritance for polymorphism".	[reply]
Re: "if" Considered Harmful in OO programming by talexb (Chancellor) on Sep 20, 2004 at 19:28 UTC
Interesting post. Any time I see something like `if ( $x and !$y ) { .. # true, false } elsif ( !$x and $y ) { .. # false, true } else { .. # true, true or false, false }` [download] warning bells go off because I wonder if it could have been written as `if ( $x ) { if ( $y ) { .. # true, true } else { .. # true, false } } else { if ( $y ) { .. # false, true } else { .. # false, false } }` [download] instead. Granted, this may not be required much of the time, but it sure makes code maintenance a year later a little easier. Alex / talexb / Toronto "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds	[reply] [d/l] [select]
Re^2: "if" Considered Harmful in OO programming by ikegami (Patriarch) on Sep 21, 2004 at 23:31 UTC
I disagree. I don't think duplicating code is good for maintenance at all. What if you have 3 values instead of 2. Your if would be three levels deep, and have say 5 identical pieces of code. Let's hope you don't forget to change all 5 if a changes is needed. What you suggest is not scaleable at all. `else` means all other cases. If you have 2 values in a flat if, `else` covers the differences of 4 cases. If you have 3 variables in a flat if, `else` covers the differences of 9 cases. etc. It's not that hard to process, especially compared to what you recommend.	[reply] [d/l] [select]
Hmm... by SpanishInquisition (Pilgrim) on Sep 21, 2004 at 16:58 UTC
Deeply chained ifs are ugly in ANY coding style. One point may OOP advocates miss is that inheritance and polymorphism are often just drawing out a simple if or for loop or whatever and obscuring it exists. Often, this makes it worse and designs can be 5x longer in LOC and actually less readable (ever heard the phrase "Java is readable -- like a phonebook?"). What we have is a simple coding error that could have been tamed in other styles as well, such as adding a function to check for tags of certain types. In fact, the thought to make that a function would have been more readily obvious in a functional paradigm, at least I think so (i.e. short functions that build on other functions so everything works like magic). Liberal use of objects can be a good thing, but using polymorphism or inheritance to defeat an if statement is suicide. Beer is good in small amounts. Objects are good in small amounts. Know your limitations. <quote> This section has presented numerous rules for staying out of trouble with inheritance. The underlying message of all these rules is that inheritance tends to work against the primary technical imperative you have as a programmer, which is to manage complexity. For the sake of controlling complexity, you should maintain a heavy bias against inheritance. </quote> 100% Agreed! Thank you people for backing me up! I'd print this out and put it on my cube wall, though everyone here at Innotech would stone me for those beliefs. I'm already a black sheep though... (baaa!)	[reply]
Re: Hmm... by bwelch (Curate) on Sep 21, 2004 at 17:25 UTC
Well said! Objects are good when they help. When I was learning java and OOP, it was fun. However, it was soon easy to use OOP features where they really weren't helpful just to 'use the advantages of OOP', leading to code that wasn't good. All things in moderation...	[reply]
Re^2: Hmm... by pdcawley (Hermit) on Sep 26, 2004 at 21:46 UTC
You're not railing against the OO way of doing it, you're railing against the idiotic Java way of doing it. Take a look at Smalltalk, where even the simplest 'if' statement is implemented using a message polymorphism on the Boolean class for examples of this sort of thing Done Right. Of course, you can only really do it right if you're in a language which has blocks/anonymous subroutines, which is where Java falls down.	[reply]
Re^2: Hmm... by radiantmatrix (Parson) on Sep 23, 2004 at 15:40 UTC
All things in moderation... I have a close friend who gave me the best advice. She's a nutritionist, but the idea applies to most situations: Everything in moderation, including moderation. `require General::Disclaimer;` All code, unless otherwise noted, is untested "All it will give you though, are headaches after headaches as it misinterprets your instructions in the most innovative yet useless ways." - Maypole and I - Tales from the Frontier of a Relationship (by Corion)	[reply]
Re^3: Hmm... by SpanishInquisition (Pilgrim) on Sep 23, 2004 at 17:42 UTC