Note: I'm not saying "don't use if." This node is all about avoiding if to check the type of data you have in OO code, even though there are times that it's unavoidable.
if is a fascinating keyword. Essentially, it says "I don't know what I have, so I'm going to guess." This guessing leads to subtle bugs. Do you see any potential problem in the following?
if ($x and ! $y) { ... } elsif (!$x and $y) { ... } else { ... }
The above code might not have a bug, but it's interesting to note that it does not explicitly deal with the case of both variables being true or both variables being false. Both cases will receive the same treatment. This can raise a maintenance problem if another programmer has to distinguish between these two cases and isn't aware of this snippet.
Most programmers generally do not sit down and draw out truth tables every time they want to code a conditional, but I hardly feel this is a problem. Unexpected cases come up all the time in programming and if is hardly the only culprit. However, when dealing with object oriented code, "if" statements are frequently a maintenance nightmare that's symptomatic of bad OO design.
A classic example lies in older versions of HTML::TokeParser::Simple (I wrote it, so I can pick on it.) In this module, I needed to maintain "drop-in replacement" compatability with my base class, HTML::TokeParser. To do this, I blessed the tokens that the latter module returned into a new token class. Then I had a problem. The array references returned from get_tag() and get_token() methods in HTML::TokeParser had subtly different structures. My methods should not care which method generated the token, so I wrote the following ugly hack:
1 sub _synch_arrays { 2 # if the method is called from a token generated by the get_ta +g() method, 3 # the returned array reference will be identical to a start or + end tag 4 # token returned by get_token() *except* the first element in +the reference 5 # will not be an identifier like 'S' or 'E' 6 7 my $array_ref = shift; 8 my $tag_func = GET_TOKEN; 9 10 unless ( grep { $array_ref->[0] eq $_ } keys %token ) { 11 # created with get_tag() method, so we need 12 # to munge the array to match the get_token() array 13 # After this is called, and before the method returns, you + must 14 # use something like the following: 15 # shift @$self if $method == GET_TAG; 16 $tag_func = GET_TAG; 17 if ( '/' ne substr $array_ref->[0], 0, 1 ) { 18 unshift @$array_ref, 'S'; 19 } 20 else { 21 unshift @$array_ref, 'E'; 22 } 23 } 24 return ( $array_ref, $tag_func ); 25 }
This helper function would synchronize the returned array references to ensure that my methods could operate on one data structure rather than two. Unfortunately, I always had to remember to "unsynchronize" them if the calling function was get_tag(). If I didn't, users who tried to use my code as a drop-in replacement for the parent class would find the get_tag() method returning an array ref with an extra element at the front.
Line 10, a disguised if, is the first indicator that I have a major design flaw. This line checks to see if the first array element exists in my global %tokens hash. There are two problems here. The global variable isn't really that bad because it's effectively a private, read-only class variable (but it should have been uppercased). I should have wrapped that in an accessor, but I didn't. The major problem lies in the fact that I'm trying to determine what type of token I have. Lines 18 and 21 are where I unshift the type of token onto the front of the array. This made the rest of my code simpler, but synchronizing the arrays meant that I sometimes had to unsynchronize them and this made my code very difficult to maintain. I was constantly breaking it while adding new features.
A perfect example is this ugly method:
1 sub as_is { 2 my ( $self, $method ) = _synch_arrays( shift ); 3 my $type = $self->[0]; 4 my $text = $self->[ $token{ $type }{ text } ]; 5 shift @$self if $method == GET_TAG; 6 return $text; 7 }
The reason this works is because the array synchronization ensures that the "text" attribute is now always at the same index for a given type (line 4), but more than once I forgot to add line 5 which removed the extra element.
The latest version of this module no longer has this problem. Different token types? They're now subclasses of HTML::TokeParser::Simple::Token and I only test the type to bless the token into its appropriate class. Most of the new classes are very small and only override a tiny but critical portion of the base token class functionality. The data that I used to capture while synchronizing arrays is now captured when the token is instantiated and this has made the code much easier to work with. For example, the as_is() method is in my token subclass:
sub as_is { shift->[-1] }However, "text" tokens have the raw text in a different spot, but the corresponding subclass overrides that method like this:
sub as_is { shift->[1] }No longer do I need to synchronize arrays or do any fancy tricks. By allowing polymorphism to handle the logic for me, the code is much cleaner. Further, if you need special handling of particular HTML types, you can simply subclass the appropriate token class.
Interestingly, after I uploaded the latest version of this module to the CPAN, I grepped for if statements in the code and I saw lines like this:
lib/HTML/TokeParser/Simple/Token/Tag.pm: return [] if END_TAG eq $INSTANCE{$self}{type};Instantly I realized the flaw. I only have one "tag" type. If a programmer subclasses this, she's going to override the behavior of both start tags and end tags. She'll need to test which type it is to prevent bugs and this is the sort of maintenance issue I wanted to avoid. Just seeing this keyword in my code instantly pointed out a design flaw. Now I just need to find the time to fix this :)
Cheers,
Ovid
New address of my CGI Course.
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: "if" Considered Harmful in OO programming
by jryan (Vicar) on Sep 20, 2004 at 06:01 UTC | |
by Ovid (Cardinal) on Sep 20, 2004 at 06:35 UTC | |
Re: "if" Considered Harmful in OO programming
by dragonchild (Archbishop) on Sep 20, 2004 at 03:53 UTC | |
Re: "if" Considered Harmful in OO programming
by tilly (Archbishop) on Sep 20, 2004 at 15:57 UTC | |
by chromatic (Archbishop) on Sep 20, 2004 at 18:10 UTC | |
Re: "if" Considered Harmful in OO programming
by talexb (Chancellor) on Sep 20, 2004 at 19:28 UTC | |
by ikegami (Patriarch) on Sep 21, 2004 at 23:31 UTC | |
Hmm...
by SpanishInquisition (Pilgrim) on Sep 21, 2004 at 16:58 UTC | |
by bwelch (Curate) on Sep 21, 2004 at 17:25 UTC | |
by pdcawley (Hermit) on Sep 26, 2004 at 21:46 UTC | |
by radiantmatrix (Parson) on Sep 23, 2004 at 15:40 UTC | |
by SpanishInquisition (Pilgrim) on Sep 23, 2004 at 17:42 UTC |