kiat has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I've a conditional test that's repeated a number of times in my code. The test is as follows:

## Perform test and call fraction only if test is true if ($str =~ /\d+_\d+) { $str = fraction($str); } ### subroutine fraction sub fraction { my $str = shift; # convert $str to fraction using mimetex syntax return $str; }
Should I let the subroutine fraction do the conditional test, as in:
# Call fraction and perform the conditional test there $str = fraction($str); ### subroutine fraction sub fraction { my $str = shift; if ($str =~ /\d+_\d+) { # convert $str to fraction using mimetex syntax } return $str; }
How would you have done it? Which way is better?

Thanks in anticipation :)

Replies are listed 'Best First'.
Re: Coding refactoring?
by dragonchild (Archbishop) on Apr 26, 2004 at 15:35 UTC
    I think that if the only time you call fraction() is if you have performed the test, then putting the test in the fraction() sub is just fine.

    Now, personally, I would do the following:

    $str = convert_to_fraction_if_needed( $str ); sub convert_to_fraction_if_needed { my $str = shift; return $str unless $str =~ /\d+_\d+/; return convert_to_fraction( $str ); } sub convert_to_fraction { my $str = shift; # Do stuff here return $converted_str; }

    I like my functions to be atomic, but that's just me. Either way is maintainable.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      Thanks, dragonchild!

      It never occurred to me to break it down this way. Nice input :)

Code Clarity & Cost Analysis
by TomDLux (Vicar) on Apr 26, 2004 at 18:40 UTC

    The options are:

    1. Test whether to invoke the routine
    2. $str = fraction($str) if ($str =~ /\d+_\d+/);
    3. Invoke the routine unconditionally
    4. $str = fraction($str); ... sub fraction { my $str = shift; if ($str =~ /\d+_\d+/) { # convert $str to fraction using mimetex syntax } return $str; }

    In my mind, it comes down to two factors, code clarity and cost.

    Code Clarity

    Because of the nature of Perl programs, human readability is important. We spend more time maintaining and updating existing programs than writing new programs. According to Robert L. Glass, Facts and Fallacies of Software Engineering, page 115, "Maintenance typically consumes 40 to 80% ( average 60% ) of software costs." Since one of the selling points of Perl is the ease of maintenance, I suspect Perl scripts/programs are more likely to be modified than C or Java programs would be. The easier it is to understand a program, the quicker you can carry out the required modifications, so I consider it worthwhile devoting some effort in making code readable.

    If a condition invokes more than one action, use a traditional if block; if a series of conditions invokes the appropriate transformation, use the if/elsif/else block:

    if ($str =~ /\d+_\d+/) { $str = fraction($str) } elsif ( condition 2 ) { action 2 } else { default }

    If the important thing is that the value may be modified, use the conditional suffix:

    $str = fraction($str) if ($str =~ /\d+_\d+/)

    This emphasizes the modification, with the condition as a secondary factor.

    To emphasise the transformation even more, hide the condition by moving it into the routine, as you suggested.

    I don't totally like using $str in both left hand and right hand sides of the assignment. While far short of what M.J. Dominus calls a red flag, the double appearance suggests that other coding mechanisms should be considered.

    One possibility is to pass a reference to the routine, so that it takes the form of a subroutine without any return value:

    convert_ratio_to_decimal( \$str );

    Or, better yet, specify a prototype for the routine so that you don't have to mark the variable parameter as being passed as a reference:

    sub convert_ratio_to_decimal (\$) { ..... }; convert_ratio_to_decimal( $str );

    Maybe $str should not be a scalar at all, maybe it should really be an object, or a component of an object. If you can visualize a concrete or abstract thing---an airplane, a bank account---which has a set of operations it can perform, an object is a natural representation:

    $obj->convert_ration_to_decimal();

    Cost

    If this is an operation you perform on user input, efficiency is not too significant. If you are processing multi-million record log files, thinking in terms of costs becomes more significant.

    Do not obfuscate code thinking that you are making it faster, until you have run benchmarks to determine where you are spending your time. Often, a better algorithm might pay off better than a better implementation. Still, it can be worth remembering relative costs when writing your code.

    Invoking a function involves overhead. The overhead is tolerable for significant routines, but becomes expensive for frequently called routines that perform simple tasks. That's why C++ and Java provide inline functions.

    If there are many records, and only a few require modification, test first and invoke the function only if necessary.

    One problem with testing outside the routine is that the code needs to know about implementation details, in the regex /\d_\d+/. When the regex is in the function, details about the transformation and when it is performed are better encapsulated. One solution would be to put the condition into a routine of its own. Obviously, this may be expensive for a multi-million record problem, but might be suitable for simpler cases. I have used conditional routines in a error-tracking module I wrote ... actually, it was a class:

    $error->logBadDateError( $str ) if (ErrorTracking::invalidDate( $str ) +)

    In that case, some of the conditions were more complex, but all benefited from replacing clumps of code with meaningful names.

    In your case, it's a simple regex, so you don't need a function call, you could simply store the regex in a variable, possibly in an external module:

    $varName = qr(regex goes here); if ( $str =~ /$UnderscoredNumber/ ) ... # poor if ( $str =~ /$SimpleRatio/ ) ... # good if ( $str =~ /$Ratio::simpleFraction/ ) ... # better

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

      While I wish I could ++ this node more than once, I do have one nit.

      I don't totally like using $str in both left hand and right hand sides of the assignment. While far short of what M.J. Dominus calls a red flag, the double appearance suggests that other coding mechanisms should be considered.

      Perl does this in the core with uc and friends. I originally considered this to be a crufty legacy from the past (and it may be), but I have since found it to be a benefit.

      My reasoning is simple - when I'm parsing something, I often want to keep the original unblemished and transform the original into a copy. While I can easily do something as assign it to a copy, I'm not always at liberty to be working with the absolute original copy. Sometimes, I, the caller of fraction(), am working with a partially-transformed version of the copy. By explicitly saying $str = fraction( $str );, I am telling my maintainer(s) what is going on without them having to read the code for fraction(). Simply saying convert_ratio_to_number( $str ); may not be explicit enough.

      Now, this argument does not apply in the case of an object. Methods are assumed to have side-effects where functions are not.

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      Thanks for your sharing. There's so much to learn from your inputs :)
Re: Call sub first or later
by broquaint (Abbot) on Apr 26, 2004 at 15:44 UTC
    It makes sense to move the conditional inside the subroutine as it shouldn't be operating on $str unless it is in the valid format. A small addition that could be made is some additional dwimmery to the subroutine such as modification of the string inplace and working on multiple values e.g
    sub fraction { return @_ unless @_ == grep /\d+_\d+/, @_; my $args = defined wantarray ? [@_] : \@_; $_ = convert_num_here($_) for @$args; return wantarray ? @$args : $args[-1]; }
    Also, if you're working with fractions davorg's Number::Fraction might be of interest.
    HTH

    _________
    broquaint

      Thanks, broquaint!

      I had been wondering why there wasn't a module for fractions. Will take a look at davorg's Number::Fraction.

      Might..or might not.

      Before recommending modules that aren't widely used, I think it is good to glance at the source-code and look for obvious problems. In this case the module is clearly "interesting proof of concept with significant limitations." Most significantly, it doesn't handle negative numbers.

      (I'll send a message to davorg about that.)

        Version 1.03 of Number::Fraction fixes the problem with negative numbers.

        --
        <http://www.dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg

Re: Call sub first or later
by Old_Gray_Bear (Bishop) on Apr 26, 2004 at 15:52 UTC
    My preference would be to see the condition in the subroutine, for two reasons:
    1. It keeps the boundary condition of the subroutine in the same location as the code body. This leaves no chance of running the sub by accident without first checking the input. (This may not be a crisis in this example, but there are other cases where calling a routine with incorrect (but syntactically valid) arguments can cause some really ugly, nasty, hard to sort out, bugs. Ask me sometime about the "cooling tower simulator"...)
    2. If your control logic comes in two pieces, putting both pieces together in one place means that the maintenance programmer will be automatically reminded of the conditions and limitations. Consider it a form of internal documentation, using the code to show your logic.

    ----
    I Go Back to Sleep, Now.

    OGB