in reply to Refactoring challenge.

A small point: that s/// is pretty inefficient, and replacing it with:

substr($str, 0, $pos1) = '';
makes it clear that you could do it implicitly in the previous line just by adding an extra argument:
print $indent x $depth, substr( $str, 0, $pos1, '' );
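
For illustration, a minimal sketch of the two approaches side by side. The sample string and the value of $pos1 are made up for the demonstration, and the s/// shown is a simplified stand-in rather than the exact substitution from the original post:

```perl
use strict;
use warnings;

# Hypothetical sample data; in the thread, $str and $pos1 come from the OP's code.
my $str  = "{ a => [ 1, 2, 3 ] }";
my $pos1 = 9;

# Regex version: compiles and runs a pattern just to chop off a fixed-length prefix.
( my $via_regex = $str ) =~ s/^.{$pos1}//s;

# Four-argument substr: removes the prefix in place and returns what it removed.
my $copy    = $str;
my $removed = substr( $copy, 0, $pos1, '' );

print "removed: '$removed'\n";
print "left:    '$copy'\n";
print "same as regex: ", ( $copy eq $via_regex ? "yes" : "no" ), "\n";
```

Because the four-argument form returns the removed text, the print and the removal can share a single expression.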

dragonchild's refactored version would not need the $pos = $newpos + 2 correction if the (suitably renamed) $_print() did the same thing.

I'd be tempted to make it read a bit more cleanly by also passing the depth in to $_print(), with a slight reordering to flatten the nesting:

my $_consume = sub {
    print $indent x $_[2], substr $_[0], 0, $_[1], '';
};
sub pp {
    return unless @_;
    my($str) = @_;
    my($length, $first) = f1($str);
    if (!defined $first) {
        $_consume->($str, $length, $depth);
    } elsif ($first eq CONST1) {
        $_consume->($str, $length + 1, --$depth);
    } elsif (defined(my $newlen = f2($str, $first))) {
        $_consume->($str, $newlen + 2, $depth);
    } else {
        $_consume->($str, $length, $depth++);
    }
    return $str;
}

Update: passing $str to be modified in this way is probably not safe - if I remember right, any attached magic will cause a temporary copy to be passed instead - so some slight modifications to pass \$str instead are probably required.
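
A rough sketch of what the reference-passing variant might look like. The $indent value, the sample string, and the trailing newline are assumptions added only to make the demo self-contained and readable:

```perl
use strict;
use warnings;

my $indent = '  ';    # assumed indent unit; the OP defines its own

# Takes a reference to the string, so the caller's variable is the one
# that gets shortened, without relying on how @_ aliasing behaves.
my $_consume = sub {
    my ( $str_ref, $length, $depth ) = @_;
    # The "\n" is only here to keep the demo output on separate lines.
    print $indent x $depth, substr( $$str_ref, 0, $length, '' ), "\n";
};

my $str = "{ a => b, }";
$_consume->( \$str, 1, 0 );    # consumes the first character
$_consume->( \$str, 9, 1 );    # consumes the next nine, indented one level
print "remainder: '$str'\n";
```

Dereferencing explicitly sidesteps the question of whether the caller's scalar or a temporary copy ends up in @_.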

Update 2: added a missing + 1

Hugo

Re^2: Refactoring challenge.
by BrowserUk (Patriarch) on Mar 06, 2005 at 13:17 UTC

    Thanks Hugo, that helps a bit more :)

    A small point: that s/// is pretty inefficient,

    Unfortunately, $string =~ s[^.{$pos}\s*][]; doesn't just remove that part of the string that has been printed, it also removes any leading whitespace from the remainder of the string that otherwise would mess up the indenting.

    I've been trying to remove the need for that, by having f1() and f2() advance the pointer(s) beyond the whitespace after they locate their relevant positions, so that it gets output on the end of the previous line of output where it does no harm, but so far, I've failed.

    However, incorporating it into the _print/_consume sub is a good idea.

    And passing the $depth through simplifies things a bit more.

    Unfortunately, I cannot flatten the nesting in quite the way you have as the $pos ($length as you have it) will also be undefined if $first is undefined. I realise that this was implicit rather than explicit in the OP.

    It's a shame, and something I will attempt to correct if possible, because as you have it, the parser becomes a state machine which would be very nice.

    For now, the magic problem doesn't arise as $str is just a string (generated wholly within the outer layers of my code and will not have any magic attached), but it is something that I wasn't aware of and worth noting for the future.

    As it stands, the combined refactoring, in situ within its surrounding context, has introduced a minor edge-case bug that wasn't there before, but the reduction in clutter should make it easier to resolve.

    The point I had been at for over a week was that I had arrived at code that worked by hacking at my initial attempt, but it was so fragile that every time I tried to clean it up, it broke badly. Isolating this part from the rest and getting other eyes to look at it has simplified the overall thing to the point where it is much less fragile.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.

      Unfortunately, [the substitution] also removes any leading whitespace from the remainder of the string

      Oops, I completely missed that. But adding a trailing

      $_[0] =~ s/^\s+//;
      to $_consume() is enough to fix that.
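
A sketch of the combined version, assuming a two-space $indent and a made-up sample string; the newline in the print is added only for the demo:

```perl
use strict;
use warnings;

my $indent = '  ';    # assumed indent unit

# Print the first $_[1] characters of $_[0] at depth $_[2], remove them
# from the caller's string, then drop any whitespace left at the front.
my $_consume = sub {
    print $indent x $_[2], substr( $_[0], 0, $_[1], '' ), "\n";
    $_[0] =~ s/^\s+//;
};

my $str = "{   a => b, }";
$_consume->( $str, 1, 0 );
print "remainder: '$str'\n";
```

The substitution now only ever scans the run of leading whitespace, rather than matching the whole consumed prefix.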

      Unfortunately, I cannot flatten the nesting in quite the way you have as the $pos ($length as you have it) will also be undefined if $first is undefined.

      Both your original code and dragonchild's refactor contradict this - both use the returned $pos1 ($pos) unaltered when $first is undefined.

      For now, the magic problem doesn't arise as $str is just a string (generated wholly within the outer layers of my code and will not have any magic attached)

      As I remember it, the principal (and most embarrassing) time that it becomes a problem is when the parameter is tainted.

      Hugo

        both use the returned $pos1 ($pos) unaltered when $first is undefined.

        Ah! (blush) I oversimplified.

        In situ, the call to f1() is followed by:

        return unless defined $pos;

        Information which makes your reduction very pertinent and holds out the possibility of reducing the whole snippet to a single level state machine.

        The block of code in the OP actually sits within a while loop in the real code. What the code does is take a string of the form:

        { a => [ SCALAR(0x18bb45c), { a => b, c => d, e => f, g => h, }, [ 1, 2, 3,

        and pretty prints it like this:

        { a => [ SCALAR(0x18bf6a4), { a => b, c => d, e => f, g => h, }, [ 1, 2, 3, 4, 5, 6,

        or like this

        { a => [ SCALAR(0x18bf6a4), { a => b, c => d, e => f, g => h, }, [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ],

        or this

        { a => [ SCALAR(0x18bf6a4), { a => b, c => d, e => f, g => h, }, [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ], {

        depending upon a width argument supplied--but there is an additional twist.

        The code generating the input does it in chunks as it recursively traverses an unknown data structure. The pretty printer gets fed those chunks (it is called from the STORE routine of a tied scalar), and is charged with pretty printing the (accumulated) string each time it goes over the width limit, without waiting for the whole string, or even a complete set of balanced text, to accumulate.

        The idea is to avoid having to hold the whole of the potentially enormous string in memory.
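
The chunk-feeding arrangement described above could be sketched roughly as follows. Everything here is an assumption for illustration: the ChunkBuffer class name, the width and emit parameters, and above all the fixed-width slicing, which merely stands in for the real pretty-printing logic:

```perl
use strict;
use warnings;

package ChunkBuffer;    # hypothetical class name

sub TIESCALAR {
    my ( $class, %args ) = @_;
    my $self = {
        width  => $args{width} // 40,               # flush threshold (assumed)
        emit   => $args{emit}  // sub { print @_ },  # where output goes
        buffer => '',
    };
    return bless $self, $class;
}

# Each assignment to the tied scalar appends a chunk. Once the buffer
# reaches the width, fixed-width slices are emitted and discarded, so
# only the unprinted tail stays in memory.
sub STORE {
    my ( $self, $chunk ) = @_;
    $self->{buffer} .= $chunk;
    while ( length( $self->{buffer} ) >= $self->{width} ) {
        $self->{emit}->( substr( $self->{buffer}, 0, $self->{width}, '' ) . "\n" );
    }
}

sub FETCH { $_[0]{buffer} }    # whatever has not been emitted yet

package main;

my @lines;
tie my $out, 'ChunkBuffer', width => 10, emit => sub { push @lines, @_ };
$out = "{ a => [ ";
$out = "1, 2, 3, ";
$out = "4, 5, ], }";
print @lines;
print "pending: '$out'\n";
```

A real printer would cut at structural boundaries and track depth instead of slicing blindly, but the memory behaviour is the same: emitted text is dropped from the buffer as soon as it is printed.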

        I probably should have posted the whole thing for communal refactoring, but it has so many dependencies that it would be asking a lot for anyone to look at it.

        Not to mention that I was embarrassed by the code in every respect--except that it worked!


Re^2: Refactoring challenge.
by dragonchild (Archbishop) on Mar 07, 2005 at 13:26 UTC
    I like your rewrite of the logic. Mapping it to the original took a small braintwist, but that's cool. :-)

    I'm not sure the $_consume() method should encapsulate the printing as well. That makes it difficult to replace the print method without breaking stuff. To tell you the truth, I think the biggest issue is f1() and f2(). The entire pp() experience isn't being refactored correctly. For one thing, the fact that f1() and f2() are separated is a smell to me. The bigger smell of the flag (that BrowserUk noticed) is gone, but that leaves the smaller smell of the two functions that (seemingly) do similar things.
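
One possible way to separate the two concerns, sketched under assumptions: $_consume only removes and formats the prefix, and a hypothetical $emit callback decides where the text goes. The $indent value and sample string are made up for the demo:

```perl
use strict;
use warnings;

my $indent = '  ';    # assumed indent unit

# Only removes the prefix from the caller's string and returns it,
# indented; it does not decide where the output goes.
my $_consume = sub {
    return ( $indent x $_[2] ) . substr( $_[0], 0, $_[1], '' );
};

# The caller chooses the sink: an array here, but it could equally be
# STDOUT or a filehandle.
my @sink;
my $emit = sub { push @sink, @_ };

my $str = "{ a => b, }";
$emit->( $_consume->( $str, 1, 0 ) );
$emit->( $_consume->( $str, 9, 1 ) );
print "$_\n" for @sink;
```

Swapping the output method then means replacing $emit only, with no change to the consuming code.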

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.