garrison has asked for the wisdom of the Perl Monks concerning the following question:

I have inherited copious amounts of badly written Perl. Great strides toward readability have been made thanks to vi and perltidy, but the code is still far from readable. Are there more tools like perltidy? Specifically, I want to convert this:
$display .= "<tr><td><p><font size=-1>Markup</p></td><td><p><input typ +e=text name=markup size=25 maxlength=100 value=\"$markup\"></p></td>< +/tr>\n"; $display .= "<tr><td><p><font size=-1>Price</p></td><td><p><input type +=text name=price size=25 maxlength=100 value=\"$price\"></p></td></tr +>\n"; ...and so on for dozens of lines
into a nice HereDoc (and eventually into templates). My plan is to write a script to be called from within vi to reformat specific blocks of code, but I would like to avoid re-inventing the wheel if I can benefit from the wisdom of fellow monks. Barring a nice tool for this, what's the best plan of attack? Tips on rewriting long and nested if/else chains are also welcome.

Replies are listed 'Best First'.
Re: Cleanup tools (auto HereDoc?)
by Firefly258 (Beadle) on Dec 02, 2006 at 05:14 UTC
    Sounds like a job to be centered around HTML::PrettyPrinter amongst many other modules on CPAN that cater to "tidy markup". I'm not familiar with vi scripting but considering that it's quite agile, you shouldn't have a problem interfacing a vi script with a custom perl parser cum beautifier using HTML::PrettyPrinter.

    If it's just a few files that need modifying, I personally would just apply some elbow-grease - print out the HTML in $display to a seperate file, tidy it up with PrettyPrinter and copy it back into a heredoc, etc. I know if goes against a perl programmers virtue of laziness but I'd also consider not having a degenerate flexor. :)


    perl -e '$,=$",$_=(split/\W/,$^X)[y[eval]]]+--$_],print+just,another,split,hack'er
      This project is a mod_perl website written as one monolithic Site.pm weighing in at around twenty thousand lines. I'm in the process of introducing clean code, modules, and templates but it is simply not cost effective to do all this with "elbow grease" I want to convert rediculously numerous instances of $foo .= "bar\n"; into HereDocs as a step toward moving all the static HTML into templates.
Re: Cleanup tools (auto HereDoc?)
by graff (Chancellor) on Dec 03, 2006 at 02:54 UTC
    The sample is pungently symptomatic of "copy/paste" programming. I think converting all those lines (which are essentially dozens of copies of the same html markup pattern) into a HereDoc might simply be a wasted step, because it all should have been a loop over the elements of a hash in the first place.

    The approach I would take (and have taken when when I've had to fix other people's code) is to look over the offending block long enough to understand what sort of loop and/or subroutine would replicate it, write that loop and/or subroutine, and figure out how to populate the hash or array as I delete all the unnecessary copies of the duplicated code.

    In the long run, this approach might not be noticeably faster or easier than what you may have been planning on or looking for, but it won't be much slower either, I think, and you'll be on a more direct path to a more satisfying solution -- in fact, given that templates support loop structures for tables like this, you'll have a better head start on migrating to a proper template.

    It was probably kind of you not to show an example of the "long and nested if/else chains", but I would expect that these may also be the result of the same copy/paste style, so my first instinct, again, would be to work out the loop that should replace all the duplicated code blocks.

    Update: I just saw your reply about the total size of the problem -- and I can appreciate your point about needing some automation. Sadly, there's no magic bullet -- an alleged programmer (or team of stooges) dense enough to waste all the time it must have taken to generate that many lines of redundant code, will likely have done other things that are equally stupid (in imaginatively different ways), which you'll only discover as you slog through it block by block.

    But some "heuristic short-cuts" could be helpful... use perl to grep through the code for the blocks that will need similar forms of treatment:

    # print line numbers and code for blocks of "$x.=blah": perl -ne 'if(/^\s*(\$\w+)\s*\.=/){ if($1 ne $prv) {print "\n"; $prv=$1} printf "%5d: %s",$.,$_;}' monter.pl > bad_blocks.list
    You can apply other filters to the output of that, if you like, conditioning things as needed to make the transitions and refactoring easier, though the filtering may need to differ from one block to the next.

    Another technique that could help for locating the worst cases of copy/paste excess:

    perl -lpe 's/\#.*//; s/^\s+//; s/\s+$//; s/\s+/ /g; $_.="\t$." if $_' +monster.pl | perl -lne '($c,$n)=split /\t/; push @{$h{$c}},$n; END { print join("\t",$_, scalar @{$h{$c}}, join(",",@{$h{$_}})) for (sort keys %h) } > line.histogram
    (updated the beginning one-liner here, so empty/comment lines aren't counted.)

    That will give you an idea of how many times each line of code has been copied throughout the script (and what the line numbers are). And with the lines sorted ascii-betically, it may be easier to see what needs to be done to rectify things, because similar lines will be grouped together. For example, you could try something like this on the line.histogram output:

    perl -lpe '($c,$lines)=split /\t/; if($lc and $c){ $d=$c^$lc; $sc=$d=~y/\x0/#/;} $lc=$c; s/^/$sc\t/' line.histgram > line.similarity
    That uses the "^" (binary XOR) operator on two consecutive lines of code, and reports the number of identical characters (which come out as null bytes from the XOR). The comparison isn't entirely reliable as a scoring device, but for the kind of code block you've shown, it will prepend a big-enough number to the consecutive lines so that they will stand out (e.g. look for lines where the initial integer value is, say, greater than 10, meaning that at least the first 11 or more characters are identical to the previous line).
      This was my first approach, unfortunately this code is so convoluted that it is nearly impossible to follow; the offending block is frequently several hundred lines and interspersed with unrelated code. After several attempts I have decided to spend time just improving readability so I can more easily make improvements without breaking something. As an aside, I would like to add that I am astonished at how wrongly perl can be written and still work. My favorite example is
      $arglist->['cookietrail'] = "<a href=\"/\">Home</a>";
      This sort of thing is used extensively and "works" in the sense that Perl treats $foo['bar'] as $foo[0];
        It seems there may be no easy way to solve your problem from the looks of it. It seems quite a bit of refactoring is needed to incerase coherency and readability and also to root out and correct the mistakes. I don't think there are any modules, scripts or other code out there concocted to fix your problem, I don't think it'd be a wise idea to hold one's breath either. I hate sounding sarcastic but can i suggest a complete rewrite for a few reasons?

        • Compartmentatilization allows you to spread code out over more modules increasing maintainability and allowing you to prioritize loading and execution of code when and only when needed, thereby increasing the applications' effeciency and reducing the load on the hardware, a first step towards scalability.
        • A novel design could allow to to examine the drawbacks of the current approach, examine other better approaches, look into aspects that could be optmized and build a platform that will allow in the future, you or subsequent programmers to easily build other facilities into the current model and also allow you to easily attempt another code revision if the need arises (which usually does).
        • Refactoring allows you to gain first hand knowledge of the programmatic approad the previous programmer took, it allows you to take the best of both yours and his approaches to make something better. Hopefully, the code will be more modularized, more readable, easier to maintain and optmized at the end of the process.
        • It gives you the opportunity to see if any current technologies can be incorporated into the application to increase effeciency, scalability, etc. Things you have mentioned such as templating, Mason, mod_perl, memcached, etc, etc.
        • It gives you the opportunity to embark on a new project, to learn new code, techniques, etc. Something that your resume will be proud to bear.

        It might sound unfair for someone like me totally unaware of the task ahead to easily say "go back to the drawing board"... Well, sometimes, a rewrite is necessary and this looks like one of those instances, horrid looking code, incorrect syntax, monolithic design -- all the essence of a programmer's nightmare. Think of it, how long can you continue struggling to make do with the current implementation? Rather than do what you originally set out to do i.e. clean-up/beautify the present code, you seek to gain a lot more advantages starting over again, this time learning from other's mistakes and taking enough preemption to write cleaner, more effecient, scalable and elegant code - with the technologies you feel will benefit it, etc. Not only will you sleep better at night, you'll do your company good.


        perl -e '$,=$",$_=(split/\W/,$^X)[y[eval]]]+--$_],print+just,another,split,hack'er
        This reply of yours came before I finished my update, which I hope will be helpful. As for this latest example -- WOW! That certainly does validate the assumption I made in my update: anybody stupid enough to generate thousands of lines of duplicative, copy/paste code is bound to do all kinds of dumb-a$$ s#!t...
Re: Cleanup tools (auto HereDoc?)
by f00li5h (Chaplain) on Dec 03, 2006 at 01:41 UTC

    I'm not sure this is a perfect solution, but you could just eval that bit of code, and print the results. This would interpolate the variables, which would give you empty spots (ie where you have value="$markup"). So perhaps you would want to escape the sigils before you eval the code, then all you need do is throw in a $display = <<EOHTML arround it and you're done.

    Although while you're there, why not replace all the interpolations with [% $template_engine_name %] tags right off the bat, and then you won't have to come back and re-visit it when you come to roll out your template version.

    The more of the code you delete now, the less you have to come back and delete later, right?

    @_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;
      Ah I was hoping someone would suggest eval(). I've been working on:
      my $line = 0; while (<>) { s/^([\smy]*\$\w+\s+\.?=\s+)['"](.*?)\\?n?['"];/$1<<END;\n$2/ unless $line > 0; s/^\s*\$\w+\s+\.=\s+['"](.*?)\\?n?['"];$/$1/g; s/\\"/"/g; $line++; print $_; } print "END\n";
      which works but is (I believe) a poor design. I'll play with your suggestion next. See my response to graff for why I can't jump right to template tags.
Re: Cleanup tools (auto HereDoc?)
by geekphilosopher (Friar) on Dec 02, 2006 at 21:59 UTC
    If you end up looking for more of a refactor than a tidy, it might be useful getting a copy of Perl Medic, a book written for precisely these situations.