Re: Cleanup tools (auto HereDoc?)

The sample is pungently symptomatic of "copy/paste" programming. I think converting all those lines (which are essentially dozens of copies of the same html markup pattern) into a HereDoc might simply be a wasted step, because it all should have been a loop over the elements of a hash in the first place.

The approach I would take (and have taken when when I've had to fix other people's code) is to look over the offending block long enough to understand what sort of loop and/or subroutine would replicate it, write that loop and/or subroutine, and figure out how to populate the hash or array as I delete all the unnecessary copies of the duplicated code.

In the long run, this approach might not be noticeably faster or easier than what you may have been planning on or looking for, but it won't be much slower either, I think, and you'll be on a more direct path to a more satisfying solution -- in fact, given that templates support loop structures for tables like this, you'll have a better head start on migrating to a proper template.

It was probably kind of you not to show an example of the "long and nested if/else chains", but I would expect that these may also be the result of the same copy/paste style, so my first instinct, again, would be to work out the loop that should replace all the duplicated code blocks.

Update: I just saw your reply about the total size of the problem -- and I can appreciate your point about needing some automation. Sadly, there's no magic bullet -- an alleged programmer (or team of stooges) dense enough to waste all the time it must have taken to generate that many lines of redundant code, will likely have done other things that are equally stupid (in imaginatively different ways), which you'll only discover as you slog through it block by block.

But some "heuristic short-cuts" could be helpful... use perl to grep through the code for the blocks that will need similar forms of treatment:

# print line numbers and code for blocks of "$x.=blah":

perl -ne 'if(/^\s*(\$\w+)\s*\.=/){
  if($1 ne $prv) {print "\n"; $prv=$1}
  printf "%5d: %s",$.,$_;}' monter.pl > bad_blocks.list
[download]

You can apply other filters to the output of that, if you like, conditioning things as needed to make the transitions and refactoring easier, though the filtering may need to differ from one block to the next.

Another technique that could help for locating the worst cases of copy/paste excess:

perl -lpe 's/\#.*//; s/^\s+//; s/\s+$//; s/\s+/ /g; $_.="\t$." if $_' 
+monster.pl |
  perl -lne '($c,$n)=split /\t/; push @{$h{$c}},$n;
    END { print join("\t",$_,
                     scalar @{$h{$c}},
                     join(",",@{$h{$_}}))
             for (sort keys %h)
        } > line.histogram
[download]

(updated the beginning one-liner here, so empty/comment lines aren't counted.)

That will give you an idea of how many times each line of code has been copied throughout the script (and what the line numbers are). And with the lines sorted ascii-betically, it may be easier to see what needs to be done to rectify things, because similar lines will be grouped together. For example, you could try something like this on the line.histogram output:

perl -lpe '($c,$lines)=split /\t/;
  if($lc and $c){ $d=$c^$lc; $sc=$d=~y/\x0/#/;}
  $lc=$c; s/^/$sc\t/' line.histgram > line.similarity
[download]

That uses the "^" (binary XOR) operator on two consecutive lines of code, and reports the number of identical characters (which come out as null bytes from the XOR). The comparison isn't entirely reliable as a scoring device, but for the kind of code block you've shown, it will prepend a big-enough number to the consecutive lines so that they will stand out (e.g. look for lines where the initial integer value is, say, greater than 10, meaning that at least the first 11 or more characters are identical to the previous line).

Comment on Re: Cleanup tools (auto HereDoc?) Select or Download Code

Replies are listed 'Best First'.
Re^2: Cleanup tools (auto HereDoc?) by garrison (Scribe) on Dec 03, 2006 at 03:23 UTC
This was my first approach, unfortunately this code is so convoluted that it is nearly impossible to follow; the offending block is frequently several hundred lines and interspersed with unrelated code. After several attempts I have decided to spend time just improving readability so I can more easily make improvements without breaking something. As an aside, I would like to add that I am astonished at how wrongly perl can be written and still work. My favorite example is `$arglist->['cookietrail'] = "<a href=\"/\">Home</a>";` [download] This sort of thing is used extensively and "works" in the sense that Perl treats `$foo['bar']` as `$foo[0];`	[reply] [d/l] [select]
Re^3: Cleanup tools (auto HereDoc?) by Firefly258 (Beadle) on Dec 03, 2006 at 04:42 UTC
It seems there may be no easy way to solve your problem from the looks of it. It seems quite a bit of refactoring is needed to incerase coherency and readability and also to root out and correct the mistakes. I don't think there are any modules, scripts or other code out there concocted to fix your problem, I don't think it'd be a wise idea to hold one's breath either. I hate sounding sarcastic but can i suggest a complete rewrite for a few reasons? Compartmentatilization allows you to spread code out over more modules increasing maintainability and allowing you to prioritize loading and execution of code when and only when needed, thereby increasing the applications' effeciency and reducing the load on the hardware, a first step towards scalability. A novel design could allow to to examine the drawbacks of the current approach, examine other better approaches, look into aspects that could be optmized and build a platform that will allow in the future, you or subsequent programmers to easily build other facilities into the current model and also allow you to easily attempt another code revision if the need arises (which usually does). Refactoring allows you to gain first hand knowledge of the programmatic approad the previous programmer took, it allows you to take the best of both yours and his approaches to make something better. Hopefully, the code will be more modularized, more readable, easier to maintain and optmized at the end of the process. It gives you the opportunity to see if any current technologies can be incorporated into the application to increase effeciency, scalability, etc. Things you have mentioned such as templating, Mason, mod_perl, memcached, etc, etc. It gives you the opportunity to embark on a new project, to learn new code, techniques, etc. Something that your resume will be proud to bear. It might sound unfair for someone like me totally unaware of the task ahead to easily say "go back to the drawing board"... Well, sometimes, a rewrite is necessary and this looks like one of those instances, horrid looking code, incorrect syntax, monolithic design -- all the essence of a programmer's nightmare. Think of it, how long can you continue struggling to make do with the current implementation? Rather than do what you originally set out to do i.e. clean-up/beautify the present code, you seek to gain a lot more advantages starting over again, this time learning from other's mistakes and taking enough preemption to write cleaner, more effecient, scalable and elegant code - with the technologies you feel will benefit it, etc. Not only will you sleep better at night, you'll do your company good. `perl -e '$,=$",$_=(split/\W/,$^X)[y[eval]]]+--$_],print+just,another,split,hack'er`	[reply] [d/l]
Re^4: Cleanup tools (auto HereDoc?) by garrison (Scribe) on Dec 03, 2006 at 05:51 UTC
I absolutely agree; unfortunately the customer did not want to start over from scratch and I'm disinclined to walk away from a project simply because it requires too many tedious (but billable) hours. I look at this as an opportunity to develop some new clever techniques.	[reply]
Re^5: Cleanup tools (auto HereDoc?) by f00li5h (Chaplain) on Dec 03, 2006 at 12:54 UTC
Re^5: Cleanup tools (auto HereDoc?) by Firefly258 (Beadle) on Dec 03, 2006 at 13:42 UTC
Re^3: Cleanup tools (auto HereDoc?) by graff (Chancellor) on Dec 03, 2006 at 04:29 UTC
This reply of yours came before I finished my update, which I hope will be helpful. As for this latest example -- WOW! That certainly does validate the assumption I made in my update: anybody stupid enough to generate thousands of lines of duplicative, copy/paste code is bound to do all kinds of dumb-a$$ s#!t...	[reply]
Re^4: Cleanup tools (auto HereDoc?) by garrison (Scribe) on Dec 03, 2006 at 05:57 UTC
I'm too tired to digest your code suggestions now (00:57 EST) but I will. I'd like to thank all who took the time to reply, I am getting exactly what I had hoped for: new ideas to toss around. This is what I love about Perl. :)	[reply]