in reply to Cleanup tools (auto HereDoc?)
The approach I would take (and have taken when I've had to fix other people's code) is to look over the offending block long enough to understand what sort of loop and/or subroutine would replicate it, write that loop and/or subroutine, and figure out how to populate the hash or array as I delete all the unnecessary copies of the duplicated code.
In the long run, this approach might not be noticeably faster or easier than what you may have been planning on or looking for, but it won't be much slower either, I think, and you'll be on a more direct path to a more satisfying solution -- in fact, given that templates support loop structures for tables like this, you'll have a better head start on migrating to a proper template.
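To make that concrete, here is a minimal sketch of the kind of rewrite I mean. The row data and the names `@rows`, `table_row` and `build_table` are made up for illustration, since I'm only guessing at what the duplicated blocks in your script look like:

```perl
use strict;
use warnings;

# Hypothetical data: in the copy/paste version, each of these rows
# would have been its own hand-written run of "$html .= ..." lines.
my @rows = (
    { label => 'Name',  value => 'Alice' },
    { label => 'Email', value => 'alice@example.com' },
    { label => 'Phone', value => '555-0100' },
);

# One subroutine stands in for every duplicated block.
sub table_row {
    my ($row) = @_;
    return "<tr><td>$row->{label}</td><td>$row->{value}</td></tr>\n";
}

# The loop replaces the copies; adding a row now means adding data, not code.
sub build_table {
    my (@table_rows) = @_;
    my $html = "<table>\n";
    $html .= table_row($_) for @table_rows;
    $html .= "</table>\n";
    return $html;
}

print build_table(@rows);
```

Once the data lives in an array of hashes like this, handing it to a template's loop construct later should be a fairly small step.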
It was probably kind of you not to show an example of the "long and nested if/else chains", but I would expect that these may also be the result of the same copy/paste style, so my first instinct, again, would be to work out the loop that should replace all the duplicated code blocks.
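For the if/else chains, one common replacement (not necessarily the right one for your code, which I haven't seen) is a hash used as a dispatch table. In this sketch the keys, the formatters and the name `format_value` are all hypothetical:

```perl
use strict;
use warnings;

# Stand-in for a chain like:
#   if ($kind eq 'money') {...} elsif ($kind eq 'percent') {...} else {...}
my %format_for = (
    money   => sub { sprintf '$%.2f',  $_[0] },
    percent => sub { sprintf '%.1f%%', $_[0] * 100 },
    text    => sub { $_[0] },
);

sub format_value {
    my ($kind, $value) = @_;
    # Unknown kinds fall back to plain text, like a final "else".
    my $formatter = $format_for{$kind} || $format_for{text};
    return $formatter->($value);
}

print format_value(@$_), "\n"
    for ( [ money => 3 ], [ percent => 0.5 ], [ other => 'n/a' ] );
```

Each new case then becomes one more entry in the hash rather than one more nested branch.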
Update: I just saw your reply about the total size of the problem -- and I can appreciate your point about needing some automation. Sadly, there's no magic bullet -- an alleged programmer (or team of stooges) dense enough to waste all the time it must have taken to generate that many lines of redundant code, will likely have done other things that are equally stupid (in imaginatively different ways), which you'll only discover as you slog through it block by block.
But some "heuristic short-cuts" could be helpful... use perl to grep through the code for the blocks that will need similar forms of treatment:
    # print line numbers and code for blocks of "$x.=blah":
    perl -ne 'if(/^\s*(\$\w+)\s*\.=/){ if($1 ne $prv) {print "\n"; $prv=$1} printf "%5d: %s",$.,$_;}' monster.pl > bad_blocks.list
You can apply other filters to the output of that, if you like, conditioning things as needed to make the transitions and refactoring easier, though the filtering may need to differ from one block to the next.
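As one example of such a filter, this sketch ranks the blocks in bad_blocks.list by size, so the biggest runs of `.=` lines come first. It assumes the exact format written by the one-liner (a "%5d: " prefix on each line, an empty line between blocks); the sub name `summarize_blocks` is my own invention:

```perl
use strict;
use warnings;

# Takes the lines of bad_blocks.list and returns one hash per block,
# { start => first line number, count => lines in block }, largest first.
sub summarize_blocks {
    my (@lines) = @_;
    my (@blocks, $current);
    for my $line (@lines) {
        if ($line =~ /^\s*(\d+): /) {
            my $line_number = $1;
            if (!$current) {
                $current = { start => $line_number, count => 0 };
                push @blocks, $current;
            }
            $current->{count}++;
        }
        else {
            undef $current;    # empty line: the next numbered line opens a new block
        }
    }
    my @by_size = sort { $b->{count} <=> $a->{count} } @blocks;
    return @by_size;
}

# Only reads input when a file name is given on the command line.
if (@ARGV) {
    printf "line %d: %d lines\n", $_->{start}, $_->{count}
        for summarize_blocks(<>);
}
```

Saved under some name of your choosing, it would be run with bad_blocks.list as its only argument.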
Another technique that could help for locating the worst cases of copy/paste excess:
    perl -lpe 's/\#.*//; s/^\s+//; s/\s+$//; s/\s+/ /g; $_.="\t$." if $_' monster.pl | perl -lne '($c,$n)=split /\t/; next unless defined $n; push @{$h{$c}},$n; END { print join("\t",$_, scalar @{$h{$_}}, join(",",@{$h{$_}})) for (sort keys %h) }' > line.histogram
(Updated the first one-liner here so that empty and comment-only lines aren't counted.)
That will give you an idea of how many times each line of code has been copied throughout the script (and what the line numbers are). And with the lines sorted ascii-betically, it may be easier to see what needs to be done to rectify things, because similar lines will be grouped together. For example, you could try something like this on the line.histogram output:
    perl -lpe '($c,$lines)=split /\t/; if($lc and $c){ $d=$c^$lc; $sc=$d=~y/\x0/#/;} $lc=$c; s/^/$sc\t/' line.histogram > line.similarity
That uses the "^" (bitwise XOR) operator on two consecutive lines of code, and reports the number of character positions at which they are identical (identical characters come out as null bytes from the XOR). The comparison isn't entirely reliable as a scoring device, but for the kind of code block you've shown, it will prepend a big-enough number to the consecutive lines so that they will stand out (e.g. look for lines where the initial integer value is, say, greater than 10, meaning that at least 11 characters match the previous line, position for position).
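If the XOR trick looks opaque, this small sketch shows what it computes. Both sub names are mine, and it assumes plain byte strings (ordinary ASCII source code). The second sub is a stricter variant, not what the one-liner does: it counts only the leading run of matches.

```perl
use strict;
use warnings;

# XOR of two equal bytes is "\0", so counting NULs in the XOR result
# counts the positions where both strings hold the same character
# (anywhere in the string, not only at the start).
sub same_position_count {
    my ($x, $y) = @_;
    my $xor = $x ^ $y;
    return ($xor =~ tr/\0//);
}

# Stricter score: length of the common prefix (leading NULs only).
sub common_prefix_length {
    my ($x, $y) = @_;
    my ($nuls) = ($x ^ $y) =~ /^(\0*)/;
    return length $nuls;
}

my @pair = ('print $x;', 'print $y;');
printf "same positions: %d, common prefix: %d\n",
    same_position_count(@pair), common_prefix_length(@pair);
```

The prefix version is less likely to be fooled by two unrelated lines that happen to share characters at the same offsets, if the plain count turns out too noisy.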