The sample is pungently symptomatic of "copy/paste" programming. I think converting all those lines (which are essentially dozens of copies of the same html markup pattern) into a HereDoc might simply be a wasted step, because it all should have been a loop over the elements of a hash in the first place.

The approach I would take (and have taken when I've had to fix other people's code) is to look over the offending block long enough to understand what sort of loop and/or subroutine would replicate it, write that loop and/or subroutine, and figure out how to populate the hash or array as I delete all the unnecessary copies of the duplicated code.
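To make that concrete, here is a minimal sketch of the kind of loop I mean. Your actual sample isn't reproduced here, so the field names, labels and markup are all made up for illustration; treat it as the shape of the fix, not the fix itself.

```perl
use strict;
use warnings;

# Hypothetical data: one entry per table row that used to be a pasted copy.
# An array of pairs keeps the rows in a fixed order (a plain hash would not).
my @fields = (
    [ name  => 'Name'  ],
    [ email => 'Email' ],
    [ phone => 'Phone' ],
);

# Build one row of markup; the pattern now lives in exactly one place.
sub table_row {
    my ( $label, $value ) = @_;
    return "<tr><td>$label</td><td>$value</td></tr>\n";
}

# Replace the dozens of "$html .= ..." lines with a single loop.
sub build_table {
    my (%record) = @_;
    my $html = "<table>\n";
    for my $field (@fields) {
        my ( $key, $label ) = @$field;
        my $value = defined $record{$key} ? $record{$key} : '';
        $html .= table_row( $label, $value );
    }
    return $html . "</table>\n";
}

print build_table( name => 'Ann', email => 'ann@example.com' );
```

Note that the sketch does no HTML escaping of the values; real data would need that before being dropped into markup.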

In the long run, this approach might not be noticeably faster or easier than what you may have been planning on or looking for, but it won't be much slower either, I think, and you'll be on a more direct path to a more satisfying solution -- in fact, given that templates support loop structures for tables like this, you'll have a better head start on migrating to a proper template.

It was probably kind of you not to show an example of the "long and nested if/else chains", but I would expect that these may also be the result of the same copy/paste style, so my first instinct, again, would be to work out the loop that should replace all the duplicated code blocks.

Update: I just saw your reply about the total size of the problem -- and I can appreciate your point about needing some automation. Sadly, there's no magic bullet -- an alleged programmer (or team of stooges) dense enough to waste all the time it must have taken to generate that many lines of redundant code will likely have done other things that are equally stupid (in imaginatively different ways), which you'll only discover as you slog through it block by block.

But some "heuristic short-cuts" could be helpful... use perl to grep through the code for the blocks that will need similar forms of treatment:

# print line numbers and code for blocks of "$x.=blah":

perl -ne 'if(/^\s*(\$\w+)\s*\.=/){
  if($1 ne $prv) {print "\n"; $prv=$1}
  printf "%5d: %s",$.,$_;}' monster.pl > bad_blocks.list

You can apply other filters to the output of that, if you like, conditioning things as needed to make the transitions and refactoring easier, though the filtering may need to differ from one block to the next.
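One such filter, as a rough sketch: list where each block starts and how many lines it has, so the biggest offenders can be tackled first. It assumes the bad_blocks.list format produced above (a line number, a colon, then the code, with a blank line between blocks); the sub name and the sample text are invented for the example.

```perl
use strict;
use warnings;

# Summarize blocks from text in the bad_blocks.list format:
# lines like "  123: $x .= ...", with blank lines between blocks.
# Returns a list of [first_line_number, line_count] pairs.
sub summarize_blocks {
    my ($text) = @_;
    my @blocks;
    my $in_block = 0;
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^\s*(\d+):/ ) {
            if ($in_block) { $blocks[-1][1]++ }
            else           { push @blocks, [ $1, 1 ]; $in_block = 1 }
        }
        else {
            $in_block = 0;    # a blank line ends the current block
        }
    }
    return @blocks;
}

my $sample = "\n   10: \$x .= 'a';\n   11: \$x .= 'b';\n\n   40: \$y .= 'c';\n";
printf "block at line %d: %d lines\n", @$_ for summarize_blocks($sample);
```

Sorting the result by line count (descending) would put the longest runs of appended markup at the top.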

Another technique that could help for locating the worst cases of copy/paste excess:

perl -lpe 's/\#.*//; s/^\s+//; s/\s+$//; s/\s+/ /g; $_.="\t$." if $_' monster.pl |
  perl -lne '($c,$n)=split /\t/; push @{$h{$c}},$n if defined $n;
    END { print join("\t",$_,
                     scalar @{$h{$_}},
                     join(",",@{$h{$_}}))
             for (sort keys %h)
        }' > line.histogram

(updated the beginning one-liner here, so empty/comment lines aren't counted.)

That will give you an idea of how many times each line of code has been copied throughout the script (and what the line numbers are). And with the lines sorted ascii-betically, it may be easier to see what needs to be done to rectify things, because similar lines will be grouped together. For example, you could try something like this on the line.histogram output:

perl -lpe '($c,$lines)=split /\t/; $sc=0;
  if($lc and $c){ $d=$c^$lc; $sc=$d=~y/\x0/#/;}
  $lc=$c; s/^/$sc\t/' line.histogram > line.similarity

That uses the "^" (binary XOR) operator on two consecutive lines of code, and reports the number of character positions at which they are identical (these come out as null bytes from the XOR). The comparison isn't entirely reliable as a scoring device, because it counts matching positions anywhere in the line, not just at the start. But for the kind of code block you've shown, it will prepend a big-enough number to the consecutive lines so that they will stand out (e.g. look for lines where the initial integer value is, say, greater than 10, meaning that at least 11 characters match the previous line position for position, which for copy/pasted code usually means a shared leading run).
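Here is the XOR trick pulled out into a small standalone sketch, in case the one-liner is too dense to follow. The sub names are my own; `same_positions` does what the one-liner does, and `common_prefix` is a stricter variant (not in the one-liner) that scores only the identical leading run.

```perl
use strict;
use warnings;

# Count positions where two strings hold the same character.
# String XOR yields a null byte wherever the characters match.
sub same_positions {
    my ( $x, $y ) = @_;
    my $diff = "$x" ^ "$y";        # quoting forces string (not numeric) XOR
    return ( $diff =~ tr/\0// );   # tr with an empty replacement just counts
}

# Length of the identical leading run only: a stricter score.
sub common_prefix {
    my ( $x, $y ) = @_;
    my $diff = "$x" ^ "$y";
    $diff =~ /^(\0*)/;             # always matches, possibly with length 0
    return length $1;
}

print same_positions( '$html .= "<td>a</td>";', '$html .= "<td>b</td>";' ), "\n";
print common_prefix(  '$html .= "<td>a</td>";', '$html .= "<td>b</td>";' ), "\n";
```

When the strings differ in length, the extra characters of the longer one XOR against nothing and come out non-null, so they never count as matches.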

In reply to Re: Cleanup tools (auto HereDoc?) by graff
in thread Cleanup tools (auto HereDoc?) by garrison
