Thank you everyone for the great help. I ended up using CPD with very good results. Amazingly enough, it even ran straight from the Java Web Start link. I was worried that any automated tool might have problems, as the PHP also contains HTML and VML (ugh). But the output clearly shows that about 20 PHP files share a significant amount of code, on the order of 100-150 lines, and it specifies the places where each duplicate occurs. So once this dedupe is done, I should be able to cut another several thousand lines of code. I'm trying to get to a code base that is actually maintainable by some mere mortal like myself or someone else. The code was all written by a single author.
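For anyone curious what this kind of report boils down to, here is a minimal sketch of line-based duplicate detection. It is not how CPD works internally (CPD compares token streams); it is a simpler stand-in that slides a window over the whitespace-stripped, non-blank lines of each file and reports any window that shows up in more than one place. The function name `find_duplicate_blocks`, the `window` parameter, and the sample file names are all made up for illustration.

```python
from collections import defaultdict


def find_duplicate_blocks(files, window=5):
    """Find blocks of `window` consecutive non-blank lines that occur more than once.

    `files` maps a file name to its text. Lines are compared after stripping
    leading/trailing whitespace, so indentation differences are ignored.
    Returns a dict mapping each repeated block (a tuple of lines) to a list of
    (file_name, starting_line_number) locations.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep (original line number, stripped line) pairs, skipping blank lines.
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if line.strip()
        ]
        # Record every window of `window` consecutive kept lines.
        for k in range(len(lines) - window + 1):
            block = tuple(stripped for _, stripped in lines[k:k + window])
            seen[block].append((name, lines[k][0]))
    # Only blocks seen at two or more locations count as duplicates.
    return {block: locs for block, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__":
    # Hypothetical sample input; in practice you would read the files from disk.
    sample = {
        "a.php": "<?php\n$x = 1;\n$y = 2;\necho $x + $y;\n?>\n",
        "b.php": "<?php\n\n  $x = 1;\n  $y = 2;\n  echo $x + $y;\nprint 'done';\n",
    }
    for block, locations in find_duplicate_blocks(sample, window=3).items():
        print(locations, "share:", " | ".join(block))
```

A real tool adds a lot on top of this (tokenizing, merging overlapping windows into one longer match, ignoring comments), which is why something like CPD copes with PHP mixed into HTML better than a quick script would.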
the hardest line to type correctly is: stty erase ^H

In reply to Re: general advice finding duplicate code by aquarium
in thread general advice finding duplicate code by aquarium
