Thanks for the responses so far. I'll look into the Clone Doctor code; however, I cannot send this codebase to third parties.
The second approach, using Dumper, looks like it will only identify individual duplicated lines of code across the scripts, which would be just as easy to do using
cat *.php | sort | uniq -c
I'll keep thinking about it too, and will post any gems. A brute-force, shrinking sliding window between two scripts is possible, but it would probably blow out to hours or days of running time across the 40 or so script-pair combinations.
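One way to avoid the pairwise brute-force search is to hash every run of k consecutive non-blank lines across all the scripts in a single pass, then report any run that occurs more than once. This is a swapped-in technique (window fingerprinting), not something proposed earlier in the thread. Below is a minimal Python sketch. It assumes a hypothetical window size of 5 lines and normalizes whitespace only, and the names `dupwin.py`, `normalize` and `find_duplicate_windows` are illustrative.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import Path


def normalize(line):
    # Collapse runs of whitespace so indentation/spacing changes don't hide clones.
    return " ".join(line.split())


def find_duplicate_windows(files, window=5):
    """Return {hash: [(file, start_line), ...]} for every k-line window seen twice or more.

    `files` maps a filename to its list of lines. Blank lines are skipped so
    they don't pad out windows; reported line numbers are 1-based and refer
    to the original file. `window` is a hypothetical default, tune to taste.
    """
    seen = defaultdict(list)
    for name, lines in files.items():
        # Keep (original line number, normalized text) for non-blank lines only.
        kept = [(i + 1, normalize(l)) for i, l in enumerate(lines) if l.strip()]
        for start in range(len(kept) - window + 1):
            chunk = "\n".join(text for _, text in kept[start:start + window])
            # Store a digest rather than the chunk itself to keep memory down.
            digest = hashlib.sha1(chunk.encode()).hexdigest()
            seen[digest].append((name, kept[start][0]))
    # Only windows that occur in more than one place are candidate clones.
    return {h: locs for h, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__":
    paths = sys.argv[1:]
    files = {p: Path(p).read_text(errors="replace").splitlines() for p in paths}
    for locs in find_duplicate_windows(files).values():
        print(", ".join(f"{name}:{line}" for name, line in locs))
```

Run it as `python dupwin.py *.php`. Each file is read once, so the cost grows with the total number of lines rather than with the number of script pairs. A clone longer than the window shows up as several overlapping hits on consecutive lines, which can be merged in a follow-up pass.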
the hardest line to type correctly is: stty erase ^H

In reply to Re: general advice finding duplicate code by aquarium
in thread general advice finding duplicate code by aquarium
