PerlMonks  

Re: general advice finding duplicate code

by aquarium (Curate)
on Jun 21, 2011 at 06:05 UTC ( [id://910688]=note )


in reply to general advice finding duplicate code

Thanks for the responses so far. I'll look up the clone doctor code; however, I cannot send this codebase to third parties.
The second approach, using dumper, looks like it will only identify duplicated individual lines of code across the scripts, which would be just as easy to do using
cat *.php | sort | uniq -c
I'll keep thinking about it too, and will post any gems. A brute-force, shrinking sliding window between two scripts is possible, but it would probably blow out to hours or days of running time for the 40 or so script pair combinations.
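
For illustration, the shrinking-window idea could be sketched roughly like this in Python. The function names and the `min_size` cutoff are made up for the example, and lines are compared verbatim with no normalization. Each window is looked up in a dictionary, so no window in one script has to be compared against every window in the other:

```python
def windows(lines, size):
    """Map each run of `size` consecutive lines to the 1-based line numbers where it starts."""
    found = {}
    for i in range(len(lines) - size + 1):
        found.setdefault(tuple(lines[i:i + size]), []).append(i + 1)
    return found


def longest_shared_block(a_lines, b_lines, min_size=2):
    """Shrink the window from the largest possible size until a block appears in both scripts.

    Returns (size, start_in_a, start_in_b) for the first match found, or None.
    """
    for size in range(min(len(a_lines), len(b_lines)), min_size - 1, -1):
        in_a = windows(a_lines, size)
        for block, b_starts in windows(b_lines, size).items():
            if block in in_a:
                return size, in_a[block][0], b_starts[0]
    return None
```

Calling `longest_shared_block(open("a.php").read().splitlines(), open("b.php").read().splitlines())` for each script pair (the file names are placeholders) would report the largest verbatim block the two share. Whether this is fast enough on the real codebase is untested.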
the hardest line to type correctly is: stty erase ^H

Re^2: general advice finding duplicate code
by Anonymous Monk on Jun 21, 2011 at 06:49 UTC
    looks like will only identify duplicated but individual lines of code across the scripts

    Every approach is this approach :) It's like a search engine.

    You iterate over your files, and you index each file.

    To index, you pick a unit (e.g. one word, or three adjacent lines of code).

    Generate a list of all units for a file

    Normalize each unit. For words you would stem (remove prefixes and suffixes) to find the root; for lines you would remove insignificant whitespace and insignificant commas, normalize quoting characters, and so on.

    Hash each unit (sha1), and store each hash in a database together with the file and position it came from.

    Then, to find duplication, query the database to find duplicate hashes
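
    The steps above could be sketched like this in Python. It is only a minimal illustration: the unit is three adjacent non-blank lines, the normalization is deliberately crude, an in-memory dictionary stands in for the database, and all function names are invented for the example.

```python
import hashlib
import re
from collections import defaultdict


def normalize(line):
    """Collapse whitespace runs and unify quote characters (a crude, assumed normalization)."""
    line = re.sub(r"\s+", " ", line.strip())
    return line.replace('"', "'")


def units(lines, size=3):
    """Yield (starting line number, unit text) for every run of `size` adjacent non-blank lines."""
    kept = [(no, normalize(text)) for no, text in enumerate(lines, 1) if text.strip()]
    for i in range(len(kept) - size + 1):
        window = kept[i:i + size]
        yield window[0][0], "\n".join(text for _, text in window)


def build_index(files, size=3):
    """Map sha1(unit) -> list of (file name, starting line). `files` maps name -> source text."""
    index = defaultdict(list)
    for name, source in files.items():
        for start, unit in units(source.splitlines(), size):
            digest = hashlib.sha1(unit.encode("utf-8")).hexdigest()
            index[digest].append((name, start))
    return index


def duplicates(index):
    """Keep only the hashes that were seen at more than one location."""
    return {h: locs for h, locs in index.items() if len(locs) > 1}
```

    Feeding it something like `{name: open(name).read() for name in glob.glob("*.php")}` (after `import glob`) and printing the values of `duplicates(build_index(files))` would list every place a normalized three-line unit repeats. Overlapping hits on consecutive lines point to a longer duplicated block.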

    This is not unlike what git (git gc) does, so I wouldn't be surprised if git provided a tool to help you visualize these duplications, although I don't know of one.

    It goes without saying that before making code changes, you need a comprehensive test suite :)
