PerlMonks
"looks like will only identify duplicated but individual lines of code across the scripts"
Every approach is this approach :) It's like a search engine:

- Iterate over your files, and index each file.
- To index, pick a unit (e.g. one word, or three adjacent lines of code).
- Generate a list of all units for a file.
- Normalize each unit. For words you would stem (remove prefixes/suffixes...) to find the root; for lines you would remove insignificant whitespace and insignificant commas, normalize quoting characters...
- Hash each unit (SHA-1), and associate all of this in a database.
- Then, to find duplication, query the database for duplicate hashes.

This is not unlike what git (git gc) does, so I wouldn't be surprised if git provided a tool to help you visualize these duplications, although I don't know of one.

It goes without saying that before making code changes, you need a comprehensive test suite :)

In reply to Re^2: general advice finding duplicate code
by Anonymous Monk
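The indexing steps above can be sketched roughly as follows. This is a minimal illustration, not a finished tool: the unit is a sliding window of three normalized lines, the normalization rules (collapse whitespace, unify quote characters) are assumptions standing in for whatever your language actually needs, and the function names (`normalize`, `index_file`, `find_duplicates`) are made up for the example.

```python
import hashlib
import re
from collections import defaultdict

def normalize(line):
    # Remove insignificant whitespace: trim ends, collapse internal runs.
    line = re.sub(r"\s+", " ", line.strip())
    # Normalize quoting characters (assumption: treat ' and " as equivalent).
    return line.replace("'", '"')

def index_file(path, index, window=3):
    # Generate all units (windows of 3 adjacent non-blank lines) for one file,
    # hash each unit with SHA-1, and record where it occurred.
    with open(path) as f:
        lines = [normalize(l) for l in f if l.strip()]
    for i in range(len(lines) - window + 1):
        unit = "\n".join(lines[i:i + window])
        digest = hashlib.sha1(unit.encode()).hexdigest()
        index[digest].append((path, i + 1))

def find_duplicates(paths, window=3):
    # Query step: keep only hashes that occur in more than one place.
    index = defaultdict(list)
    for p in paths:
        index_file(p, index, window)
    return {h: locs for h, locs in index.items() if len(locs) > 1}
```

A real tool would also want language-aware normalization (comments, identifiers) and a proper database instead of an in-memory dict, but the shape of the pipeline is the same.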