
My Devel::Examine::Subs can help with some of this. It uses PPI behind the scenes. It can gather all the subs in a single file or in a whole directory of files, and list them per file. It can also examine each sub and return only those containing lines that match specified search patterns, report the line each sub starts and ends on, and report how many lines each sub has.

Collect and display all subs in all files in the current working directory:

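use warnings;
use strict;

use Devel::Examine::Subs;

# '.' is a directory, so every Perl file found under it is examined
my $des = Devel::Examine::Subs->new(file => '.');

# for a directory, all() gives a hash ref: { file_name => [ sub_name, ... ] }
my $data = $des->all;

for my $file (keys %$data){
    print "$file\n";
    # the sub names found in this file
    for (@{ $data->{$file} }){
        print "\t$_\n";
    }
}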

Sample output:

lib/Test/BrewBuild/Git.pm
    new
    git
    link
    name
    clone
    pull
lib/Test/BrewBuild/BrewCommands.pm
    new
    brew
    info
    installed
    using
    available
    install
    remove
    is_win
    _legacy_perls

Get all the subs in the same manner, but collect them as objects instead, which gives you a lot more information about each one:

use warnings;
use strict;

use Devel::Examine::Subs;

my $des = Devel::Examine::Subs->new(file => '.');

# hash ref of { file_name => [ sub_name, ... ] }; used here for the file list
my $data = $des->all;

for my $file (keys %$data){
    print "$file\n";

    # array ref of sub objects for this one file
    my $subs = $des->objects(file => $file);

    for my $sub (@$subs){
        # each object carries the sub's name, size and location
        print "\t" . $sub->name ."\n";
        print "\t\t lines: " . $sub->line_count ."\n";
        print "\t\t start: " . $sub->start ."\n";
        print "\t\t end:   " . $sub->end . "\n";
    }
}

Sample output:

lib/Test/BrewBuild/Dispatch.pm
    _fork
         lines: 111
         start: 146
         end:   256
    new
         lines: 21
         start: 21
         end:   41
    dispatch
         lines: 87
         start: 42
         end:   128
    _config
         lines: 17
         start: 129
         end:   145
lib/Test/BrewBuild/Git.pm
    name
         lines: 6
         start: 34
         end:   39
    git
         lines: 17
         start: 12
         end:   28

The main reason I wrote this software was so that I could introspect subs accurately and then, if necessary, insert code into specific subs, either at a line number or at a line matching a search term (yes, this distribution does that as well). You can also search for specific lines in each sub and print the line numbers on which those search patterns appear.
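The module does that search for you; purely to illustrate how a match inside a sub maps back to a line number in the file, here is a rough plain-Perl sketch. It assumes you already have a sub's source and the file line it starts on (for example from an object's `code` and `start` accessors). `matching_lines` is a hypothetical helper, not part of Devel::Examine::Subs, and it accepts either one string or an array ref of lines, since this sketch doesn't assume which shape `code` returns.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Devel::Examine::Subs.
# Returns [ file_line_number, line_text ] pairs for each line of a sub
# that matches $re. $start is the file line the sub begins on, so line $i
# of the sub (counting from 0) is reported as $start + $i.
# $code may be a single string or an array ref of lines.
sub matching_lines {
    my ($code, $start, $re) = @_;
    my @lines = ref $code eq 'ARRAY' ? @$code : split(/\n/, $code);
    chomp @lines;    # drop trailing newlines if the lines still have them

    my @hits;
    for my $i (0 .. $#lines) {
        push @hits, [ $start + $i, $lines[$i] ] if $lines[$i] =~ $re;
    }
    return @hits;
}

# Made-up source for a sub assumed to start on line 30 of its file
my $code = "sub pull {\n    my \$self = shift;\n    system('git pull');\n}\n";
for my $hit (matching_lines($code, 30, qr/git/)) {
    print "line $hit->[0]: $hit->[1]\n";
}
```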

Of course, using the above techniques, it would be trivial to find which files share sub names, stash each duplicated name along with its file names, and then use the objects to compare the lengths of those subs as a cursory check for an exact copy/paste (a copy will have the same number of lines). The synopsis in the docs explains how to get the objects within a hash keyed by the sub's name, which may make this easier.
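A minimal sketch of that filtering step, in plain Perl so it can be tried without the module. It assumes you have already built a hash of file name => { sub name => line count } from the objects (for example from each object's `name` and `line_count`); `find_duplicate_subs` and the sample data are hypothetical. Equal line counts only flag candidates worth a closer look; they don't prove a copy/paste.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Devel::Examine::Subs.
# $counts is assumed to look like { file_name => { sub_name => line_count } }.
# Returns { sub_name => { files => [ ... ], same_length => 0|1 } } for every
# sub name defined in two or more files.
sub find_duplicate_subs {
    my ($counts) = @_;

    # map each sub name to the (sorted) list of files that define it
    my %files_for;
    for my $file (sort keys %$counts) {
        push @{ $files_for{$_} }, $file for keys %{ $counts->{$file} };
    }

    my %dups;
    for my $name (keys %files_for) {
        my @files = @{ $files_for{$name} };
        next if @files < 2;    # defined in only one file: not a duplicate

        # collect the distinct line counts seen for this name
        my %lengths;
        $lengths{ $counts->{$_}{$name} } = 1 for @files;

        $dups{$name} = {
            files       => \@files,
            same_length => scalar(keys %lengths) == 1 ? 1 : 0,
        };
    }
    return \%dups;
}

# Made-up data shaped like the objects' name/line_count output
my $counts = {
    'lib/A.pm' => { new => 21, helper => 17 },
    'lib/B.pm' => { new => 21, helper => 9, other => 5 },
};
my $dups = find_duplicate_subs($counts);
for my $name (sort keys %$dups) {
    my $d = $dups->{$name};
    printf "%s: %s (%s)\n", $name, join(', ', @{ $d->{files} }),
        $d->{same_length} ? 'same length, possible copy' : 'lengths differ';
}
```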

Update: I forgot to mention that each subroutine object also contains the full code of the sub, in $sub->code. This should help tremendously when programmatically comparing a sub in one file with its duplicate in another.
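As a rough sketch of such a comparison: normalize each copy's source so indentation and blank lines don't matter, then group the copies by an MD5 digest of the result (Digest::MD5 ships with Perl). The helper names and sample code are hypothetical, and the helpers accept either a string or an array ref of lines, since this sketch doesn't assume which one `$sub->code` returns. Normalizing only whitespace is deliberately crude: a renamed variable or an edited comment will still make two copies look different.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: collapse all whitespace so formatting-only
# differences (indentation, blank lines, trailing spaces) are ignored.
# Accepts a string or an array ref of lines.
sub normalize_code {
    my ($code) = @_;
    $code = join "\n", @$code if ref $code eq 'ARRAY';
    $code =~ s/\s+/ /g;     # every whitespace run becomes one space
    $code =~ s/^ | $//g;    # trim the leading/trailing space left behind
    return $code;
}

# Hypothetical helper: $copies is { file_name => code }.
# Returns { md5_of_normalized_code => [ file_name, ... ] };
# files under the same key have identical normalized code.
sub group_by_fingerprint {
    my ($copies) = @_;
    my %groups;
    for my $file (sort keys %$copies) {
        push @{ $groups{ md5_hex(normalize_code($copies->{$file})) } }, $file;
    }
    return \%groups;
}

# Made-up copies of the same sub, differing only in layout
my $groups = group_by_fingerprint({
    'lib/A.pm' => "sub new {\n    my \$class = shift;\n    return bless {}, \$class;\n}\n",
    'lib/B.pm' => "sub new {\n  my \$class = shift;\n\n  return bless {}, \$class;\n}",
});
print join(', ', @$_), "\n" for values %$groups;
```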

In reply to Re: Searching for duplication in legacy code (updated) by stevieb
in thread Searching for duplication in legacy code by yulivee07
