comment on

Inspired by this SoPW node, I thought it would be fairly simple to rig up a "good enough" method for summarizing a large, complicated application that has been implemented as a set of inter-related source code files in Perl.

The basic idea is to provide a simplified, concise listing that covers a set of Perl files, reporting: (a) what subroutines are defined in each source file, and (b) what subroutine calls are issued from within each defined subroutine (or from the "main" portion of a given file). Usually, this can make it a little easier to figure out the relations, and maybe even the general control-flow, among a set of related source files.

This is just a quick try-out of the idea, using B::Deparse to make subroutine detection a bit less speculative and a bit more consistent. The report formatting is as simple as possible (I do want to get some sleep tonight...), but others may mold it to their particular needs without much fuss, I hope.

There are some fairly basic methods of assigning "names" to subroutines that will escape notice in this very simple technique (e.g. my $ref1 = \&someSub; my $ref2 = $ref1; ...) but this simple listing should help a fair bit in most cases.

I presume it will flail miserably when confronted with obfus (doesn't everybody?), and will fall short in some respects when applied to OO-style code (but such code shouldn't need this sort of diagnosis anyway -- or that's the theory, at least ;^). It will also need some adjustments when applied to Perl6, I gather...

(update: fixed the comment about "NONSUB") (another update: (2005-11-10) allowed for sub definitions to be indented: /^\s*sub/ instead of /^sub/; also used a slurp read of the "Deparse" error log, to get rid of a backtick "cat" command)

#!/usr/bin/perl

# Program:      scan-source.perl
# Written by:   dave graff
# Purpose:      look for and summarize the declarations and calls
#                of subroutines in a set of perl source files

use strict;
my $Usage = "$0 *.perl\n  will produce a summary of subs in perl files
+\n";

die $Usage unless ( @ARGV && -r $ARGV[0] );

my $erlog = "/tmp/scn-src.$$.errlog";
my %subdefs;  # hash to keep track of subroutine declarations
my %subrefs;  # hash to keep track of anonymous subroutine assignments
my %subsyns;  # hash to keep track of variables used as refs to named 
+subs
my %subused;  # hash to keep track of subroutine calls

foreach my $file ( @ARGV )
{
    warn "Deparsing $file...\n";
    my @lines = `perl -MO=Deparse,-p $file 2> $erlog`;
    unless ( @lines ) {
        warn "Nothing deparsed from $file -- let's move on.\n";
        next;
    }
    my $log = do { open my $erfh, $erlog; local $/; <$erfh> };
    unless ( $log =~ /^$file syntax OK/ ) {
        warn "Syntax errors in $file -- let's move on.\n";
        next;
    }

# Store an array of subroutine defs in a hash element
# keyed by file name:

    (@{$subdefs{$file}}) = map{ m/^\s*sub (\w+) \{$/ } @lines;

# Store an array of anonymous subroutines, in case these are
# being assigned to variables as references:

    @{$subrefs{$file}} = map{ m/[\$\@\%](\w+)\S* \= sub \{/ } @lines;

# Store an array of variable names, where these are being
# used to hold references to named subroutines:

    @{$subsyns{$file}} = map{ m/[\%\$\@](\w+)\S* \= .*\\\&\w+/ } @line
+s;

# Now march through the lines looking for subroutine calls,
# keeping track of which subroutine we're in (if any);
# store arrays of sub calls in a hash keyed by "$file:$insub"
# (or "$file:NONSUB")

# NOTE: we won't even try keeping track with respect to
# anonymous subs.

    my $insub = "NONSUB"; # initial state is "not in a sub"
    foreach ( @lines ) {
        if ( /^\s*sub (\w+) \{/ ) {
            $insub = $1;
        }
        elsif ( /^\}$/ ) {
            $insub = "NONSUB";
        }
        # regex to spot a func or sub call (or most of them) --
        # this will need to change for Perl6:
        elsif ( /(\&\{?)?\$?(\w+)\S*?\(/ ) {
            $subused{"$file:$insub"}{$2}++
                unless ( "my local ref return" =~ /$2/ );
        }
    }
}
unlink $erlog;  # Don't keep this (and don't care if unlink fails)

# Now do some reporting

foreach my $file ( @ARGV )
{
    next unless exists $subdefs{$file};

    print "\n\nFile: $file";
    print( "\n subs defined:\n  ".
           join( "\n  ",@{$subdefs{$file}},"" ))
        if ( scalar( @{$subdefs{$file}} ));
    print( "\n anon.subs assigned to variables:\n  ".
           join( "\n  ",@{$subrefs{$file}},"" ))
        if ( scalar( @{$subrefs{$file}} ));
    print( "\n variables holding named sub refs:\n  ".
           join( "\n  ",@{$subsyns{$file}},"" ))
        if ( scalar( @{$subsyns{$file}} ));
    foreach my $insub ( grep( /$file:/, sort keys %subused )) {
        print( "\n\n subs/functions called from $insub:  ".
               join( "\n  ", sort keys %{$subused{$insub}},"" ))
            if ( scalar( keys %{$subused{$insub}} ));
    }
}
print "\n\n";
[download]

In reply to Tabulate sub defs, sub calls in Perl code by graff

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Problems? Is your data what you think it is?
	PerlMonks