Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

extracting subroutine names (together with their packages) via PPI

by clueless newbie (Curate)
on Jul 06, 2023 at 14:53 UTC ( [id://11153291] : perlquestion . print w/replies, xml ) Need Help??

clueless newbie has asked for the wisdom of the Perl Monks concerning the following question:

Given the following chunk of code:

#!/usr/bin/env perl use 5.032001; use warnings; my @packages; package Fee; push @packages,__PACKAGE__; sub one { say __PACKAGE__.q{::one}; }; package Fi; push @packages,__PACKAGE__; sub two { say __PACKAGE__.q{::two}; }; { package Foo; push @packages,__PACKAGE__; sub three { say __PACKAGE__.q{::three}; }; }; # End of package Foo! sub four { say __PACKAGE__.q{:four}; }; # Dump symbols for my $package (@packages) { no strict; my %stash = %{"${package}::"}; use strict; warn Data::Dumper->new([\$package,\%stash],[qw(*package *stash)])- +>Deepcopy(1)->Indent(1)->Maxdepth(3)->Sortkeys(1)->Dump(),q{ }; }; 1; __DATA__

While Devel::Examine::Subs suggests:

'one', 'two', 'three', 'four'

dumping the symbols suggests

\'Fee' ( 'one' => *Fee::one ) \'Fi' ( 'BEGIN' => *Fi::BEGIN, 'DATA' => *Fi::DATA, 'four' => *Fi::four, 'two' => *Fi::two ) \'Foo' ( 'three' => *Foo::three )

My attempts to extract the full subroutine names using PPI have been failures. I can get the sub names and I can get the packages but not the needed relationship --- ie Fee::one, Fi::two etc. I would greatly appreciate knowing how to use PPI to extract the subroutine names in conjunction with their containing package.

Thanks!

Replies are listed 'Best First'.
Re: extracting subroutine names (together with their packages) via PPI
by stevieb (Canon) on Jul 06, 2023 at 20:04 UTC

    I found some time to play around for you. Here's a way that works. It works by collecting up all PPI statements within the file, iterating over them until it hits a package, saves that name, then adds to a hash all the subs within that package:

    use warnings; use strict; use Data::Dumper; use PPI; my $file = 'file.pl'; my $PPI_doc = PPI::Document->new($file); my $PPI_statements = $PPI_doc->find('PPI::Statement'); my %layout; my $package; for my $statement (@$PPI_statements) { if (ref $statement eq 'PPI::Statement::Package') { $package = $statement->namespace; } if (ref $statement eq 'PPI::Statement::Sub') { push @{ $layout{$package} }, $statement->name; } } print Dumper \%layout;

    Given your test script:

    $VAR1 = { 'Foo' => [ 'three', 'four' ], 'Fee' => [ 'one' ], 'Fi' => [ 'two' ] };

    That's literally very quick and dirty, but it does exactly what you want from what I can tell.

    Update: If you want to keep order of the packages (the subs will automatically be ordered), just keep the package names in an array (or you can modify the hash to keep an order field and a subs field or some such):

    my $doc = PPI::Document->new($file); my $stmts = $doc->find('PPI::Statement'); my %layout; my @order; my $package; for (@$stmts) { if (ref $_ eq 'PPI::Statement::Package') { $package = $_->namespace; push @order, $package; } if (ref $_ eq 'PPI::Statement::Sub') { push @{ $layout{$package} }, $_->name; } } for (@order) { print "$_:\n"; for (@{ $layout{$_} }) { print "\t$_\n"; } }

    Output:

    Fee: one Fi: two Foo: three four

      It's precisely the structure of

      package Fi; # sub two { say __PACKAGE__.q{::two}; }; { package Foo; # sub three { say __PACKAGE__.q{::three}; }; }; # End of package Foo! sub four { say __PACKAGE__.q{:four}; };

      that messes up the sequential token scan! Both "two" and "four" belong to package "Fi". What is needed is traversing the document as a tree. (perldoc says " package NAMESPACE VERSION BLOCK").

        "that messes up the sequential token scan!"

        Precisely. It's very, very hard to parse perl with perl. Some say impossible even. No matter what technique is deployed, because Perl is There Is More Than One Way To Do It, there will ALWAYS be edge cases that you won't catch.

        As I said... I've spent a great number of years writing introspection type software, and no matter what approach you take, you'll always find gotchas. If what your doing is something so that you can refactor software to make it much more logical and proper (than what you have in the example above), do a pass, refactor, do another pass, refactor more until you're done, and ensure you have before and after unit tests already in place before anything's changed.

        Nothing is perfect. Wait until you start coming across code where someone has subroutine declarations/definitions inside of other subroutines, or even more fun, automatically generated subroutines that are created dynamically... that's a real head scratcher.

        My be all, end all advice here, is one package/class per file. It's logical to most parsing software, and it's *especially* more logical to the human reader.

Re: extracting subroutine names (together with their packages) via PPI
by stevieb (Canon) on Jul 06, 2023 at 16:13 UTC

    PPI, and therefore Devel::Examine::Subs is file based. They don't have the concept of packages at all.

    To do this properly, you'd have to read a file until you find a package token, then slurp up all subs until you hit the next package.

    All of my software uses one package per file, so its trivial to transform the package name from the file and path. If you have multiple packages per file and want to know which subs are in each package, you'll have create your own routine as I suggested above.

Re: extracting subroutine names (together with their packages) via PPI
by clueless newbie (Curate) on Jul 06, 2023 at 19:45 UTC

    Steve, thanks for your prompt response!

    Playing with something along the lines of:

    use 5.032001; use warnings; use Data::Dumper; use Devel::Symdump; use Path::Tiny; use PPR::X; use lib qw(.); sub scan4subs { my ($file)=@_; my @packages; # use PPR:X to scan for package and get their names: my $text=(path($file)->slurp); while ($text =~ m{((?&PerlPackageDeclaration)) $PPR::X::GRAMMA +R}gx) { push @packages,$1 =~ m{package\ ([:\w]+) }x; }; #warn Data::Dumper->new([\@packages],[qw(*packages)])->Deepcop +y(1)->Indent(1)->Maxdepth(3)->Sortkeys(1)->Dump(),q{ }; # Use Devel::Symdump to dump the functions: require $file; my @subs; for my $package(@packages) { my $obj=Devel::Symdump->new($package); push @subs,$obj->functions; #warn Data::Dumper->new([\@subs],[qw(*functions)])->Deepco +py(1)->Indent(1)->Maxdepth(3)->Sortkeys(1)->Dump(),q{ }; }; return \@subs; }; print "@{scan4subs('Fee.pm')}";

    which for the above chunk of code returns this

    Fee::one Fi::two Fi::four Foo::three

      Good effort!

      That said, see my other response below, where I do the whole thing with PPI. I've been working on perl code parsing for two decades now, and by far, hands down the most reliable way is to use PPI.

      It's a lot less code as well ;)

      -stevieb

Re: extracting subroutine names (together with their packages) via PPI
by karlgoethebier (Abbot) on Jul 07, 2023 at 12:03 UTC

    The anonymous monk below is right:

    use Module::Info; my $mod = Module::Info->new_from_file('Some/Module.pm'); # my @packages = $mod->packages_inside; my @subs = $mod->subroutines;

    « subroutines …Returns a hash of all subroutines defined inside this module and some info about it. The key is the *full* name of the subroutine (ie. $subs{'Some::Module::foo'} rather than just $subs{'foo'}), value is a hash ref with information about the subroutine…»

    «The Crux of the Biscuit is the Apostrophe»

Re: extracting subroutine names (together with their packages) via PPI
by Anonymous Monk on Jul 07, 2023 at 09:09 UTC
    *cough* vaguely member .... Module::Info gives you information about Perl modules without actually loading the module. ppixref - frontend for PPI::Xref for indexing and querying Perl code