in reply to extracting subroutine names (together with their packages) via PPI

I found some time to play around for you. Here's a way that works. It works by collecting up all PPI statements within the file, iterating over them until it hits a package, saves that name, then adds to a hash all the subs within that package:

use warnings; use strict; use Data::Dumper; use PPI; my $file = 'file.pl'; my $PPI_doc = PPI::Document->new($file); my $PPI_statements = $PPI_doc->find('PPI::Statement'); my %layout; my $package; for my $statement (@$PPI_statements) { if (ref $statement eq 'PPI::Statement::Package') { $package = $statement->namespace; } if (ref $statement eq 'PPI::Statement::Sub') { push @{ $layout{$package} }, $statement->name; } } print Dumper \%layout;

Given your test script:

$VAR1 = { 'Foo' => [ 'three', 'four' ], 'Fee' => [ 'one' ], 'Fi' => [ 'two' ] };

That's literally very quick and dirty, but it does exactly what you want from what I can tell.

Update: If you want to keep order of the packages (the subs will automatically be ordered), just keep the package names in an array (or you can modify the hash to keep an order field and a subs field or some such):

my $doc = PPI::Document->new($file); my $stmts = $doc->find('PPI::Statement'); my %layout; my @order; my $package; for (@$stmts) { if (ref $_ eq 'PPI::Statement::Package') { $package = $_->namespace; push @order, $package; } if (ref $_ eq 'PPI::Statement::Sub') { push @{ $layout{$package} }, $_->name; } } for (@order) { print "$_:\n"; for (@{ $layout{$_} }) { print "\t$_\n"; } }

Output:

Fee: one Fi: two Foo: three four

Replies are listed 'Best First'.
Re^2: extracting subroutine names (together with their packages) via PPI
by clueless newbie (Curate) on Jul 06, 2023 at 21:52 UTC

    It's precisely the structure of

    package Fi; # sub two { say __PACKAGE__.q{::two}; }; { package Foo; # sub three { say __PACKAGE__.q{::three}; }; }; # End of package Foo! sub four { say __PACKAGE__.q{:four}; };

    that messes up the sequential token scan! Both "two" and "four" belong to package "Fi". What is needed is traversing the document as a tree. (perldoc says " package NAMESPACE VERSION BLOCK").

      "that messes up the sequential token scan!"

      Precisely. It's very, very hard to parse perl with perl. Some say impossible even. No matter what technique is deployed, because Perl is There Is More Than One Way To Do It, there will ALWAYS be edge cases that you won't catch.

      As I said... I've spent a great number of years writing introspection type software, and no matter what approach you take, you'll always find gotchas. If what your doing is something so that you can refactor software to make it much more logical and proper (than what you have in the example above), do a pass, refactor, do another pass, refactor more until you're done, and ensure you have before and after unit tests already in place before anything's changed.

      Nothing is perfect. Wait until you start coming across code where someone has subroutine declarations/definitions inside of other subroutines, or even more fun, automatically generated subroutines that are created dynamically... that's a real head scratcher.

      My be all, end all advice here, is one package/class per file. It's logical to most parsing software, and it's *especially* more logical to the human reader.

        Hi, Steve! Should have thanked you for berrybrew long time ago!

        I've been getting my head muddled with PPI::X and tried the following on the same bit of code.

        #!/usr/bin/env perl use 5.032001; use warnings; use PPR::X; my $code=<<'_code_'; package Fee; sub one { say __PACKAGE__.q{::one}; }; package Fi; sub two { say __PACKAGE__.q{::two}; }; { package Foo; sub three { say __PACKAGE__.q{::three}; }; }; # End of package Foo! sub four { say __PACKAGE__.q{:four}; }; _code_ my $re = qr{ # What to match... (?: (?<block> (?&PerlBlock) ) | (?<package> (?&PerlPackageDeclaration) ) | (?<subroutine> (?&PerlSubroutineDeclaration) ) ) $PPR::X::GRAMMAR }x; use Data::Dumper; while ($code =~ m{$re}gx) { warn Data::Dumper->new([\%+],[qw(*+)])->Deepcopy(1)->Indent(1)->Ma +xdepth(3)->Sortkeys(1)->Dump(),q{ }; }; __DATA__

        This turns up something rather interesting!

        %+ = ( 'package' => 'package Fee;' ); at Fi_01.t line 44. %+ = ( 'subroutine' => 'sub one { say __PACKAGE__.q{::one}; }' ); at Fi_01.t line 44. %+ = ( 'package' => 'package Fi;' ); at Fi_01.t line 44. %+ = ( 'subroutine' => 'sub two { say __PACKAGE__.q{::two}; }' ); at Fi_01.t line 44. %+ = ( # A BLOCK complete with its package/sub +routine declarations 'block' => '{ package Foo; sub three { say __PACKAGE__.q{::three}; }; }' ); at Fi_01.t line 44. %+ = ( 'subroutine' => 'sub four { say __PACKAGE__.q{:four}; }' ); at Fi_01.t line 44.

        Notice that we got a block without breaking out its package/subroutine declarations. The contents of that block might be recursively parsed - extracting its package and subroutine declarations.