Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^2: extracting subroutine names (together with their packages) via PPI

by clueless newbie (Curate)
on Jul 06, 2023 at 21:52 UTC ( [id://11153299] : note . print w/replies, xml ) Need Help??


in reply to Re: extracting subroutine names (together with their packages) via PPI
in thread extracting subroutine names (together with their packages) via PPI

It's precisely the structure of

package Fi; # sub two { say __PACKAGE__.q{::two}; }; { package Foo; # sub three { say __PACKAGE__.q{::three}; }; }; # End of package Foo! sub four { say __PACKAGE__.q{:four}; };

that messes up the sequential token scan! Both "two" and "four" belong to package "Fi". What is needed is traversing the document as a tree. (perldoc says " package NAMESPACE VERSION BLOCK").

Replies are listed 'Best First'.
Re^3: extracting subroutine names (together with their packages) via PPI
by stevieb (Canon) on Jul 06, 2023 at 23:47 UTC
    "that messes up the sequential token scan!"

    Precisely. It's very, very hard to parse perl with perl. Some say impossible even. No matter what technique is deployed, because Perl is There Is More Than One Way To Do It, there will ALWAYS be edge cases that you won't catch.

    As I said... I've spent a great number of years writing introspection type software, and no matter what approach you take, you'll always find gotchas. If what your doing is something so that you can refactor software to make it much more logical and proper (than what you have in the example above), do a pass, refactor, do another pass, refactor more until you're done, and ensure you have before and after unit tests already in place before anything's changed.

    Nothing is perfect. Wait until you start coming across code where someone has subroutine declarations/definitions inside of other subroutines, or even more fun, automatically generated subroutines that are created dynamically... that's a real head scratcher.

    My be all, end all advice here, is one package/class per file. It's logical to most parsing software, and it's *especially* more logical to the human reader.

      Hi, Steve! Should have thanked you for berrybrew long time ago!

      I've been getting my head muddled with PPI::X and tried the following on the same bit of code.

      #!/usr/bin/env perl use 5.032001; use warnings; use PPR::X; my $code=<<'_code_'; package Fee; sub one { say __PACKAGE__.q{::one}; }; package Fi; sub two { say __PACKAGE__.q{::two}; }; { package Foo; sub three { say __PACKAGE__.q{::three}; }; }; # End of package Foo! sub four { say __PACKAGE__.q{:four}; }; _code_ my $re = qr{ # What to match... (?: (?<block> (?&PerlBlock) ) | (?<package> (?&PerlPackageDeclaration) ) | (?<subroutine> (?&PerlSubroutineDeclaration) ) ) $PPR::X::GRAMMAR }x; use Data::Dumper; while ($code =~ m{$re}gx) { warn Data::Dumper->new([\%+],[qw(*+)])->Deepcopy(1)->Indent(1)->Ma +xdepth(3)->Sortkeys(1)->Dump(),q{ }; }; __DATA__

      This turns up something rather interesting!

      %+ = ( 'package' => 'package Fee;' ); at Fi_01.t line 44. %+ = ( 'subroutine' => 'sub one { say __PACKAGE__.q{::one}; }' ); at Fi_01.t line 44. %+ = ( 'package' => 'package Fi;' ); at Fi_01.t line 44. %+ = ( 'subroutine' => 'sub two { say __PACKAGE__.q{::two}; }' ); at Fi_01.t line 44. %+ = ( # A BLOCK complete with its package/sub +routine declarations 'block' => '{ package Foo; sub three { say __PACKAGE__.q{::three}; }; }' ); at Fi_01.t line 44. %+ = ( 'subroutine' => 'sub four { say __PACKAGE__.q{:four}; }' ); at Fi_01.t line 44.

      Notice that we got a block without breaking out its package/subroutine declarations. The contents of that block might be recursively parsed - extracting its package and subroutine declarations.

        Tried recursively processing blocks via regex --- but that was/is beyond me! Felt so good when I quit banging my head against the wall!

        Converted it to use PPI and got this:

        use 5.032001; use warnings; #use Data::Dumper; use PPI; use PPR::X; #use PPI::Dumper; use Scalar::Util qw(blessed); my $PTR; my @PACKAGE=('main','main'); # Just a useful routine during delousing sub as_header { my $string=(' > ' x $PTR).ref shift; return $string.('='x (64-length $string)); }; sub process_chunk { $PACKAGE[++$PTR]=undef; for my $node (shift->children) { next # This should never happen! unless (blessed $node); next # WE don't care about these do we?? if ($node->isa('PPI::Statement::Null') || $node->isa('PPI: +:Token::Comment') || $node->isa('PPI::Token::Whitespace') || $node->isa('PPI +::Token::Structure')); #say as_header($node); if ($node->isa('PPI::Structure::Block')) { #say $node->content; } elsif ($node->isa('PPI::Statement::Package')) { #say $node->content; #say $node->child(2)->content; $PACKAGE[$PTR]=$node->child(2)->content; } elsif ($node->isa('PPI::Statement::Sub')) { #say $node->content; my $name=$node->child(2)->content; unless ($name =~ m{::}) { for (my $i=$PTR; $i >= 0; $i--) { if (defined $PACKAGE[$i]) { $name=qq{$PACKAGE[$i]::$name};#' last; }; }; }; say $name; } else { #PPI::Dumper->new($node,whitespace=>0)->print; }; # See if further processing is possible process_chunk($node) if (blessed $node && $node->can('children')); } $PTR--; }; # sub process_chunk: my $chunk=do { local $/; <DATA> }; # Let's let Conway confirm all is good to go! unless ($chunk =~ m{ (?&PerlEntireDocument) $PPR::X::GRAMMAR }x) { die "Invalid Perl code: " . $PPR::X::ERROR->source . "\n" } else { # Now go process the bloody thing! process_chunk(PPI::Document->new(\$chunk)); }; __DATA__ sub zero { # main::zero say __PACKAGE__.q{::zero}; }; package Fee; sub one { # Fee::one say __PACKAGE__.q{::one}; }; package Fi; sub two { # Fi::two say __PACKAGE__.q{::two}; }; { package Fo; sub three { # Foo::Three say __PACKAGE__.q{::three}; }; }; # End of package Foo! {; say "This block introduces no new package" }; sub Fum::four { # Fum::four say __PACKAGE__.q{:four}; }; sub four { # Fi::four say __PACKAGE__.q{:four}; }; package I_smell_the_blood_of_an_Englishman { # Package with a bloc +k sub five { # I_smell_the_blood_of_an_Englishman::five say qq{Fee-fi-fo-fum,\nI smell the blood of an Englishman, +\nBe he alive, or be he dead\nI'll grind his bones to make my bread.} +; }; };

        which produces a lovely

        main::zero Fee::one Fi::two Fo::three Fum::four Fi::four I_smell_the_blood_of_an_Englishman::five

        I'm sure that using Module::Info as per AM and karl would have been much quicker but while the whipping would have been less so would the joy!

        In an effort to stop the "flogging" I have (making use of PPI)

        #!/usr/bin/env perl # To paraphrase Voltaire who wrote # "Mais dans ce pays-ci il est bon de tuer de tems en tems un Amiral +pour encourager les autres." # we have in its place # "But in this company it is good to fire a programmer from time to t +ime to encourage the others." ## no critic (ProhibitPostfixControls,ProhibitUnlessBlocks) ## no critic (RequireExtendedFormatting,RequireDotMatchAnything,Requir +eLineBoundaryMatching) ## no critic (RequirePodSections,RequireCheckedSyscalls); use 5.032001; use warnings; use Data::Dumper; use English qw(-no_match_vars); {; # The useful stuff in in this block!! use Params::Validate qw(:all); use Path::Tiny; use PPI; use PPR::X; use Scalar::Util qw(blessed); my $contents; my $verbose; my ($PTR,@PACKAGE)=(0,'main','main'); my $subs; sub _subs { $PACKAGE[++$PTR]=undef; for my $node (shift->children) { next # This should never happen! unless (blessed $node); next # WE don't care about these do we?? if ($node->isa('PPI::Statement::Null') || $nod +e->isa('PPI::Token::Comment') || $node->isa('PPI::Token::Whitespace') || $no +de->isa('PPI::Token::Structure')); printf "%-40s: %s\n",(' > ' x $PTR).$node->class,( +' > ' x $PTR).substr($node->content =~ s{ (?&PerlNWS) $PPR::X::GRAMMA +R }{ }gxr,0,32) if ($verbose); if ($node->isa('PPI::Statement::Package')) { #say $node->content; $PACKAGE[$PTR]=$node->child(2)->content; } elsif ($node->isa('PPI::Statement::Sub')) { #say $node->content; my $name=$node->child(2)->content; if ($name =~ m{^(.*?)::(\w+)$}) { # name has p +ackage #;warn Data::Dumper->new([\$1,\$2],[qw(*1 +*2)])->Deepcopy(1)->Indent(1)->Maxdepth(3)->Sortkeys(1)->Dump(),q{ }; # The following fails on perl-5.32.1_64 #$subs->{''.$1}{''.$}}=$node->content =~ s +{ (?&PerlNWS) $PPR::X::GRAMMAR }{ }gxr; # ... so #;warn Data::Dumper->new([\$1,\$2],[qw(*1 +*2)])->Deepcopy(1)->Indent(1)->Maxdepth(3)->Sortkeys(1)->Dump(),q{ }; my ($package,$sub)=($1,$2); $subs->{$package}{$sub}=( $contents ? $node->content =~ s{ (?&Pe +rlNWS) $PPR::X::GRAMMAR }{ }gxr : (@{$node->location})[0] ); } else { for (my $i=$PTR; $i >= 0; $i--) { # Look f +or the containing package if (defined $PACKAGE[$i]) { $subs->{$PACKAGE[$i]}{$name}=( $contents ? $node->content =~ +s{ (?&PerlNWS) $PPR::X::GRAMMAR }{ }gxr : (@{$node->location})[0] ); last; }; }; }; }; # See if further processing is needed --- ie if th +is node leads to a sub declaration _subs($node) if ($node->can('find') && $node->find('PPI::St +atement::Sub')); } $PTR--; return; }; # sub _subs: sub subs { Params::Validate::validate_pos(@_,{ type=>SCALAR|SCALARREF },( +{}) x $#_); my $arg=shift; my (%opt_H)=Params::Validate::validate(@_,{ contents=>{ type=> +BOOLEAN,default=>0 },verbose=>{ type=>BOOLEAN,default=>0 } }); $contents=$opt_H{contents}; $verbose=$opt_H{verbose}; if (ref $arg eq 'SCALAR') { # Reference to a string unless (${$arg} =~ m{ (?&PerlEntireDocument) $PPR::X::GRAM +MAR }x) { die q{Invalid Perl code: }.$PPR::X::ERROR->source."\n" }; } else { # A Scalar -- hence a file's name unless (path($arg)->slurp =~ m{ (?&PerlEntireDocument) $PP +R::X::GRAMMAR }x) { die q{Invalid Perl code: }.$PPR::X::ERROR->source."\n" }; }; _subs(PPI::Document->new($arg)); return $subs; }; # sub subs: }; my $chunk=do { local $INPUT_RECORD_SEPARATOR=undef; <DATA> }; my $sub_href=subs(\$chunk,contents=>1,verbose=>1); warn Data::Dumper->new([\$sub_href],[qw(*sub_href)])->Deepcopy(1)->Ind +ent(1)->Maxdepth(3)->Sortkeys(1)->Dump().q{ }; exit; __DATA__ sub zero { # main::zero say __PACKAGE__.q{::zero}; }; package Fee; sub one { # Fee::one say __PACKAGE__.q{::one}; }; package Fi; sub two { # Fi::two say __PACKAGE__.q{::two}; }; { package Fo; sub three { # Foo::Three say __PACKAGE__.q{::three}; }; }; # End of package Foo! {; say "This block introduces no new package" }; sub Fum::four { # Fum::four say __PACKAGE__.q{:four}; }; sub four { # Fi::four say __PACKAGE__.q{:four}; }; package I_smell_the_blood_of_an_Englishman { # Package with a bloc +k sub five { # I_smell_the_blood_of_an_Englishman::five say qq{Fee-fi-fo-fum,\nI smell the blood of an Englishman, +\nBe he alive, or be he dead\nI'll grind his bones to make my bread.} +; }; };

        which turns up (when contents and verbose are enabled)

        > PPI::Statement::Sub : > sub zero { say __PACKAGE_ +_.q{::z > PPI::Statement::Package : > package Fee; > PPI::Statement::Sub : > sub one { say __PACKAGE__ +.q{::on > PPI::Statement::Package : > package Fi; > PPI::Statement::Sub : > sub two { say __PACKAGE__ +.q{::tw > PPI::Statement::Compound : > { package Fo; sub three { + say __ > > PPI::Structure::Block : > > { package Fo; sub thre +e { say __ > > > PPI::Statement::Package : > > > package Fo; > > > PPI::Statement::Sub : > > > sub three { say __P +ACKAGE__.q{:: > PPI::Statement : > {; say "This block introd +uces no > PPI::Statement::Sub : > sub Fum::four { say __PAC +KAGE__. > PPI::Statement::Sub : > sub four { say __PACKAGE_ +_.q{:fo > PPI::Statement::Package : > package I_smell_the_blood +_of_an_ > > PPI::Token::Word : > > package > > PPI::Token::Word : > > I_smell_the_blood_of_a +n_Englishm > > PPI::Structure::Block : > > { sub five { say qq{Fe +e-fi-fo-fu > > > PPI::Statement::Sub : > > > sub five { say qq{F +ee-fi-fo-fum, $sub_href = \{ 'Fee' => { 'one' => 'sub one { say __PACKAGE__.q{::one}; }' }, 'Fi' => { 'four' => 'sub four { say __PACKAGE__.q{:four}; }', 'two' => 'sub two { say __PACKAGE__.q{::two}; }' }, 'Fo' => { 'three' => 'sub three { say __PACKAGE__.q{::three}; }' }, 'Fum' => { 'four' => 'sub Fum::four { say __PACKAGE__.q{:four}; }' }, 'I_smell_the_blood_of_an_Englishman' => { 'five' => 'sub five { say qq{Fee-fi-fo-fum,\\nI smell the blood +of an Englishman,\\nBe he alive, or be he dead\\nI\'ll grind his bone +s to make my bread.}; }' }, 'main' => { 'zero' => 'sub zero { say __PACKAGE__.q{::zero}; }' } }; at PPI_03.t line 104, <DATA> line 1.