Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re^4: extracting subroutine names (together with their packages) via PPI

by clueless newbie (Curate)
on Jul 07, 2023 at 00:51 UTC ( [id://11153302] : note . print w/replies, xml ) Need Help??


in reply to Re^3: extracting subroutine names (together with their packages) via PPI
in thread extracting subroutine names (together with their packages) via PPI

Hi, Steve! Should have thanked you for berrybrew long time ago!

I've been getting my head muddled with PPI::X and tried the following on the same bit of code.

#!/usr/bin/env perl use 5.032001; use warnings; use PPR::X; my $code=<<'_code_'; package Fee; sub one { say __PACKAGE__.q{::one}; }; package Fi; sub two { say __PACKAGE__.q{::two}; }; { package Foo; sub three { say __PACKAGE__.q{::three}; }; }; # End of package Foo! sub four { say __PACKAGE__.q{:four}; }; _code_ my $re = qr{ # What to match... (?: (?<block> (?&PerlBlock) ) | (?<package> (?&PerlPackageDeclaration) ) | (?<subroutine> (?&PerlSubroutineDeclaration) ) ) $PPR::X::GRAMMAR }x; use Data::Dumper; while ($code =~ m{$re}gx) { warn Data::Dumper->new([\%+],[qw(*+)])->Deepcopy(1)->Indent(1)->Ma +xdepth(3)->Sortkeys(1)->Dump(),q{ }; }; __DATA__

This turns up something rather interesting!

%+ = ( 'package' => 'package Fee;' ); at Fi_01.t line 44. %+ = ( 'subroutine' => 'sub one { say __PACKAGE__.q{::one}; }' ); at Fi_01.t line 44. %+ = ( 'package' => 'package Fi;' ); at Fi_01.t line 44. %+ = ( 'subroutine' => 'sub two { say __PACKAGE__.q{::two}; }' ); at Fi_01.t line 44. %+ = ( # A BLOCK complete with its package/sub +routine declarations 'block' => '{ package Foo; sub three { say __PACKAGE__.q{::three}; }; }' ); at Fi_01.t line 44. %+ = ( 'subroutine' => 'sub four { say __PACKAGE__.q{:four}; }' ); at Fi_01.t line 44.

Notice that we got a block without breaking out its package/subroutine declarations. The contents of that block might be recursively parsed - extracting its package and subroutine declarations.

Replies are listed 'Best First'.
Re^5: extracting subroutine names (together with their packages) via PPI
by clueless newbie (Curate) on Jul 08, 2023 at 01:20 UTC

    Tried recursively processing blocks via regex --- but that was/is beyond me! Felt so good when I quit banging my head against the wall!

    Converted it to use PPI and got this:

    use 5.032001; use warnings; #use Data::Dumper; use PPI; use PPR::X; #use PPI::Dumper; use Scalar::Util qw(blessed); my $PTR; my @PACKAGE=('main','main'); # Just a useful routine during delousing sub as_header { my $string=(' > ' x $PTR).ref shift; return $string.('='x (64-length $string)); }; sub process_chunk { $PACKAGE[++$PTR]=undef; for my $node (shift->children) { next # This should never happen! unless (blessed $node); next # WE don't care about these do we?? if ($node->isa('PPI::Statement::Null') || $node->isa('PPI: +:Token::Comment') || $node->isa('PPI::Token::Whitespace') || $node->isa('PPI +::Token::Structure')); #say as_header($node); if ($node->isa('PPI::Structure::Block')) { #say $node->content; } elsif ($node->isa('PPI::Statement::Package')) { #say $node->content; #say $node->child(2)->content; $PACKAGE[$PTR]=$node->child(2)->content; } elsif ($node->isa('PPI::Statement::Sub')) { #say $node->content; my $name=$node->child(2)->content; unless ($name =~ m{::}) { for (my $i=$PTR; $i >= 0; $i--) { if (defined $PACKAGE[$i]) { $name=qq{$PACKAGE[$i]::$name};#' last; }; }; }; say $name; } else { #PPI::Dumper->new($node,whitespace=>0)->print; }; # See if further processing is possible process_chunk($node) if (blessed $node && $node->can('children')); } $PTR--; }; # sub process_chunk: my $chunk=do { local $/; <DATA> }; # Let's let Conway confirm all is good to go! unless ($chunk =~ m{ (?&PerlEntireDocument) $PPR::X::GRAMMAR }x) { die "Invalid Perl code: " . $PPR::X::ERROR->source . "\n" } else { # Now go process the bloody thing! process_chunk(PPI::Document->new(\$chunk)); }; __DATA__ sub zero { # main::zero say __PACKAGE__.q{::zero}; }; package Fee; sub one { # Fee::one say __PACKAGE__.q{::one}; }; package Fi; sub two { # Fi::two say __PACKAGE__.q{::two}; }; { package Fo; sub three { # Foo::Three say __PACKAGE__.q{::three}; }; }; # End of package Foo! {; say "This block introduces no new package" }; sub Fum::four { # Fum::four say __PACKAGE__.q{:four}; }; sub four { # Fi::four say __PACKAGE__.q{:four}; }; package I_smell_the_blood_of_an_Englishman { # Package with a bloc +k sub five { # I_smell_the_blood_of_an_Englishman::five say qq{Fee-fi-fo-fum,\nI smell the blood of an Englishman, +\nBe he alive, or be he dead\nI'll grind his bones to make my bread.} +; }; };

    which produces a lovely

    main::zero Fee::one Fi::two Fo::three Fum::four Fi::four I_smell_the_blood_of_an_Englishman::five

    I'm sure that using Module::Info as per AM and karl would have been much quicker but while the whipping would have been less so would the joy!

    In an effort to stop the "flogging" I have (making use of PPI)

    #!/usr/bin/env perl # To paraphrase Voltaire who wrote # "Mais dans ce pays-ci il est bon de tuer de tems en tems un Amiral +pour encourager les autres." # we have in its place # "But in this company it is good to fire a programmer from time to t +ime to encourage the others." ## no critic (ProhibitPostfixControls,ProhibitUnlessBlocks) ## no critic (RequireExtendedFormatting,RequireDotMatchAnything,Requir +eLineBoundaryMatching) ## no critic (RequirePodSections,RequireCheckedSyscalls); use 5.032001; use warnings; use Data::Dumper; use English qw(-no_match_vars); {; # The useful stuff in in this block!! use Params::Validate qw(:all); use Path::Tiny; use PPI; use PPR::X; use Scalar::Util qw(blessed); my $contents; my $verbose; my ($PTR,@PACKAGE)=(0,'main','main'); my $subs; sub _subs { $PACKAGE[++$PTR]=undef; for my $node (shift->children) { next # This should never happen! unless (blessed $node); next # WE don't care about these do we?? if ($node->isa('PPI::Statement::Null') || $nod +e->isa('PPI::Token::Comment') || $node->isa('PPI::Token::Whitespace') || $no +de->isa('PPI::Token::Structure')); printf "%-40s: %s\n",(' > ' x $PTR).$node->class,( +' > ' x $PTR).substr($node->content =~ s{ (?&PerlNWS) $PPR::X::GRAMMA +R }{ }gxr,0,32) if ($verbose); if ($node->isa('PPI::Statement::Package')) { #say $node->content; $PACKAGE[$PTR]=$node->child(2)->content; } elsif ($node->isa('PPI::Statement::Sub')) { #say $node->content; my $name=$node->child(2)->content; if ($name =~ m{^(.*?)::(\w+)$}) { # name has p +ackage #;warn Data::Dumper->new([\$1,\$2],[qw(*1 +*2)])->Deepcopy(1)->Indent(1)->Maxdepth(3)->Sortkeys(1)->Dump(),q{ }; # The following fails on perl-5.32.1_64 #$subs->{''.$1}{''.$}}=$node->content =~ s +{ (?&PerlNWS) $PPR::X::GRAMMAR }{ }gxr; # ... so #;warn Data::Dumper->new([\$1,\$2],[qw(*1 +*2)])->Deepcopy(1)->Indent(1)->Maxdepth(3)->Sortkeys(1)->Dump(),q{ }; my ($package,$sub)=($1,$2); $subs->{$package}{$sub}=( $contents ? $node->content =~ s{ (?&Pe +rlNWS) $PPR::X::GRAMMAR }{ }gxr : (@{$node->location})[0] ); } else { for (my $i=$PTR; $i >= 0; $i--) { # Look f +or the containing package if (defined $PACKAGE[$i]) { $subs->{$PACKAGE[$i]}{$name}=( $contents ? $node->content =~ +s{ (?&PerlNWS) $PPR::X::GRAMMAR }{ }gxr : (@{$node->location})[0] ); last; }; }; }; }; # See if further processing is needed --- ie if th +is node leads to a sub declaration _subs($node) if ($node->can('find') && $node->find('PPI::St +atement::Sub')); } $PTR--; return; }; # sub _subs: sub subs { Params::Validate::validate_pos(@_,{ type=>SCALAR|SCALARREF },( +{}) x $#_); my $arg=shift; my (%opt_H)=Params::Validate::validate(@_,{ contents=>{ type=> +BOOLEAN,default=>0 },verbose=>{ type=>BOOLEAN,default=>0 } }); $contents=$opt_H{contents}; $verbose=$opt_H{verbose}; if (ref $arg eq 'SCALAR') { # Reference to a string unless (${$arg} =~ m{ (?&PerlEntireDocument) $PPR::X::GRAM +MAR }x) { die q{Invalid Perl code: }.$PPR::X::ERROR->source."\n" }; } else { # A Scalar -- hence a file's name unless (path($arg)->slurp =~ m{ (?&PerlEntireDocument) $PP +R::X::GRAMMAR }x) { die q{Invalid Perl code: }.$PPR::X::ERROR->source."\n" }; }; _subs(PPI::Document->new($arg)); return $subs; }; # sub subs: }; my $chunk=do { local $INPUT_RECORD_SEPARATOR=undef; <DATA> }; my $sub_href=subs(\$chunk,contents=>1,verbose=>1); warn Data::Dumper->new([\$sub_href],[qw(*sub_href)])->Deepcopy(1)->Ind +ent(1)->Maxdepth(3)->Sortkeys(1)->Dump().q{ }; exit; __DATA__ sub zero { # main::zero say __PACKAGE__.q{::zero}; }; package Fee; sub one { # Fee::one say __PACKAGE__.q{::one}; }; package Fi; sub two { # Fi::two say __PACKAGE__.q{::two}; }; { package Fo; sub three { # Foo::Three say __PACKAGE__.q{::three}; }; }; # End of package Foo! {; say "This block introduces no new package" }; sub Fum::four { # Fum::four say __PACKAGE__.q{:four}; }; sub four { # Fi::four say __PACKAGE__.q{:four}; }; package I_smell_the_blood_of_an_Englishman { # Package with a bloc +k sub five { # I_smell_the_blood_of_an_Englishman::five say qq{Fee-fi-fo-fum,\nI smell the blood of an Englishman, +\nBe he alive, or be he dead\nI'll grind his bones to make my bread.} +; }; };

    which turns up (when contents and verbose are enabled)

    > PPI::Statement::Sub : > sub zero { say __PACKAGE_ +_.q{::z > PPI::Statement::Package : > package Fee; > PPI::Statement::Sub : > sub one { say __PACKAGE__ +.q{::on > PPI::Statement::Package : > package Fi; > PPI::Statement::Sub : > sub two { say __PACKAGE__ +.q{::tw > PPI::Statement::Compound : > { package Fo; sub three { + say __ > > PPI::Structure::Block : > > { package Fo; sub thre +e { say __ > > > PPI::Statement::Package : > > > package Fo; > > > PPI::Statement::Sub : > > > sub three { say __P +ACKAGE__.q{:: > PPI::Statement : > {; say "This block introd +uces no > PPI::Statement::Sub : > sub Fum::four { say __PAC +KAGE__. > PPI::Statement::Sub : > sub four { say __PACKAGE_ +_.q{:fo > PPI::Statement::Package : > package I_smell_the_blood +_of_an_ > > PPI::Token::Word : > > package > > PPI::Token::Word : > > I_smell_the_blood_of_a +n_Englishm > > PPI::Structure::Block : > > { sub five { say qq{Fe +e-fi-fo-fu > > > PPI::Statement::Sub : > > > sub five { say qq{F +ee-fi-fo-fum, $sub_href = \{ 'Fee' => { 'one' => 'sub one { say __PACKAGE__.q{::one}; }' }, 'Fi' => { 'four' => 'sub four { say __PACKAGE__.q{:four}; }', 'two' => 'sub two { say __PACKAGE__.q{::two}; }' }, 'Fo' => { 'three' => 'sub three { say __PACKAGE__.q{::three}; }' }, 'Fum' => { 'four' => 'sub Fum::four { say __PACKAGE__.q{:four}; }' }, 'I_smell_the_blood_of_an_Englishman' => { 'five' => 'sub five { say qq{Fee-fi-fo-fum,\\nI smell the blood +of an Englishman,\\nBe he alive, or be he dead\\nI\'ll grind his bone +s to make my bread.}; }' }, 'main' => { 'zero' => 'sub zero { say __PACKAGE__.q{::zero}; }' } }; at PPI_03.t line 104, <DATA> line 1.