cavac has asked for the wisdom of the Perl Monks concerning the following question:

In 5.36 sub signatures are finally standard, so i started converting some of my code to use them. A lot of code, actually. So i've written a script that helps me with the conversion process.

It's quite tied to my own coding style and it won't spare you the work of handling optional arguments. And you need to test your software after the conversion. A lot. But it still saves a lot of manual editing.

Posting this mostly to ask on how you would have done this differently or how it could be improved. Don't spare the criticism (i can take it), i know this is substandard code done in a hurry.

Code in readmore tags, because of length.

#!/usr/bin/env perl #---AUTOPRAGMASTART--- use v5.36; use strict; use diagnostics; use mro 'c3'; use English; use Carp qw[carp croak confess cluck longmess shortmess]; our $VERSION = 4.1; use autodie qw( close ); use Array::Contains; use utf8; use Data::Dumper; use builtin qw[true false is_bool]; no warnings qw(experimental::builtin); use PageCamel::Helpers::UTF; #---AUTOPRAGMAEND--- # PAGECAMEL (C) 2008-2022 Rene Schickbauer # Developed under Artistic license print "Searching files...\n"; my @files = (find_pm('devscripts'), find_pm('lib')); #@files = ('lib/PageCamel/Helpers/PostgresDB.pm'); die('optionalargs.txt already exists!') if(-f 'optionalargs.txt'); open(my $optfh, '>', 'optionalargs.txt') or die($ERRNO); my %knownoptionals; print "Changing files:\n"; foreach my $file (@files) { my $inserted = 0; print "Editing $file...\n"; my @lines; open(my $ifh, "<", $file) or die($ERRNO); @lines = <$ifh>; close $ifh; #open(my $ofh, ">", 'bla') or die($ERRNO); open(my $ofh, ">", $file) or die($ERRNO); while(scalar @lines) { my $line = shift @lines; my ($subname, $args); # Match the sub NAME line if($line =~ /[^\#]*\ *sub\ (.*)\ \{/) { $subname = $1; } else { print $ofh $line; next; } if($lines[0] =~ /[^\#]*my\ .*?\((.*)\).*\@\_/) { # Match arguments in the form of: my ($foo, $bar) = @_; $args = $1; } elsif($lines[0] =~ /[^\#]*my\ \(*(.*?)\)*\ *\=\ *shift(\ *\@ +\_)*/) { # Match single "shift" argument: my $self = shift @_; $args = $1; print "-------------- SHIFT ARG!!! ---------------\n"; print " $args\n"; } else { print "#### Sub $subname has no args\n"; print $ofh $line; next; } $subname =~ s/^\ +//g; $subname =~ s/\ +$//g; $args =~ s/^\ +//g; $args =~ s/\ +$//g; my $temp = shift @lines; my $newsub = 'sub ' . $subname . '(' . $args . ') {' . "\n"; print $newsub; print $ofh $newsub; # Try to warn about (possible) optional arguments. This isn't +perfect, but better than nothing lookForOptionals($file, $subname, $args, @lines); } close $ofh; } close $optfh; print "Done.\n"; exit(0); sub lookForOptionals($file, $subname, $arglist, @lines) { # read all the lines from the current sub and beautify the argumen +ts my @sublines = getSublines(@lines); my @args = getArgs($arglist); # Let's look if any of the arguments are used in a defined() match foreach my $arg (@args) { foreach my $line (@sublines) { my $matcharg = 'defined\(\ *\\' . $arg . '\ *\)'; if($line =~ /$matcharg/) { my $key = join('___', $file, $subname, $arg); if(!defined($knownoptionals{$key})) { $knownoptionals{$key} = 1; print "$file / $subname: Optional argument $arg\n" +; print $optfh "$file / $subname: Optional argument +$arg\n"; } } } } return; } sub getSublines(@lines) { # Read all lines of the sub. We count opening and closing braces { +}, when # we reach zero we are at the end of the sub. Mostly, maybe, unles +s we aren't. # This is a very dumb algorithm. You can't write a proper code ana +lyzer in five # minutes. # # To quote my favourite robot character: # "I calculated the odds of this succeeding against the odds I was + doing something incredibly stupid... and I went ahead anyway." my $count = 1; my @sublines; for(my $i = 0; $i < scalar @lines; $i++) { my $line = $lines[$i]; push @sublines, $line; $count += getBraceCount($line); last unless($count); } return @sublines; } sub getBraceCount($line) { my $count = 0; my @parts = split//, $line; foreach my $part (@parts) { if($part eq '{') { $count++; } elsif($part eq '}') { $count--; } } return $count; } sub getArgs($arglist) { my @args = split/\,/, $arglist; for(my $i = 0; $i < scalar @args; $i++) { $args[$i] =~ s/^\ +//; $args[$i] =~ s/\ +$//; } return @args; } sub find_pm($workDir) { my @files; opendir(my $dfh, $workDir) or die($ERRNO); while((my $fname = readdir($dfh))) { next if($fname eq "." || $fname eq ".." || $fname eq ".hg"); $fname = $workDir . "/" . $fname; if(-d $fname) { push @files, find_pm($fname); } elsif($fname =~ /\.p[lm]$/i && -f $fname) { push @files, $fname; } } closedir($dfh); return @files; }
PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP

Replies are listed 'Best First'.
Re: Converting to sub signatures
by hv (Prior) on Jun 23, 2022 at 16:10 UTC

    My first concern would be with regexps like:

            if($line =~ /[^\#]*\ *sub\ (.*)\ \{/) {

    It looks like the intent is to match only outside of comments, but because the pattern is not anchored the initial /[^\#]*/ is effectively a no-op. And because both elements of [^\#]*\ * can be zero-length, this would match eg "for my $sub (@callbacks) {" - I'd definitely want at least a mandatory space (or beginning-of-line).

    For the rest, I would find the regexps easier to read if they were expanded with //x (at the cost of having to write \s+ frequently). But it's hard to comment on code that is "quite tied to (your) own coding style" without details of what that is. For example I'll tend to break long lines in particular ways that would not be hard to parse in a line-by-line manner, but in the general case that would be hard to do without a complex state machine.

      You are absolutely right about the regular expressions not working all that well. It did in fact fumble some commented out functions into non-working code.

      I very seldomly use negative matches ("must not contain" stuff), usually i check first if the line is a comment, and THEN decide if a want to process further. How would i correctly anchor this?

      Example code lines:

      should match these: sub hello { sub hello { should not match these: #sub hello { # sub hello { # sub hello { # sub hello {

      PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP

        G'day cavac,

        This does/doesn't match as per your example:

        qr{^\s*[^#]*\s*sub}

        Quick test:

        #!/usr/bin/env perl use v5.26; my $re = qr{^\s*[^#]*\s*sub}; my (@match, @no_match); while (<DATA>) { if (/$re/) { push @match, $_; } else { push @no_match, $_; } } say '*** Matched:'; print for @match; say '*** Not Matched:'; print for @no_match; __DATA__ should match these: sub hello { sub hello { should not match these: #sub hello { # sub hello { # sub hello { # sub hello {

        Output:

        *** Matched: sub hello { sub hello { *** Not Matched: should match these: should not match these: #sub hello { # sub hello { # sub hello { # sub hello {

        — Ken

        I guess the next question is: are there any examples you want to match that do not start m{ ^ \s* sub \b }x? If not, then you have your answer; if there are, those are the tricky cases we'd need to see.

        I just took a quick look at one of my codebases and found 558 matches for that pattern; 503 of them were normal sub declarations all matching m{ ^ \s* sub \s+ (\w+) \s* \{ }x, the rest were anonymous sub references not matching that pattern. Of matches against m{ \b sub \b }x elsewhere in any line, almost all were anonymous sub references (the remaining 2 or 3 were comments or $sub variables).

        So that pattern would work for me, would it work for you?

Re: Converting to sub signatures -- PPI
by Discipulus (Canon) on Jun 24, 2022 at 07:07 UTC
    Hello cavac,

    this sounds very fun! For a zillion of XPD I'd write for you something better :)

    No, seriously: parsing perl with regexes? "Only perl can parse Perl" see also On Parsing Perl

    Above statements and link comes from PPI documentation.. and yes PPI works!

    You can give a try to one of most underestimated CUFP of mines: Repeatedly edit a file hacking PPI::Cache and use it to parse PPI::Statement::Sub class in a perl document. Let see it at work:

    perl ppi-editor.pl cache file last_hex_id_file.sto not found. # it search for p +revious runs in the hardcoded filename last_hex_id_file.sto Insert the full path of a new file to edit and press enter (or CTRL-C +to terminate) FILE PATH:./Exp.pm # <---- loading a + file loading from cache ok 'last_hex_id_file.sto' succesfully read: using 5a459de7e28b9b423d7c8d5 +ba988fbdb Which PPI class do you want to edit? PPI CLASS:PPI::Statement::Sub # <---- selecting + a PPI class to work on Each element of the type PPI::Statement::Sub will be proposed for edit + (the content). insert your new input terminating it with CTRL-Z on a empty line. use a bare ENTER to skip the current element STATEMENT: sub new{ my $class = shift; my $validated = _validate( $_[0] ); return bless { validated => $validated, stored => [], },$class; } CONTENT: sub new{ my $class = shift; my $validated = _validate( $_[0] ); return bless { validated => $validated, stored => [], },$class; } EDIT: .... # <---- and so on for every su +b statement storing cache hex_id: 5a459de7e28b9b423d7c8d5ba988fbdb in last_hex_id_ +file.sto Enter a filename if you want to save the current version (or ENTER to +skip) OUTPUT FILE:

    You can use the above code as skeleton to automatically inspect every sub and then inspect each shift or assignement coming from @_ to change the sub adding signatures. Should be fun!

    I suggest you a final diff log just to review what happened to your code: "unexpected is always a possiblity" :)

    L*

    UPDATE See PPI::Element to see the difference between statement and content ..in effect i forgot it :) but relaunching my program to inspect PPI::Token::Quote class will demonstrate the difference well enough:

    perl ppi-editor.pl 'last_hex_id_file.sto' succesfully read: using 5a459de7e28b9b423d7c8d5 +ba988fbdb # it wants to reload previous edited file Which PPI class do you want to edit? + # if it finds last_hex_id_file.sto PPI CLASS:PPI::Token::Quote + # <-- search for Quote Each element of the type PPI::Token::Quote will be proposed for edit ( +the content). insert your new input terminating it with CTRL-Z on a empty line. use a bare ENTER to skip the current element STATEMENT: croak "No arg!" unless $_[0]; + # statement including the PPI::Element CONTENT: "No arg!" + # content of the PPI::Element itself EDIT:"No arguments!" + # a minimal change EDIT:^Z storing cache hex_id: 057d482fed857c338ec5ff25f2312900 in last_hex_id_ +file.sto Enter a filename if you want to save the current version (or ENTER to +skip) OUTPUT FILE:./Exp-bis.pm + # save to a new file to play nicely.. C>diff Exp.pm Exp-bis.pm + # check it out! 17c17 < croak "No arg!" unless $_[0]; --- > croak "No arguments!" unless $_[0];

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.