PerlMonks  

s///g within m//g regions

by almr (Sexton)
on Nov 09, 2021 at 12:55 UTC ( [id://11138609]=perlquestion )

almr has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

what's an idiomatic way to match certain regions of a string (m//mg), and then for each sub-region, perform substitutions (s///mg)? I can think of several options, none of which seems appealing.

E.g. within regions delimited by "# START" / "# END", uncomment lines (s/^[ #]*//g). I could turn the string into a line-list, use the flip-flop to identify regions, and s///g for each line. But isn't there something cleaner and more general?

Replies are listed 'Best First'.
Re: s///g within m//g regions
by haukex (Archbishop) on Nov 09, 2021 at 14:22 UTC

    There are quite a few techniques you can use to make your regexen nicer, any combination of which might be useful to you, see for example /x, /e, /r, \K, and lookarounds in perlre. In the following example I've thrown all of them together into one, which may be overkill. (I see ikegami has posted something similar, albeit non-functional, while I was composing this.)

    my $data = <<'END';
    foo123bar
    # foo123bar
    foo456bar
    # xyz foo789bar abc
    foo456bar
    END

    $data =~ s{
            ^                 # beginning of line
            \h* \# \h*        # comment lines
            \K                # keep everything up to here in replacement
            (?<comment> \N*)  # capture the comment
            $                 # end of line
        }
        { handle_foobar( $+{comment} ) }msxge;

    sub handle_foobar {
        return shift =~ s{ foo \K (\d+) (?= bar ) }{ $1 =~ tr/0-9/a-j/r }msxger;
    }

    > I could turn the string into a line-list

    Note it's also possible to open a string as an in-memory filehandle:

    my $str = <<'END';
    Hello
    # START
    World
    # END
    Aaa
    # START
    Bbb
    # END
    Ccc
    END

    open my $fh, '<', \$str or die $!;
    while (<$fh>) {
        chomp;
        if ( /^# START/ .. /^# END/ ) {
            print "<$_>\n";
        }
    }
    close $fh;

    A more advanced regex technique is m/\G.../gc parsing, which is described in perlop. As for your example here, if the delimiters need to be escaped, see Regexp::Common::delimited.
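
    As a rough illustration of the m/\G.../gc style, here is a minimal sketch that walks a string token by token and only rewrites the text inside delimited regions. The |...| delimiters and the sample string are assumptions chosen for the example, and tr///r needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical input: turn * into - only inside |...| regions.
my $in  = "A |simple*example| is |so*simple| to come*up";
my $out = '';

while (1) {
    # /g in scalar context continues from pos($in); /c keeps pos($in)
    # in place when a match fails, so the next branch can try there.
    if ( $in =~ /\G \| ([^|]*) \| /gcx ) {        # a delimited region
        my $region = $1;
        $out .= '|' . ( $region =~ tr/*/-/r ) . '|';
    }
    elsif ( $in =~ /\G ( [^|]+ | \| ) /gcx ) {    # plain text, or a lone |
        $out .= $1;
    }
    else {
        last;                                     # nothing left to consume
    }
}

print "$out\n";
```

    Each branch anchors at \G, so every character is consumed by exactly one rule, which makes it easy to add further region types as extra branches.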

      Thanks for inventorying all these techniques. The basic thing I was forgetting was the /e modifier, combined with using braces as delimiters (s{}{}). This is a really powerful combination!
Re: s///g within m//g regions
by ikegami (Patriarch) on Nov 09, 2021 at 13:48 UTC
    s{
        ( ^ \#[ ]START \n )
        ( .*? )
        ( ^ \#[ ]END \n )
    }{
        $1 . ( $2 =~ s/^[ #]*//rg ) . $3
    }xsmeg;

      Yes! Except the above breaks, because the inner s/// kills the existing capture groups. So, those must be saved. E.g.

      s{
          ( ^ \#[ ]START \n )
          ( .*? )
          ( ^ \#[ ]END \n )
      }{
          my @m = @{^CAPTURE};
          $m[ 0 ] . ( $m[ 1 ] =~ s/^[ #]+//rmg ) . $m[ 2 ];
      }xmesg;
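
      @{^CAPTURE} needs Perl 5.26 or later. On older Perls, a possible variant is to copy $1, $2 and $3 into lexicals before running the inner substitution. This is only a sketch: the sample input and the variable names are assumptions, and /r needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $str = <<'END';
# keep
# START
# foo
  # bar
# END
# keep too
END

# The captures are copied into lexicals first, because the inner s///
# resets $1, $2 and $3 as soon as it matches.
my $out = $str =~ s{
    ( ^ \#[ ]START \n )
    ( .*? )
    ( ^ \#[ ]END \n )
}{
    my ( $start, $body, $end ) = ( $1, $2, $3 );
    $start . ( $body =~ s/^[ #]+//rmg ) . $end;
}xmsger;

print $out;
```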

        @{^CAPTURE} is new to me!

        (hmm, I wonder if String::Substitution can be optimized on versions where this is available.)

Re: s///g within m//g regions
by tybalt89 (Monsignor) on Nov 09, 2021 at 14:16 UTC
    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11138609
    use warnings;

    my $have = <<END;
    # turn * into - within |...| regions
    A |simple*example| is |so*simple| to come*up |with these*days|
    END

    my $want = $have =~ s{\|[^|]*\|}{ $& =~ tr/*/-/r }ger;

    print "$have\n$want";

    Outputs:

    # turn * into - within |...| regions
    A |simple*example| is |so*simple| to come*up |with these*days|

    # turn * into - within |...| regions
    A |simple-example| is |so-simple| to come*up |with these-days|
Re: s///g within m//g regions
by LanX (Saint) on Nov 09, 2021 at 13:14 UTC
    > I could turn the string into a line-list, use the flip-flop to identify regions, and s///g for each line.

    That's exactly what I would do. 3 lines max.
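
    For reference, the line-list plus flip-flop approach could look roughly like the sketch below. The sample input is an assumption, and so is the choice to leave the "# START" / "# END" marker lines themselves untouched.

```perl
use strict;
use warnings;

# Hypothetical sample input; the markers follow the question.
my $str = <<'END';
keep # this
# START
# foo
  # bar
# END
# untouched
END

my $out = join '', map {
    # In scalar context, .. is the flip-flop: it returns a sequence number
    # from the START line through the END line and the empty string elsewhere.
    my $in_region = /^# START/ .. /^# END/;

    # Sequence number 1 is the START line; the END line's number has "E0"
    # appended. Only the lines in between are uncommented.
    s/^[ #]+// if $in_region && $in_region != 1 && $in_region !~ /E0$/;
    $_;
} split /^/m, $str;

print $out;
```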

    > But isn't there something cleaner and more general?

    I beg your pardon? Plz define "cleaner"

    You can certainly use some huge, convoluted nested regex for this, which is hardly more maintainable.

    (I'm sure tybalt is already preparing one)

    Or an extra module with some proprietary parsing grammar. But that's only replacing one DSL with another.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Well, something like iterating over matches (for my $m ( $str =~ m/^# START.*?^# END\N*/msg ) { update_match( $m ); }).

      In this case, yes, I could reshape into a line-list, but what about other "region" definitions, and possibly multiple regions in a line?
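
      One way such an iteration could also write the results back is to record the offsets of each match via @- and @+ and then replace with four-argument substr. The following is only a sketch: update_match here is a hypothetical stand-in that returns the rewritten text, and the sample input is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for update_match(): returns the rewritten text.
sub update_match {
    my ($body) = @_;
    return $body =~ s/^[ #]+//rmg;
}

# Hypothetical sample input with two regions.
my $str = <<'END';
a
# START
# foo
# END
b # c
# START
  # bar
# END
END

# Pass 1: record offset and length of each region body (capture group 1).
my @spans;
while ( $str =~ /^# START\n(.*?)^# END\n/msg ) {
    push @spans, [ $-[1], $+[1] - $-[1] ];
}

# Pass 2: rewrite from the last region to the first, so that the
# offsets recorded for earlier regions remain valid.
for my $span ( reverse @spans ) {
    my ( $offset, $length ) = @$span;
    my $body = substr $str, $offset, $length;
    substr( $str, $offset, $length, update_match($body) );
}

print $str;
```

      Only the region pattern in pass 1 would need to change for other region definitions, including several regions on one line.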

        > but what about other "region" definitions, and possibly multiple regions in a line?

        SSCCE please :)

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery
