PerlMonks  

How do I avoid double substitution when replacing many patterns?

by kyle (Abbot)
on Jan 20, 2007 at 16:39 UTC

kyle has asked for the wisdom of the Perl Monks concerning the following question:

I was looking through Class::Phrasebook recently. Among its features is a miniature template system. You can give it data like this:

$phrase = 'Hello $dolly!';
$variables = { dolly => 'Nurse' };

...and that will be turned into "Hello Nurse!" The code to do this job looks like this:

$phrase =~ s/\$([a-zA-Z0-9_]+)/$variables->{$1}/g;
# also process variables in $(var_name) format.
$phrase =~ s/\$\(([a-zA-Z0-9_]+)\)/$variables->{$1}/g;

That's fine until someone does something like:

$phrase = '$foo $bar';
$variables = {
    foo => '$(bar)',
    bar => '$(foo)',
};

When that happens, the first replacement above changes the phrase to "$(bar) $(foo)" (the correct result), and then the second replacement turns it into "$(foo) $(bar)" (wrong).
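
To make the failure concrete, here is a minimal sketch that runs the two substitutions quoted above in sequence and keeps the intermediate string. The two_pass wrapper and its variable names are only for illustration; they are not part of Class::Phrasebook.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions quoted above.
# Returns the phrase after the first pass and after the second pass.
sub two_pass {
    my ($phrase, $variables) = @_;

    # Pass 1: $name style references.
    $phrase =~ s/\$([a-zA-Z0-9_]+)/$variables->{$1}/g;
    my $after_first = $phrase;

    # Pass 2: $(name) style references. This also rewrites any
    # $(name) text that pass 1 just inserted.
    $phrase =~ s/\$\(([a-zA-Z0-9_]+)\)/$variables->{$1}/g;

    return ($after_first, $phrase);
}

my %vars = ( foo => '$(bar)', bar => '$(foo)' );
my ($after_first, $after_second) = two_pass('$foo $bar', \%vars);

print "after first pass:  $after_first\n";    # $(bar) $(foo)
print "after second pass: $after_second\n";   # $(foo) $(bar)
```

The second pass cannot tell original template text apart from text that the first pass substituted in, which is why the values get swapped back.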

To be fair, this is a contrived example, and real world examples of this problem are few and far between. Nevertheless, when it does happen, it may be a real pain to debug.

It got me thinking about a bulletproof way to do this kind of interpolation, and I eventually came up with this:

my @phrase_parts = split /(\$(?:\(\w+\)|\w+))/, $phrase;
foreach my $part ( @phrase_parts ) {
    $part =~ s{ \$ (\w+) }{$variables->{$1}}xms
        || $part =~ s{ \$ \( (\w+) \) }{$variables->{$1}}xms;
}
$phrase = join '', @phrase_parts;

I'm using the fact that split will include its delimiters in its result when the pattern you give it is wrapped in capturing parentheses. Everything that looks like a variable is an isolated element in @phrase_parts. Each one is subjected to replacement only once, so their replacements can't interfere with each other.
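
As an illustration of the split behaviour described above, the following sketch prints the pieces that split produces and then replaces each variable-shaped piece exactly once. The sample phrase, the %vars hash, and the anchored \A...\z patterns are assumptions made for this example, not the code from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data; the replacement values deliberately look
# like variable references themselves.
my %vars   = ( dolly => '$(friend)', friend => '$dolly' );
my $phrase = 'Hello $dolly, meet $(friend)!';

# The capturing parentheses make split return the delimiters as well,
# so every variable reference ends up as its own list element.
my @parts  = split /(\$(?:\(\w+\)|\w+))/, $phrase;
my $pieces = join '|', @parts;
print "$pieces\n";    # Hello |$dolly|, meet |$(friend)|!

for my $part (@parts) {
    # Anchored patterns: only an element that is entirely a variable
    # reference is replaced, and each element is visited once.
    $part =~ s/\A\$(\w+)\z/$vars{$1}/
        or $part =~ s/\A\$\((\w+)\)\z/$vars{$1}/;
}

my $result = join '', @parts;
print "$result\n";    # Hello $(friend), meet $dolly!
```

Because the substituted text is never scanned again, a value that looks like a variable reference survives unchanged in the output.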

Now I'm wondering if there's an even better way. The only other thought I had was to use Template Toolkit, but that seemed like a much larger hammer than necessary. I'd be interested to hear thoughts about this from the monks.

Replies are listed 'Best First'.
Re: How do I avoid double substitution when replacing many patterns?
by dirving (Friar) on Jan 20, 2007 at 17:29 UTC

    Why not just combine the two into one regexp, skirting the issue entirely?

    $phrase = '$foo $(bar)';
    $variables = { foo => '$(bar)', bar => '$(foo)' };

    $phrase =~ s/
        (?:                         # Just take the first one
            \$([a-zA-Z0-9_]+)
        )
        |                           # And alternate it with the second
        (?:
            \$\(([a-zA-Z0-9_]+)\)
        )
    /$variables->{$1||$2}/xg;       # Then do the substitution for the one
                                    # that matched

    print $phrase;                  # Prints "$(bar) $(foo)"

    I'm sure there's an easier and more efficient way to write this regex, but this seems like the straightforward solution. You eliminate the second pass, so you eliminate the double interpolation. This particular scheme fails if you allow a variable named '0' though, so you may need to do something else in the replacement part if this is the case.

    -- David Irving

      The leading $ and the middle \w+ are common to both patterns. In fact, all that's needed is to require a closing ) only when there was an opening (.

      Using a lesser-known feature, the conditional pattern (?(1)...):

      \$          # Our '$' prefix
      (?:         # Optionally *don't* find the opening paren.
        |
          (\()    # Optionally find the opening paren
      )
      (\w+)       # The middle part is captured in $2
      (?(1)\))    # Require a closing paren only if $1 matched.

      Merely factoring out the prefix:

      \$                  # Our '$' prefix
      (                   # Capture into $1
          (?:
              \w+         # Plain word.
          |
              \( \w+ \)   # A word with parentheses around it.
          )
      )

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

        You could change
        (?: | ( \( ) )
        to
        ( \( )?
        ... unless you get queasy when you see quantifiers placed on capturing groups.

        Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
        How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      Replace $1||$2 with $+. That'll contain the value of the last capture that actually matched. See perlvar:
      The text matched by the last bracket of the last successful search pattern. This is useful if you don't know which one of a set of alternative patterns matched.

      P.S. Originally I had mistakenly posted this as a followup to Re^2: How do I avoid double substitution when replacing many patterns?, now its sibling.

        PS, I didn't know about $+ so I'm glad you posted that in the wrong place.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      This particular scheme fails if you allow a variable named '0' though, so you may need to do something else in the replacement part if this is the case.

      Yes. In that case, your "$1||$2" would have to be "defined $1 ? $1 : $2".
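
The single-pass variants suggested in this sub-thread can be compared side by side. This is only a sketch: the %vars data (including a variable named '0' to exercise the edge case mentioned above) and the result variable names are made up for the example, and the /e modifier is used here so that the hash lookup is ordinary Perl code.

```perl
use strict;
use warnings;

# Hypothetical data, including a variable named '0'.
my %vars  = ( foo => '$(bar)', bar => '$(foo)', '0' => 'zero' );
my $input = '$foo $(bar) $0 $(0)';

# Variant 1: one alternation, with an explicit defined() test so that
# a capture of '0' is not mistaken for "no match".
(my $with_defined = $input) =~
    s/\$(?:(\w+)|\((\w+)\))/$vars{ defined($1) ? $1 : $2 }/ge;

# Variant 2: $+ is the highest-numbered capture group that took part
# in the match, so no test is needed.
(my $with_plus = $input) =~
    s/\$(?:(\w+)|\((\w+)\))/$vars{$+}/ge;

# Variant 3: conditional pattern. The closing paren is required only
# if the optional opening paren (group 1) matched; the name is in $2.
(my $with_conditional = $input) =~
    s/\$(\()?(\w+)(?(1)\))/$vars{$2}/ge;

print "$with_defined\n";        # $(bar) $(foo) zero zero
print "$with_plus\n";           # $(bar) $(foo) zero zero
print "$with_conditional\n";    # $(bar) $(foo) zero zero
```

All three make a single pass over the phrase, so text inserted by one replacement is never scanned again.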

Re: How do I avoid double substitution when replacing many patterns?
by davidrw (Prior) on Jan 20, 2007 at 17:10 UTC
    I don't have a better way, but it would be a good idea (if you're not already planning to) to post this as a failing test case to the module's RT queue, with either the patch you described or maybe just a reference to this thread (especially depending on the replies).

    A good basic double-check (obviously not complete, since you found this issue in the first place) would be to hack Class::Phrasebook w/your solution and make sure it still passes the test suite (and the previously failing foo/bar test)...
Re: How do I avoid double substitution when replacing many patterns?
by Rhandom (Curate) on Jan 20, 2007 at 22:48 UTC
    Using an even lesser known regex item...
    perl -e '
        $f = "\$foo \$(bar) \n";
        $f =~ s/\$ (?: (\w+) | \((\w+)\) )/<$^N>/xg;
        print $f;'

    The nice variable $^N holds the text matched by the most recently closed capture group. Combined with other lesser-known constructs, it lets you do some really neat things.

    The following also works, and additionally allows for more capture groups after the point where $^N is read.

    perl -e '
        my $f = "\$foo \$(bar)";
        our $val;                             # must use package global
                                              # for temporization in regex
        $f =~ s{\$                            # the dollar
            (?:                               # outer alternation
                (\w+) (?{ $val = $^N })       # match and then store
              | \((\w+) (?{ $val = $^N }) \)  # or match 2 and store
            )                                 # close outer
        }{<$val>}xg;
        print "$f\n"'

    my @a=qw(random brilliant braindead); print $a[rand(@a)];
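
To connect the $^N one-liner above back to the original question, here is a small sketch that uses $^N as the hash key instead of printing it in angle brackets. The %vars hash is hypothetical sample data, and the /e modifier is an addition for this example.

```perl
use strict;
use warnings;

# Hypothetical sample data, same shape as in the original question.
my %vars = ( foo => '$(bar)', bar => '$(foo)' );

# $^N is the text of the most recently closed capture group, so it
# names the variable no matter which alternative matched.
(my $phrase = '$foo $(bar)') =~
    s/\$(?:(\w+)|\((\w+)\))/$vars{$^N}/ge;

print "$phrase\n";    # $(bar) $(foo)
```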
Re: How do I avoid double substitution when replacing many patterns?
by Moron (Curate) on Jan 22, 2007 at 13:13 UTC
    Given that you don't say what nut you are actually trying to crack, it is hard to judge whether Template::Toolkit is necessary or not. On the other hand, what Class::Phrasebook achieves seems a bit underwhelming by comparison. One could argue that rummaging around a McDonald's kitchen is unlikely to get you a portion of foie gras de canard.

    However, Template::Toolkit is often hard to get started with and it is indeed the need for its sheer power that tends to justify the effort.

    -M

    Free your mind

Re: How do I avoid double substitution when replacing many patterns?
by barbie (Deacon) on Jan 26, 2007 at 15:55 UTC

    You might want to take a look at Data::Phrasebook, as this handles things a little better. Using your example the following works:

    use Data::Phrasebook;
    my $pb = Data::Phrasebook->new( file => 'phrases.txt' );
    my $str = $pb->fetch( 'baz', {foo => '${bar}', bar => '${foo}'} );
    # $str = 'foo is ${bar} and bar is ${foo}'

    where phrases.txt uses the default parameter substitution and looks like:

    baz=foo is :foo and bar is :bar

    While there is support for TT-style parameter substitution, it doesn't currently support an embedded TT templating system. That might be a possibility for the future, but it would be a little overkill for this problem ;)

    --
    Barbie | Birmingham Perl Mongers user group | http://birmingham.pm.org/
