Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to make substitutions within a block of text using the following:
#!/usr/bin/perl -w
use strict;

my %subs = (
    qr/(\d+)F/i => sub {
        warn "Matched F with $1\n";
        return sprintf("%.u" , ($1 - 32) / 1.8) . 'C'
    },
    qr/(\d+)C/i => sub {
        warn "Matched C with $1\n";
        return sprintf("%.u" , ($1 * 1.8) + 32) . 'F'
    },
);

my $html = do { local $/; <DATA> };
warn $html;

foreach my $key (keys %subs) {
    warn "checking key: $key\n";
    $html =~ s/$key/$subs{$key}->()/eg;
    warn $html;
}
warn $html;

__DATA__
convert 180F to C
convert 180C to F
convert 30" to cm
etc
If you look at the output of the code, you'll see that once a value has been converted, the next regex converts it back:
convert 180F to C
convert 180C to F
convert 30" to cm
etc
checking key: (?i-xsm:(\d+)C)
Matched C with 180
convert 180F to C
convert 356F to F
convert 30" to cm
etc
checking key: (?i-xsm:(\d+)F)
Matched F with 180
Matched F with 356
convert 82C to C
convert 180C to F
convert 30" to cm
etc
convert 82C to C
convert 180C to F
convert 30" to cm
etc
I assume the solution is to use the \G assertion (together with the /g modifier) within a while loop, but I can't get my head around it. If possible, I'd like to keep a hash of qr's so the code stays easy to maintain as I extend it. Hope someone can point me in the right direction! Thanks

Replies are listed 'Best First'.
Re: Global replace issue
by moritz (Cardinal) on Aug 29, 2010 at 18:44 UTC
    Whatever you do, make only a single pass over the data. This can be done by matching with the /gc modifiers, so that a failed match does not reset pos, and doing the substitutions yourself.
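A minimal sketch of that single-pass idea, not code from the thread: the name convert_once and the @rules list are made up for illustration, and the rules live in an array of [pattern, code] pairs rather than a hash so that the order in which they are tried is fixed. It also uses '%.0f' (rounding) instead of the original '%.u', which is my own choice. At each position it tries every rule anchored with \G; if none matches, it copies one character and moves on, so converted text is never scanned again.

```perl
use strict;
use warnings;

# Hypothetical rule list: [ pattern, code that receives the captured number ].
my @rules = (
    [ qr/(\d+)F/i, sub { sprintf '%.0fC', ($_[0] - 32) / 1.8 } ],
    [ qr/(\d+)C/i, sub { sprintf '%.0fF', $_[0] * 1.8 + 32 } ],
);

# Walk the input exactly once, building the output as we go.
sub convert_once {
    my ($text) = @_;
    my $out = '';
    pos($text) = 0;
  CHUNK:
    while (pos($text) < length $text) {
        for my $rule (@rules) {
            my ($re, $code) = @$rule;
            # \G anchors at the current pos; /c keeps pos when the match fails.
            if ($text =~ /\G$re/gc) {
                $out .= $code->($1);
                next CHUNK;
            }
        }
        # No rule matched here: copy a single character unchanged.
        if ($text =~ /\G(.)/gcs) {
            $out .= $1;
        }
    }
    return $out;
}
```

Calling convert_once($html) on the sample data should convert each temperature once and leave the result alone afterwards.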

    Another possibility is to do just a single substitution, and keep track of what was matched.

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    my %subs = (
        farenheit => sub {
            warn "Matched F with $^N\n";
            return sprintf("%.u" , ($^N - 32) / 1.8) . 'C'
        },
        celsius => sub {
            warn "Matched C with $^N\n";
            return sprintf("%.u" , ($^N * 1.8) + 32) . 'F'
        },
    );

    my $WHAT;
    my $regex = qr/
          (\d+)F (?{ $WHAT = 'farenheit' })
        | (\d+)C (?{ $WHAT = 'celsius' })
    /xi;

    my $html = do { local $/; <DATA> };
    $html =~ s/$regex/$subs{$WHAT}->()/eg;
    warn $html;

    __DATA__
    convert 180F to C
    convert 180C to F

    Note that this uses a feature that's marked as EXPERIMENTAL in perlre, so be warned.

    Also since there are now multiple captures in the same regex, $^N is more robust than using $1, $2 etc.
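If the experimental (?{ ... }) blocks are a concern, a similar single substitution can probably be written with named captures (Perl 5.10 or later). This is only a sketch under that assumption: the group names f and c, the %convert table, and convert_named are invented here, and '%.0f' replaces the original '%.u'. The name of whichever group captured picks the sub to call.

```perl
use strict;
use warnings;

# Hypothetical dispatch table keyed by the name of the capture group.
my %convert = (
    f => sub { sprintf '%.0fC', ($_[0] - 32) / 1.8 },
    c => sub { sprintf '%.0fF', $_[0] * 1.8 + 32 },
);

# One alternation, one pass; only one named group is defined per match.
my $regex = qr/(?<f>\d+)F|(?<c>\d+)C/i;

sub convert_named {
    my ($text) = @_;
    $text =~ s{$regex}{
        # Find which named group took part in this match.
        my ($unit) = grep { defined $+{$_} } keys %convert;
        $convert{$unit}->($+{$unit});
    }eg;
    return $text;
}
```

Adding a unit then means adding one named alternative to the regex and one entry to the table.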

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: Global replace issue
by FunkyMonk (Bishop) on Aug 29, 2010 at 18:39 UTC
    I'd use something like...
    foreach my $key (keys %subs) {
        warn "checking key: $key\n";
        if ($html =~ s/$key/$subs{$key}->()/eg) {
            warn $html;
            last;
        }
    }

    So that you leave the loop once you've made a substitution.

Re: Global replace issue
by johngg (Canon) on Aug 29, 2010 at 22:23 UTC

    Just a minor observation regarding your use of sprintf and concatenation. You can dispense with the concatenation entirely by just including the concatenated text in the format string. E.g. replace

    return sprintf("%.u" , ($1 - 32) / 1.8) . 'C'

    with

    return sprintf q{%.uC}, ( $1 - 32 ) / 1.8

    Also, I'm not sure what you intend with the "%.u" format specifier, which will truncate your result. Perhaps something like "%.1f" would be more useful?

    $ perl -E 'say sprintf q{%.uC}, ( 75 - 32 ) / 1.8;'
    23C
    $ perl -E 'say sprintf q{%fC}, ( 75 - 32 ) / 1.8;'
    23.888889C
    $ perl -E 'say sprintf q{%.1fC}, ( 75 - 32 ) / 1.8;'
    23.9C
    $

    I hope this is helpful.

    Cheers,

    JohnGG

Re: Global replace issue
by mr_mischief (Monsignor) on Aug 30, 2010 at 11:03 UTC

    You're iterating over all the keys, and you're not happy that the subs associated with all of them get called. The simple solution here is to not iterate over all the keys.

    You are finding it not very useful to convert units and convert them back. I'm wondering why it would be useful to convert in whichever direction in the first place instead of ending up with all metric or all Imperial. Maybe appending the conversion to the existing version would be handy, but going from a randomly mixed bag of units to having exactly the opposite measurements in the wrong units doesn't seem to be.

    Perhaps you should have a metric-to-Imperial hash and an Imperial-to-metric hash, and tell the program from the command line or such which units you want to have in the end. That way you'd only convert in one direction, and only the measurements that started in the wrong units.
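One way the two-table idea could look, as a rough sketch only: %rules_for, convert_to, and the target names 'metric' and 'imperial' are hypothetical, and only the temperature rules from the original question are filled in. Because a call applies just one table, a converted value can never be converted back.

```perl
use strict;
use warnings;

# Hypothetical rule tables, one per target system of units.
my %rules_for = (
    metric   => [ [ qr/(\d+)F/i, sub { sprintf '%.0fC', ($_[0] - 32) / 1.8 } ] ],
    imperial => [ [ qr/(\d+)C/i, sub { sprintf '%.0fF', $_[0] * 1.8 + 32 } ] ],
);

# Convert only the measurements that are not already in the target system.
sub convert_to {
    my ($target, $text) = @_;
    my $rules = $rules_for{$target} or die "unknown target '$target'\n";
    for my $rule (@$rules) {
        my ($re, $code) = @$rule;
        $text =~ s/$re/$code->($1)/eg;
    }
    return $text;
}
```

The target could then come from the command line, for example by passing $ARGV[0] as the first argument.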

      I think mischief is onto something. When you find yourself going slightly loopy in complex code, it's often a sign that the design needs refactoring. A lot of the time it's not super-ingenious code that "wins"; simpler code with a simple interface is often more flexible in the long run.
      Processing the input to produce an internal representation in ISO units, or other consistent units, allows good flexibility.
      If you're interested in following that path, have a look at unit conversion apps, such as those for the iPhone, and infer their logic.
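A small sketch of the internal-representation idea, with Celsius arbitrarily chosen as the common unit; %to_celsius, %from_celsius, and render_in are names made up for this example. Every value is first read into the common unit and then written out in whatever unit is requested, so each unit needs only one reader and one writer.

```perl
use strict;
use warnings;

# Hypothetical tables: read each unit into Celsius, write Celsius back out.
my %to_celsius   = ( C => sub { $_[0] }, F => sub { ($_[0] - 32) / 1.8 } );
my %from_celsius = ( C => sub { $_[0] }, F => sub { $_[0] * 1.8 + 32 } );

# Rewrite every temperature in $text using the requested unit ('C' or 'F').
sub render_in {
    my ($unit, $text) = @_;
    $text =~ s{(\d+)([CF])}{
        my $celsius = $to_celsius{ uc($2) }->($1);
        sprintf '%.0f%s', $from_celsius{$unit}->($celsius), $unit;
    }egi;
    return $text;
}
```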
      the hardest line to type correctly is: stty erase ^H
Re: Global replace issue
by aquarium (Curate) on Aug 29, 2010 at 23:55 UTC
    Instead of using subs, you could do the calculations directly in the replacement part of the substitution (with the /e modifier). To keep things sane, have one regex for C and another for F. You could have a single regex with the calculations for both C and F combined, but that would be quite ugly.
    the hardest line to type correctly is: stty erase ^H