ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a little regex that I'm using to search and replace some content in a string:

my $test = qq|Some test. This is a test-with a dash. We want to match test here|;
$test =~ s{\b(test)\b}{change_for_url($1)}eg;

sub change_for_url {
    print "GOT: '$_[0]'\n";
}
If I run this, I get:
GOT: 'test'
GOT: 'test'
GOT: 'test'
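All three hits are what `\b` is defined to do: it only asserts a change between a word character (`\w`) and a non-word character, and `-` is a non-word character, so the "test" in "test-with" satisfies `\btest\b`. A small check of that on the same sample string (the variable names here are just for illustration):

```perl
use strict;
use warnings;

my $test = 'Some test. This is a test-with a dash. We want to match test here';

# '-' is not a word character, so \b matches between "test" and "-".
my $dash_is_word = ('-' =~ /\w/) ? 1 : 0;
print "dash is \\w: $dash_is_word\n";    # prints "dash is \w: 0"

# Record the start offset of every \btest\b match; @- holds match offsets.
my @starts;
while ($test =~ /\btest\b/g) {
    push @starts, $-[0];
}
print "matches start at: @starts\n";     # prints "matches start at: 5 21 56"
```

The middle offset (21) is the "test" inside "test-with".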


The problem is, I *don't* want to match "test-with" in my example. How do I go about ignoring parts like that? A real-world example from the script I'm working with is:

Régional Loire-Anjou-Touraine

I want to replace "loire" with a new string, but ONLY if it's not part of Loire-Anjou-Touraine.
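A minimal sketch for the real-world case, combining the negative lookahead suggested in the replies with a matching negative lookbehind (the same `(?<! [-\w])` / `(?! [-\w])` pair used in AnomalousMonk's reply). It assumes "loire" should match in any case and that a hyphen on either side means the word is part of a compound name. The replacement text `NEW`, the helper `replace_loire`, and the inputs other than "Régional Loire-Anjou-Touraine" are hypothetical:

```perl
use strict;
use warnings;
use utf8;                               # the source below contains "é"
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical replacement text, for illustration only.
my $new = 'NEW';

# Replace "loire" (any case) only when it is not glued to a hyphen or a
# word character on either side. Both checks are zero-width, so the
# neighbouring characters are never consumed or altered.
sub replace_loire {
    my ($s) = @_;
    $s =~ s/(?<![-\w])loire(?![-\w])/$new/gi;
    return $s;
}

for my $in ('Régional Loire-Anjou-Touraine', 'Pays de la Loire', 'Anjou-Loire') {
    print "$in => ", replace_loire($in), "\n";
}
```

Only "Pays de la Loire" changes; in the other two strings "Loire" touches a hyphen.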

Please let me know if you need more clarification :)

Thanks

Andy

Re: Regex boundary match (updated demo code)
by LanX (Saint) on Feb 08, 2020 at 14:00 UTC
    Did you try a negative lookahead assertion to exclude the dash?

    Update:

    DB<21> $test = qq|Some test. This is a test-with a dash. We want to match test here|

    DB<22> x $test =~ /\b(test)(?!-)\b/g
    0  'test'
    1  'test'

    DB<23> print pos($test),":$1\n" while $test =~ /\b(test)(?!-)\b/g
    9:test
    60:test

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      Thanks. I ended up with ([^\-]). Is your version quicker, or is the difference negligible? (It's going to be run a lot of times.)
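One way to answer the speed question for your own data is to time both forms with the core Benchmark module. This is a sketch under assumptions: the repeated sample sentence stands in for the real input, and a fixed replacement `X` stands in for `change_for_url`. Timings depend on the perl version, the machine and the input, so no figures are quoted here. Note the two forms only agree because the consuming one puts `$2` back:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample input; assumed to be representative of the real data.
my $text = 'Some test. This is a test-with a dash. We want to match test here. ' x 20;

# Variant 1: consume the following character and put it back via $2.
sub consuming {
    (my $s = $text) =~ s{\b(test)\b([^\-]|$)}{X$2}g;
    return $s;
}

# Variant 2: zero-width negative lookahead; nothing extra to restore.
sub lookahead {
    (my $s = $text) =~ s{\b(test)(?!-)\b}{X}g;
    return $s;
}

# Sanity check: both variants must agree before timing them.
die "variants disagree\n" unless consuming() eq lookahead();

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, { consuming => \&consuming, lookahead => \&lookahead });
```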
        I ended up with ([^\-]). ... difference?

        It depends on exactly what you want. ([^\-]) requires a character match and consumes a character. As LanX pointed out, ([^\-]|$) consumes a character if it can. Neither seems to do what you want:

        c:\@Work\Perl\monks>perl -wMstrict -le "my $s = 'test atest atested tested test- -test test .test test. test'; ;; $s =~ s/\b(test)([^-])/DONE/g; print qq{'$s'}; "
        'DONEatest atested DONEd test- -DONEDONE.DONEDONE test'

        c:\@Work\Perl\monks>perl -wMstrict -le "my $s = 'test atest atested tested test- -test test .test test. test'; ;; $s =~ s/\b(test)([^-]|$)/DONE/g; print qq{'$s'}; "
        'DONEatest atested DONEd test- -DONEDONE.DONEDONE DONE'
        Maybe this does:
        c:\@Work\Perl\monks>perl -wMstrict -le "my $s = 'test atest atested tested test- -test test .test test. test'; ;; $s =~ s/\b(test)(?!-)\b/DONE/g; print qq{'$s'}; "
        'DONE atest atested tested test- -DONE DONE .DONE DONE. DONE'

        Update: Just noticed from the timestamps that ultranerds already decided to go with LanX's suggestion. :)


        Give a man a fish:  <%-{-{-{-<

Re: Regex boundary match
by AnomalousMonk (Archbishop) on Feb 08, 2020 at 21:50 UTC

    NB: If the word/substring translations are essentially one-to-one, you can use the technique discussed in haukex's Building Regex Alternations Dynamically article to do a fairly fast search/replace:

    c:\@Work\Perl\monks>perl
    use strict;
    use warnings;

    use Test::More 'no_plan';
    use Test::NoWarnings;

    my %replace = qw(
        milk   white
        toast  brown
        cheese yellow
        peas   green
    );

    my ($rx_search) =
        map qr{ (?i) (?<! [-\w]) (?: $_) (?! [-\w]) }xms,
        join ' | ',
        map quotemeta,
        reverse sort keys %replace
        ;
    print $rx_search, "\n";  # for debug

    VECTOR:
    for my $ar_vector (
        'no changes in these',
        [ 'milky appease peasoup cheese-toast' => 'milky appease peasoup cheese-toast' ],

        'parts of all these should change',
        [ 'mIlK, some PeAs, cheese and toast.' => 'white, some green, yellow and brown.' ],
    ) {
        if (not ref $ar_vector) {
            note $ar_vector;
            next VECTOR;
        }
        my ($string, $expected) = @$ar_vector;
        (my $replaced = $string) =~ s{ ($rx_search) }{$replace{lc $1}}xmsg;
        is $replaced, $expected, "'$string' -> '$expected'";
    }

    done_testing;

    exit;

    __END__
    (?msx-i: (?i) (?<! [-\w]) (?: toast | peas | milk | cheese) (?! [-\w]) )
    # no changes in these
    ok 1 - 'milky appease peasoup cheese-toast' -> 'milky appease peasoup cheese-toast'
    # parts of all these should change
    ok 2 - 'mIlK, some PeAs, cheese and toast.' -> 'white, some green, yellow and brown.'
    1..2
    ok 3 - no warnings
    1..3



Re: Regex boundary match
by ultranerds (Hermit) on Feb 08, 2020 at 13:57 UTC
    I think I may have found a way to do it:

    $test =~ s{\b(test)\b([^\-]|$)}{change_for_url($1,$2)}eg;

    Can anyone see any possible issues with that?
      > Can anyone see any possible issues with that?

      Well, maybe not in this case, but ([^\-]|$) consumes the character after the match.

      This can become an issue once you combine it with further matches that need that character, or if the replacement doesn't put it back.

      That's why I recommended a lookahead assertion, which is zero-width.
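A small sketch of the difference, using a made-up sample string and a fixed replacement `X` in place of `change_for_url`: with the consuming `([^\-]|$)` form, the character after "test" is part of the match, so it disappears unless the replacement puts `$2` back. The zero-width `(?!-)` form leaves it alone:

```perl
use strict;
use warnings;

my $s = 'a test, a test-with, a test';

# Consuming version: ([^\-]|$) eats the character after "test", so the
# replacement has to put $2 back or that character disappears.
(my $lost     = $s) =~ s{\b(test)\b([^\-]|$)}{X}g;
(my $restored = $s) =~ s{\b(test)\b([^\-]|$)}{X$2}g;

# Zero-width version: (?!-) only looks at the next character, so there
# is nothing to restore.
(my $lookahead = $s) =~ s{\b(test)(?!-)\b}{X}g;

print "lost:      $lost\n";        # prints "lost:      a X a test-with, a X"
print "restored:  $restored\n";    # prints "restored:  a X, a test-with, a X"
print "lookahead: $lookahead\n";   # prints "lookahead: a X, a test-with, a X"
```

In the first line of output the comma after the first "test" is gone; the other two agree.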

      Cheers Rolf

        Cool, thanks. I'll go with your suggestion then :) Much appreciated.