in reply to Getting around "/" as a word boundary

The problem has nothing to do with word boundaries. The problem is that you're replacing your replacements. Fix:
my $pat = join '|', keys(%hashstore);
$doc =~ s!\b($pat)\b!$1/$hashstore{uc($1)}!ig;

You're wrong about the negative lookbehind not working.

my %hashstore = (
    "DEXX" => "AREX",
    "AREX" => "CUBE"
);

my $doc1 = "DEXX";
my $doc2 = "DEXX";
my $doc3 = "DEXX";

#foreach (keys %hashstore){
foreach ("DEXX", "AREX"){  # Make sure we get them in the worse order.
    $doc1=~s#\b($_)\b#$1/$hashstore{uc($_)}#ig;
}

#foreach (keys %hashstore){
foreach ("DEXX", "AREX"){  # Make sure we get them in the worse order.
    $doc2=~s#(?<!/)\b($_)\b#$1/$hashstore{uc($_)}#ig;
}

my $pat = join '|', keys(%hashstore);
$doc3 =~ s!\b($pat)\b!$1/$hashstore{uc($1)}!ig;

print("$doc1\n");  # DEXX/AREX/CUBE
print("$doc2\n");  # DEXX/AREX
print("$doc3\n");  # DEXX/AREX

Note the addition of uc(). Without it, you'd match stuff you wouldn't find in the hash because of /i.
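
To make the uc() point concrete, here is a minimal sketch. The annotate() helper and the mixed-case inputs are made up for illustration, and the keys are assumed to be stored in upper case as in the hash above.

```perl
use strict;
use warnings;

# Same data as in the post: keys are stored in upper case.
my %hashstore = ( DEXX => 'AREX', AREX => 'CUBE' );

# Hypothetical helper (not from the thread): append "/replacement" to every
# key found in $text, matching case-insensitively in a single pass.
sub annotate {
    my ($text) = @_;
    my $pat = join '|', map quotemeta, keys %hashstore;
    # /i lets "dexx" match, so fold the captured text back to upper case
    # before using it as a hash key.
    $text =~ s!\b($pat)\b!$1/$hashstore{uc($1)}!ig;
    return $text;
}

print annotate('dexx'), "\n";   # dexx/AREX
print annotate('Arex'), "\n";   # Arex/CUBE

# The raw captured text is not a key, which is why uc() is needed.
print exists $hashstore{'dexx'} ? "found\n" : "missing\n";   # missing
```

Without uc(), the lookup for 'dexx' would return undef and the replacement would come out as 'dexx/' (with a warning under warnings).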

Re^2: Getting around "/" as a word boundary
by AnomalousMonk (Archbishop) on Aug 12, 2010 at 05:01 UTC

    The use of | (ordered alternation) in the regex introduces a subtlety: Perl's implementation of this regex operator takes the first possible match in the alternation, regardless of match length. Since the order of strings returned from keys is essentially random, this may not be what you want. The use of \b word boundaries and look-behind avoids the problem in the particular example given in Re: Getting around "/" as a word boundary, but these may not always be available.

    Usually, the longest match is needed. Sorting the keys of the replacement hash (in default order) and then reversing the sorted list puts every key ahead of any key that is a prefix of it, so the longest alternative is tried first: 'ABC', 'ABCD', 'ABCDE' (in any order) becomes 'ABCDE', 'ABCD', 'ABC'. E.g. (upper/lower case issues ignored):

    >perl -wMstrict -le
    "my %replace = (
       DEXX => 'AREX', AREX => 'CUBE',
       ABC => 'VWX', ABCD => 'VWXY', ABCDE => 'VWXYZ',
       );
     my $find = join '|', map quotemeta, keys %replace;
     $find = qr{ $find }xms;
     print qq{find regex: $find};
     my $s = 'DEXX AREX CUBE ABC ABCD ABCDE';
     print qq{before: '$s'};
     (my $t = $s) =~ s{ ($find) }{$1/$replace{$1}}xmsg;
     print qq{after: '$t'};
     print '';
     my $longest = join '|', map quotemeta, reverse sort keys %replace;
     $longest = qr{ $longest }xms;
     print qq{find regex (longest match): $longest};
     print qq{before: '$s'};
     ($t = $s) =~ s{ ($longest) }{$1/$replace{$1}}xmsg;
     print qq{after: '$t'};
    "
    find regex: (?msx-i: DEXX|ABC|ABCD|ABCDE|AREX )
    before: 'DEXX AREX CUBE ABC ABCD ABCDE'
    after: 'DEXX/AREX AREX/CUBE CUBE ABC/VWX ABC/VWXD ABC/VWXDE'

    find regex (longest match): (?msx-i: DEXX|AREX|ABCDE|ABCD|ABC )
    before: 'DEXX AREX CUBE ABC ABCD ABCDE'
    after: 'DEXX/AREX AREX/CUBE CUBE ABC/VWX ABCD/VWXY ABCDE/VWXYZ'
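
An alternative way to get the same longest-first ordering is to sort on length explicitly. This is only a sketch, not what the post above does: the longest_first_regex() helper is a made-up name, and it swaps reverse sort for a descending length sort, which also guarantees that no key comes after one of its own prefixes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build an alternation in which
# longer keys come first, so a key is always tried before any of its prefixes.
sub longest_first_regex {
    my @keys = @_;
    my $alt = join '|',
              map  { quotemeta }
              sort { length($b) <=> length($a) or $a cmp $b } @keys;
    return qr{$alt};
}

my %replace = ( ABC => 'VWX', ABCD => 'VWXY', ABCDE => 'VWXYZ' );
my $find    = longest_first_regex(keys %replace);

(my $t = 'ABC ABCD ABCDE') =~ s{($find)}{$1/$replace{$1}}g;
print "$t\n";   # ABC/VWX ABCD/VWXY ABCDE/VWXYZ
```

Either ordering works for this purpose; the length sort just states the intent more directly.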

    Updates:

    1. The example given above implies misleadingly that use of a properly ordered alternation alone is sufficient, that \b word boundaries are not needed in the case given in the OP. Not (necessarily) so:
      >perl -wMstrict -le
      "my %replace = (
         ABC => 'XXX', ABCD => 'YYYY', ABCDE => 'ZZZZZ',
         );
       my $find = join '|', map quotemeta, reverse sort keys %replace;
       $find = qr{ $find }xms;
       print qq{find regex: $find};
       my $s = 'ABC ABCD xxABCDxx ABCDE';
       print qq{before: '$s'};
       (my $t = $s) =~ s{ ($find) }{$replace{$1}}xmsg;
       print qq{sans \\b: '$t'};
       print '';
       print qq{before: '$s'};
       ($t = $s) =~ s{ \b ($find) \b }{$replace{$1}}xmsg;
       print qq{with \\b: '$t'};
      "
      find regex: (?msx-i: ABCDE|ABCD|ABC )
      before: 'ABC ABCD xxABCDxx ABCDE'
      sans \b: 'XXX YYYY xxYYYYxx ZZZZZ'

      before: 'ABC ABCD xxABCDxx ABCDE'
      with \b: 'XXX YYYY xxABCDxx ZZZZZ'
    2. See discussion of alternation in perlre and perlretut.

Re^2: Getting around "/" as a word boundary
by renshui (Novice) on Aug 12, 2010 at 08:34 UTC
    Hi ikegami, I am a Perl beginner and I don't quite understand the following regex. Can you explain the (?<!/) part in detail? Thanks in advance.

    $doc2=~s#(?<!/)\b($_)\b#$1/$hashstore{uc($_)}#ig;

      This is a "zero-width, negative look-behind assertion". The assertion (?<!/) is true if the point at which it occurs in the match is not immediately preceded by a '/' (forward slash) character. It consumes no characters itself.

      See (?<!pattern) in the Extended Patterns section (in the Look-Around Assertions subsection) of perlre.

      Update: Added "zero-width" to description.
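
A small sketch of the assertion on its own may help. The not_after_slash() helper and the three test strings are invented for illustration; only the (?<!/)\bAREX\b pattern comes from the regex being asked about.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true if $s contains the word
# AREX at a position that is not immediately preceded by '/'.
sub not_after_slash {
    my ($s) = @_;
    return $s =~ m{(?<!/)\bAREX\b} ? 1 : 0;
}

for my $s ('AREX', 'DEXX AREX', 'DEXX/AREX') {
    printf "%-10s %s\n", $s, not_after_slash($s) ? 'matches' : 'no match';
}
```

The first two strings match because AREX is preceded by nothing or by a space. The third does not, because the only AREX in it sits directly after a '/'.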

      "(?<!/)" means "not immediately preceded by '/'"