The use of | (ordered alternation) in the regex introduces a subtlety: Perl tries the alternatives left to right and takes the first one that allows a match, regardless of match length. Since the order of strings returned from keys is essentially random, this may not be what you want. The use of \b word boundaries and look-behind avoids the problem in the particular example given in Re: Getting around "/" as a word boundary, but that approach may not always be applicable.
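A minimal sketch of that behaviour, written as a script, not a one-liner. The helper name first_alternative_match is made up for illustration and is not from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by an alternation built
# from @alternatives, in the order given (no sorting).
sub first_alternative_match {
    my ($string, @alternatives) = @_;
    my $alternation = join '|', map quotemeta, @alternatives;
    return $string =~ m{ ($alternation) }xms ? $1 : undef;
}

# Shorter alternative listed first: it wins even though a longer one fits.
print first_alternative_match('ABCDE', qw(ABC ABCD ABCDE)), "\n";   # ABC

# Longer alternative listed first: the longest match is found.
print first_alternative_match('ABCDE', qw(ABCDE ABCD ABC)), "\n";   # ABCDE
```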
Usually, the longest match is needed. Sorting the keys of the replacement hash (in default order) and then reversing the sorted list puts every key ahead of any shorter key that is a prefix of it, so the longest match is tried first: 'ABC', 'ABCD', 'ABCDE' (in any order) becomes 'ABCDE', 'ABCD', 'ABC'. E.g. (upper/lower case issues ignored):
>perl -wMstrict -le
"my %replace = (
   DEXX => 'AREX', AREX => 'CUBE',
   ABC => 'VWX', ABCD => 'VWXY', ABCDE => 'VWXYZ',
   );
 my $find = join '|', map quotemeta, keys %replace;
 $find = qr{ $find }xms;
 print qq{find regex: $find};
 my $s = 'DEXX AREX CUBE ABC ABCD ABCDE';
 print qq{before: '$s'};
 (my $t = $s) =~ s{ ($find) }{$1/$replace{$1}}xmsg;
 print qq{after: '$t'};
 print '';
 my $longest = join '|', map quotemeta, reverse sort keys %replace;
 $longest = qr{ $longest }xms;
 print qq{find regex (longest match): $longest};
 print qq{before: '$s'};
 ($t = $s) =~ s{ ($longest) }{$1/$replace{$1}}xmsg;
 print qq{after: '$t'};
"
find regex: (?msx-i: DEXX|ABC|ABCD|ABCDE|AREX )
before: 'DEXX AREX CUBE ABC ABCD ABCDE'
after: 'DEXX/AREX AREX/CUBE CUBE ABC/VWX ABC/VWXD ABC/VWXDE'

find regex (longest match): (?msx-i: DEXX|AREX|ABCDE|ABCD|ABC )
before: 'DEXX AREX CUBE ABC ABCD ABCDE'
after: 'DEXX/AREX AREX/CUBE CUBE ABC/VWX ABCD/VWXY ABCDE/VWXYZ'
Updates:
>perl -wMstrict -le
"my %replace = (
   ABC => 'XXX', ABCD => 'YYYY', ABCDE => 'ZZZZZ',
   );
 my $find = join '|', map quotemeta, reverse sort keys %replace;
 $find = qr{ $find }xms;
 print qq{find regex: $find};
 my $s = 'ABC ABCD xxABCDxx ABCDE';
 print qq{before: '$s'};
 (my $t = $s) =~ s{ ($find) }{$replace{$1}}xmsg;
 print qq{sans \\b: '$t'};
 print '';
 print qq{before: '$s'};
 ($t = $s) =~ s{ \b ($find) \b }{$replace{$1}}xmsg;
 print qq{with \\b: '$t'};
"
find regex: (?msx-i: ABCDE|ABCD|ABC )
before: 'ABC ABCD xxABCDxx ABCDE'
sans \b: 'XXX YYYY xxYYYYxx ZZZZZ'

before: 'ABC ABCD xxABCDxx ABCDE'
with \b: 'XXX YYYY xxABCDxx ZZZZZ'
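For use in a script instead of a one-liner, the same two steps (longest-first alternation, optional \b anchoring) might be wrapped up roughly as follows. This is only a sketch: the sub names longest_first_regex and replace_all and the $whole_words flag are invented here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: compile an alternation of the hash keys, reverse
# sorted so that a key always precedes any shorter key that is its prefix.
sub longest_first_regex {
    my ($replace) = @_;
    my $alternation = join '|', map quotemeta, reverse sort keys %$replace;
    return qr{ $alternation }xms;
}

# Hypothetical helper: replace every key found in $text by its value.
# With a true $whole_words, only matches delimited by \b are replaced.
sub replace_all {
    my ($text, $replace, $whole_words) = @_;
    my $find = longest_first_regex($replace);
    if ($whole_words) {
        $text =~ s{ \b ($find) \b }{$replace->{$1}}xmsg;
    }
    else {
        $text =~ s{ ($find) }{$replace->{$1}}xmsg;
    }
    return $text;
}

my %replace = (ABC => 'XXX', ABCD => 'YYYY', ABCDE => 'ZZZZZ');
my $s = 'ABC ABCD xxABCDxx ABCDE';

print replace_all($s, \%replace, 0), "\n";   # XXX YYYY xxYYYYxx ZZZZZ
print replace_all($s, \%replace, 1), "\n";   # XXX YYYY xxABCDxx ZZZZZ
```

Note that \b only behaves as a delimiter when the keys begin and end with word characters; keys with a leading or trailing non-word character such as "/" would need a different boundary test.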
In reply to Re^2: Getting around "/" as a word boundary by AnomalousMonk, in thread Getting around "/" as a word boundary by sherab