The use of | (ordered alternation) in the regex introduces a subtlety: Perl's implementation of this regex operator takes the first alternative that matches at a given position, regardless of match length. Since the order of strings returned from keys is essentially random, a short key may be tried (and matched) before a longer key that begins with it, which may not be what you want. The use of \b word boundaries and look-behind avoids the problem in the particular example given in Re: Getting around "/" as a word boundary, but such anchors may not always be usable.

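Usually, the longest match is needed. Sorting the keys of the replacement hash (in default order) and then reversing them puts the longer key first wherever it matters: a string that is a prefix of another sorts before it, so after the reversal the longer string precedes its prefix in the alternation. Thus 'ABC', 'ABCD', 'ABCDE' (in any order) becomes 'ABCDE', 'ABCD', 'ABC'. E.g. (upper/lower case issues ignored):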

>perl -wMstrict -le
"my %replace = (
   DEXX  => 'AREX',
   AREX  => 'CUBE',
   ABC   => 'VWX',
   ABCD  => 'VWXY',
   ABCDE => 'VWXYZ',
   );
 my $find = join '|', map quotemeta, keys %replace;
    $find = qr{ $find }xms;
 print qq{find regex: $find};
 my $s = 'DEXX AREX CUBE ABC ABCD ABCDE';
 print qq{before: '$s'};
 (my $t = $s) =~ s{ ($find) }{$1/$replace{$1}}xmsg;
 print qq{after:  '$t'};
 print '';
 my $longest = join '|', map quotemeta, reverse sort keys %replace;
    $longest = qr{ $longest }xms;
 print qq{find regex (longest match): $longest};
 print qq{before: '$s'};
 ($t = $s) =~ s{ ($longest) }{$1/$replace{$1}}xmsg;
 print qq{after:  '$t'};
"
find regex: (?msx-i: DEXX|ABC|ABCD|ABCDE|AREX )
before: 'DEXX AREX CUBE ABC ABCD ABCDE'
after:  'DEXX/AREX AREX/CUBE CUBE ABC/VWX ABC/VWXD ABC/VWXDE'

find regex (longest match): (?msx-i: DEXX|AREX|ABCDE|ABCD|ABC )
before: 'DEXX AREX CUBE ABC ABCD ABCDE'
after:  'DEXX/AREX AREX/CUBE CUBE ABC/VWX ABCD/VWXY ABCDE/VWXYZ'
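
A variation on the same idea, offered only as a sketch and not taken from the thread: sorting by descending length states the "longest first" intent directly. For literal keys it should behave the same as reverse sort, since only a key and its own prefixes can compete at one position. The helper name longest_first_regex is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): build an alternation
# in which longer keys always come before shorter ones.
sub longest_first_regex {
    my @keys = @_;
    my $alt = join '|',
        map  { quotemeta($_) }
        # longest first; equal lengths fall back to plain string order
        sort { length($b) <=> length($a) or $a cmp $b } @keys;
    return qr{ (?: $alt ) }xms;
}

my %replace = (
    ABC   => 'VWX',
    ABCD  => 'VWXY',
    ABCDE => 'VWXYZ',
);

my $find = longest_first_regex(keys %replace);
(my $t = 'ABC ABCD ABCDE') =~ s{ ($find) }{$replace{$1}}xmsg;
print "after: '$t'\n";    # after: 'VWX VWXY VWXYZ'
```

The secondary sort on string order only makes the generated regex reproducible from run to run; it does not change what matches.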

Updates:

The example given above misleadingly implies that a properly ordered alternation is sufficient by itself, and that \b word boundaries are not needed in the case given in the OP. Not (necessarily) so:

>perl -wMstrict -le
"my %replace = (
   ABC   => 'XXX',
   ABCD  => 'YYYY',
   ABCDE => 'ZZZZZ',
   );
 my $find = join '|', map quotemeta, reverse sort keys %replace;
    $find = qr{ $find }xms;
 print qq{find regex: $find};
 my $s = 'ABC ABCD xxABCDxx ABCDE';
 print qq{before: '$s'};
 (my $t = $s) =~ s{ ($find) }{$replace{$1}}xmsg;
 print qq{sans \\b: '$t'};
 print '';
 print qq{before: '$s'};
 ($t = $s) =~ s{ \b ($find) \b }{$replace{$1}}xmsg;
 print qq{with \\b: '$t'};
"
find regex: (?msx-i: ABCDE|ABCD|ABC )
before: 'ABC ABCD xxABCDxx ABCDE'
sans \b: 'XXX YYYY xxYYYYxx ZZZZZ'

before: 'ABC ABCD xxABCDxx ABCDE'
with \b: 'XXX YYYY xxABCDxx ZZZZZ'
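
Note that \b only helps when every key begins and ends with a word character. If a key starts or ends with something like "/", a \b next to that character demands a word character on the other side, which is the opposite of what is wanted. A rough sketch of one way around this, assuming that "delimited" means "surrounded by whitespace or the ends of the string" (that definition, the sample keys, and the sub name replace_delimited are my assumptions, not something from the OP):

```perl
use strict;
use warnings;

# Hypothetical data: keys that begin or end with non-word characters.
my %replace = (
    'A/B'  => 'one',
    'A/B/' => 'two',
    '/A'   => 'three',
);

# Hypothetical helper: replace a key only when it is not touching any
# other non-whitespace character on either side.
sub replace_delimited {
    my ($text, $map) = @_;
    # same longest-first ordering as above
    my $alt = join '|', map { quotemeta($_) } reverse sort keys %$map;
    # negative look-behind/look-ahead stand in for \b
    $text =~ s{ (?<! \S ) ($alt) (?! \S ) }{$map->{$1}}xmsg;
    return $text;
}

print replace_delimited('A/B A/B/ xA/B /A /Ax', \%replace), "\n";
# prints: one two xA/B three /Ax
```

The look-arounds consume nothing, so adjacent keys separated by a single space are still both replaced.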

See discussion of alternation in perlre and perlretut.

In reply to Re^2: Getting around "/" as a word boundary by AnomalousMonk
in thread Getting around "/" as a word boundary by sherab