in reply to Regexp dashes at boundaries

Your regular expression doesn't do what you think it does (I think!). The alternation meta-character | works on the two entities directly next to it. You need to use parentheses with your alternation, maybe like this:
$text =~ s/(\bArmy\b)|(U\.S\. Army)/US-Army/g;
Update: Arghh! Completely disregard. Alternation extends further than I said unless you limit it with parentheses. I seem to be living in opposite land today.

... but it is cleaner to "factor out" the "Army" string and do something similar to:
$text =~ s/(U\.S\.)?\bArmy/US-Army/g;
Note: completely untested!

Hope this helps.

Re^2: Regexp dashes at boundaries
by tlm (Prior) on Mar 17, 2005 at 23:22 UTC

    The alternation meta-character | works on the two entities directly next to it.

    Not in Perl:

    $t = 'abcd';
    $t =~ s/bc|xy/pq/;
    print "$t\n";    # ==> apqd
    If what you said were true, the match above would have failed, and the contents of $t would have remained unchanged.

    the lowliest monk

      and for possible further illumination:
      $t  = 'abcd';
      $t1 = 'ababxycdcd';
      $t2 = 'ababxycdcd';
      $t3 = 'ababxycdcd';    # same as $t2, which is identical to $t1
      $t4 = 'ababxycdcd';    # still the same...

      $t =~ s/bc|xy/pq/;
      print "\$t is: $t\n";
      $t1 =~ s/(ab|xy)/pq/;
      print "\$t1 is: $t1\n";
      $t2 =~ s/(ab|xy)/pq/g;
      print "\$t2 is: $t2\n";
      $t3 =~ s/(ab)|(xy)/pq/g;
      print "\$t3 is: $t3\n";
      $t4 =~ s/((ab)|(xy))/pq$1/g;    # capturing regex: outer () capture; inner () group
      print "\$t4 is: $t4\n";

      =head1 OUT

      Output of C:\_perl\pl_test>perl 440581.pl

      $t is: apqd                 matched the 'bc'
      $t1 is: pqabxycdcd          parenthesized the alternation; now matches the FIRST 'ab'
      $t2 is: pqpqpqcdcd          added /g (match globally); replaces both 'ab's and the 'xy'
      $t3 is: pqpqpqcdcd          shifting parens ==> no change, here
      $t4 is: pqabpqabpqxycdcd    capturing regex: three 'pq's, each followed by 'ab' or 'xy'

      =cut
      additional example added
Re^2: Regexp dashes at boundaries
by cormanaz (Deacon) on Mar 17, 2005 at 22:33 UTC
    I suppose you are right about the alternation error; however, the suggested regexp yields:
    U.S. US-Army the US-Army has been berry-berry good to me US-US-Army US US-Army
    Which now does the first case wrong as well.
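
For what it's worth, two things seem to be going on in that output. The factored pattern has nothing between `U\.S\.` and `Army` to consume the space, so the optional prefix never takes part in a match. And `\b` also matches between `-` and `A`, so an existing "US-Army" gets a second "US-" prepended. The following is only a sketch, not a tested fix for the original problem: the input string is a guess reconstructed from the output quoted above, and the prefix forms accepted ("U.S.", "US", with a space or a dash before "Army") are likewise assumptions.

```perl
use strict;
use warnings;

# Hypothetical input, reconstructed from the output quoted above.
my $text = 'U.S. Army the Army has been berry-berry good to me US-Army US Army';

# (?<![\w-])          : do not start in the middle of a word or right after a dash
# (?:U\.?S\.?[\s-]*)? : optional "U.S." or "US" prefix, plus any spaces/dashes after it
# Army\b              : the word itself, ending at a word boundary
$text =~ s/(?<![\w-])(?:U\.?S\.?[\s-]*)?Army\b/US-Army/g;

print "$text\n";
# prints: US-Army the US-Army has been berry-berry good to me US-Army US-Army
```

The lookbehind is what stops "US-Army" from turning into "US-US-Army": the bare `Army` alternative can no longer match directly after the dash, so the whole "US-Army" is consumed by the prefixed form and replaced with itself.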