in reply to Re^2: How to count substitutions on an array
in thread How to count substitutions on an array

You say you are quite satisfied with your current solution, but perhaps this may be of interest for future reference.

... no regex expressions, only pure textual substitutions ... substitutions of varying lengths requiring the longer ones to come first. ... no change, but the substitution itself will get padded so that it will not match subsequently ...

This raises a point I had overlooked before. It's possible to add longest-first discrimination when building an alternation. I've also made an attempt to add some acceptance of variable whitespace to the solution. There's also a feature to skip over certain phrases. This avoids the substitution of a substring with itself just to step over it in a possibly expensive no-op. (The weird capitalization is just to emphasize the substituted bits.)

c:\@Work\Perl>perl -le "use 5.010; ;; use warnings; use strict; ;; my @skip_over = ( 'every one of them', 'all in good time', ); ;; my ($skip) = map qr{ \b (?: $_ ) \b (*SKIP) (*FAIL) }xms, join q{ | }, map qr{ \Q$_\E }xms, sort { length($b) <=> length($a) } @skip_over ; print qq{\$skip: $skip \n}; ;; my %direct_substitution = ( 'every one' => 'EVERYONE', 'sore athirst' => 'Very Thirsty', 'athirst' => 'THIRSTY', 'all in' => 'GONZO', ); ;; my ($capture) = map qr{ \b (?: $_ ) \b }xms, join q{ | }, map qr{ \Q$_\E }xms, sort { length($b) <=> length($a) } keys %direct_substitution ; print qq{\$capture: $capture \n}; ;; my $line = qq{every one wang chung every one \n} . qq{of them are sore athirst if not well athirst. \n} . qq{all in good time we will enjoy all in wrestling. \n} ; print qq{[[$line]] \n}; ;; $line =~ s{ \s+ }' 'xmsg; print qq{(($line)) \n}; ;; my $count = 0; $count += $line =~ s{ ($skip | $capture) } {$direct_substitution{$1}}xmsg; ;; print qq{substitutions: $count}; print qq{<<$line>> \n}; " $skip: (?^msx: \b (?: (?^msx: every\ one\ of\ them ) | (?^msx: all\ in +\ good\ time ) ) \b (*SKIP) (* FAIL) ) $capture: (?^msx: \b (?: (?^msx: sore\ athirst ) | (?^msx: every\ one +) | (?^msx: athirst ) | (?^msx : all\ in ) ) \b ) [[every one wang chung every one of them are sore athirst if not well athirst. all in good time we will enjoy all in wrestling. ]] ((every one wang chung every one of them are sore athirst if not well +athirst. all in good time we will enjoy all in wrestling. )) substitutions: 4 <<EVERYONE wang chung every one of them are Very Thirsty if not well T +HIRSTY. all in good time we will enjoy GONZO wrestling. >>
(Some long output lines have been arbitrarily wrapped when composing this post.) Of course, I intend each  $line to be an element in an array over which you're looping.

Update: Of course, it's possible to get rid of the
    $line =~ s{ \s+ }' 'xmsg;
whitespace collapsing step and make the code even more whitespace agnostic.


Give a man a fish:  <%-{-{-{-<