It happens because the map file is read into a hash, and a Perl hash has no defined order: keys(%map) hands the entries back in whatever order Perl happens to store them. Thus, you can't guarantee that the search and replace will happen in the order you give in your map file.
You're actually hitting this part of the code:
# user didn't specify columns, so just SAR each line and leave alone
} else {
    # loop through mapping array for each line
    foreach my $replace (keys(%map)) {
        # ignore case?
        if (defined($opt_ignore)) {
            $YESMapping += ($_ =~ s/$replace/$map{$replace}/gi)
        } else {
            $YESMapping += ($_ =~ s/$replace/$map{$replace}/g)
        }
    }
    print $OUT $_
}
Stick in a helpful print to "debug" what's going on:
# user didn't specify columns, so just SAR each line and leave alone
} else {
    # loop through mapping array for each line
    foreach my $replace (keys(%map)) {
        # ignore case?
        if (defined($opt_ignore)) {
            print "SAR on $_ with $replace\n";
            $YESMapping += ($_ =~ s/$replace/$map{$replace}/gi)
        } else {
            $YESMapping += ($_ =~ s/$replace/$map{$replace}/g)
        }
    }
    print $OUT $_
}
This is what we see using your input and mapping files:
{C} > msar input.txt map.txt -r -i
Reading mappings from file: map.txt
----------------------------
SAR on april
with il
SAR on apr3143
with barrel
SAR on apr3143
with april
SAR on apr3143
with pr
a94323143
SAR on barrel
with il
SAR on barrel
with barrel
SAR on 1168
with april
SAR on 1168
with pr
1168
----------------------------
Mapped 3 entries.
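To see the order dependence in isolation, here is a minimal sketch outside of msar. The four mapping values are inferred from the debug output above; apply_in_order is a hypothetical helper written only for this illustration, and the "wanted" order is an assumption about what your map file intends.

```perl
use strict;
use warnings;

# Mapping values inferred from the debug output above.
my %map = (il => '3143', barrel => '1168', april => '225', pr => '9432');

# Hypothetical helper (not part of msar): apply the replacements
# to $text in exactly the order the keys are passed in.
sub apply_in_order {
    my ($text, @order) = @_;
    for my $search (@order) {
        $text =~ s/$search/$map{$search}/gi;
    }
    return $text;
}

# The order keys(%map) happened to return in the run above:
print apply_in_order('april', qw(il barrel april pr)), "\n";   # a94323143

# An order that tries the whole words first (assumed to be what was wanted):
print apply_in_order('april', qw(april barrel pr il)), "\n";   # 225
```

Same hash, same input, different result: once "il" has fired, "april" is gone and "pr" gets a chance to match the leftovers.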
You could maybe fix it with a tied hash such as Tie::IxHash (I think), which is supposed to keep your hash in insertion order. You would need to manipulate the hash variable %map when it is loaded at the beginning of the program. Unfortunately, I don't have the time now to code this up, but hey, my Perl code is "open source" :-) so have at it!
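If you'd rather not pull in a tie module, another way to get the same effect is to remember the order of the keys in a plain array while loading the map, then loop over that array instead of keys(%map). This is only a sketch: load_map and sar_line are hypothetical names, and the tab-separated "search<TAB>replace" line format is an assumption, not necessarily what msar's map file looks like.

```perl
use strict;
use warnings;

# Hypothetical loader: assumes one "search<TAB>replace" pair per line.
# Returns the lookup hash plus an array recording first-seen key order.
sub load_map {
    my @lines = @_;
    my (%map, @order);
    for my $line (@lines) {
        chomp $line;
        my ($search, $replace) = split /\t/, $line, 2;
        push @order, $search unless exists $map{$search};
        $map{$search} = $replace;
    }
    return (\%map, \@order);
}

# Hypothetical replacement loop: walks @$order rather than keys(%$map),
# so the replacements run in map-file order.
sub sar_line {
    my ($text, $map, $order) = @_;
    for my $search (@$order) {
        $text =~ s/$search/$map->{$search}/gi;
    }
    return $text;
}

my ($map, $order) = load_map("april\t225", "barrel\t1168", "pr\t9432", "il\t3143");
print sar_line($_, $map, $order), "\n" for qw(april barrel);
# prints 225, then 1168
```

The only change to the existing loop would be swapping keys(%map) for the order array; the hash itself stays as it is.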
UPDATE: If your infile is just the one column of words, call with:
{C} > msar input.txt map.txt -r -i -c 1
{C} > msar.pl in.txt map.txt -i -r -c 1
Reading mappings from file: map.txt
----------------------------
225
1168
----------------------------
Mapped 2 entries.
UPDATE: The MSAR.pl code is now updated with a -w option, which replaces on WHOLE WORDS only. Also, the map.txt file is now read, parsed, AND used for search and replace in the order it is written (line 1, line 2 ... line n).
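For anyone curious what whole-word matching buys you here: a common way to do it in Perl is to wrap the pattern in \b word-boundary anchors. The sketch below shows that technique only; replace_whole_word is a hypothetical helper and I'm not claiming this is line-for-line how -w is written in MSAR.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $search only where it stands alone as a word.
# \b matches the boundary between a word character and a non-word character.
sub replace_whole_word {
    my ($text, $search, $replace) = @_;
    $text =~ s/\b$search\b/$replace/gi;
    return $text;
}

print replace_whole_word('april', 'il', '3143'), "\n";      # april (no match inside a word)
print replace_whole_word('il april', 'il', '3143'), "\n";   # 3143 april
```

With boundaries in place, "il" and "pr" can no longer chew up pieces of "april", whatever order they run in.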