in reply to search and replace

Don't know if my approach is any better, but I had the same issue many years ago. I had a tab-delimited file in which multiple terms had to be searched and replaced across multiple columns. Instead of running a search and replace in Excel over and over for each item, I wrote a Perl script to read a mapping file (search term \t replace term) and do a "single pass" global search and replace.

I've updated it over the years so it is no longer limited to columns. You may find it useful. It's in the Code area:

MSAR.pl

UPDATE:

{C} > more in.txt
the dog chased my cat.

{C} > more mapping.txt
dog 1
cat 2
chased 3
the 4

{C} > msar in.txt mapping.txt
Reading mappings from file: mapping.txt
----------------------------
4 1 3 my 2.
----------------------------
Mapped 4 entries.

{C} >
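For readers who want to see the idea without opening MSAR.pl, here is a minimal sketch of a mapping-driven search and replace. It is not the MSAR.pl source: the helper names `parse_mappings` and `apply_mappings` are made up for illustration, and the only things taken from the post are the tab-separated "search term, replace term" layout and the sample data from the run above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "search<TAB>replace" lines into a hash.
sub parse_mappings {
    my (@lines) = @_;
    my %map;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($search, $replace) = split /\t/, $copy, 2;
        # skip blank or malformed lines
        next unless defined $search && length $search && defined $replace;
        $map{$search} = $replace;
    }
    return %map;
}

# Hypothetical helper: apply every mapping to one line of text.
# Returns the rewritten line and the number of substitutions made.
sub apply_mappings {
    my ($line, %map) = @_;
    my $count = 0;
    for my $search (keys %map) {
        # \Q...\E treats the search term as literal text, not as a regex
        $count += ($line =~ s/\Q$search\E/$map{$search}/g);
    }
    return ($line, $count);
}

my %map = parse_mappings("dog\t1\n", "cat\t2\n", "chased\t3\n", "the\t4\n");
my ($out, $n) = apply_mappings("the dog chased my cat.", %map);
print "$out\n";                 # prints "4 1 3 my 2."
print "Mapped $n entries.\n";   # prints "Mapped 4 entries."
```

The result here does not depend on the order in which the hash keys come back, because none of the search terms overlap and none of the replacements contain a search term. That is not true in general.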

Re^2: search and replace
by Anonymous Monk on Apr 03, 2009 at 08:43 UTC
    I found a problem in your code. With this text:
    april barrel
    and a dictionary like this:
    225 April
    1168 barrel
    3143 Il
    9432 PR
    ....
    I get this output from my input:
    a94323143 11777rr340
    I don't know why april would be broken into a + PR + Il ...

      It happens because the map file is read into a hash, and a Perl hash has no guaranteed "order". Thus, you can't guarantee that the search and replace will happen in the order you give in your map file.

      You're actually hitting this part of the code:

          # user didn't specify columns, so just SAR each line and leave alone
          } else {
              # loop through mapping array for each line
              foreach my $replace (keys(%map)) {
                  # ignore case?
                  if (defined($opt_ignore)) {
                      $YESMapping += ($_ =~ s/$replace/$map{$replace}/gi)
                  } else {
                      $YESMapping += ($_ =~ s/$replace/$map{$replace}/g)
                  }
              }
              print $OUT $_
          }

      Stick in a helpful print to "debug" what's going on:

          # user didn't specify columns, so just SAR each line and leave alone
          } else {
              # loop through mapping array for each line
              foreach my $replace (keys(%map)) {
                  # ignore case?
                  if (defined($opt_ignore)) {
                      print "SAR on $_ with $replace\n";
                      $YESMapping += ($_ =~ s/$replace/$map{$replace}/gi)
                  } else {
                      $YESMapping += ($_ =~ s/$replace/$map{$replace}/g)
                  }
              }
              print $OUT $_
          }

      This is what we see using your input and mapping files:

      {C} > msar input.txt map.txt -r -i
      Reading mappings from file: map.txt
      ----------------------------
      SAR on april with il
      SAR on apr3143 with barrel
      SAR on apr3143 with april
      SAR on apr3143 with pr
      a94323143
      SAR on barrel with il
      SAR on barrel with barrel
      SAR on 1168 with april
      SAR on 1168 with pr
      1168
      ----------------------------
      Mapped 3 entries.

      You could maybe fix it with an ordered tied hash (Tie::IxHash from CPAN, I think), which keeps keys in insertion order. You would need to manipulate the hash variable %map when it is loaded at the beginning of the program. Unfortunately, I don't have the time now to code this up, but hey, my Perl code is "open source" :-) so have at it!
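The effect of ordering can be reproduced in a few lines. The sketch below does not use a tied hash. It swaps in a plain array of [search, replace] pairs, which needs no extra module, and the helper name `replace_in_order` is hypothetical. The pairs are the dictionary from the question after the -r reversal, and the "bad" order is the one visible in the debug output above.

```perl
use strict;
use warnings;

# Hypothetical helper: apply [search, replace] pairs in the order given,
# ignoring case (like the -i option in the run above).
sub replace_in_order {
    my ($text, @pairs) = @_;
    for my $pair (@pairs) {
        my ($search, $replace) = @$pair;
        $text =~ s/\Q$search\E/$replace/gi;
    }
    return $text;
}

# Order as written in the map file: whole words come before the fragments.
my @file_order = (['April', 225], ['barrel', 1168], ['Il', 3143], ['PR', 9432]);

# Order seen in the debug output: "il" happens to be tried first.
my @bad_order  = (['Il', 3143], ['barrel', 1168], ['April', 225], ['PR', 9432]);

print replace_in_order('april', @file_order), "\n";   # prints "225"
print replace_in_order('april', @bad_order),  "\n";   # prints "a94323143"
```

Keeping file order only helps here because "April" is listed before "Il" and "PR" in the map file. A map file that lists the short fragments first would still break "april" apart.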

      UPDATE: If your infile is just the one column of words, call with:

      {C} > msar input.txt map.txt -r -i -c 1

      {C} > msar.pl in.txt map.txt -i -r -c 1
      Reading mappings from file: map.txt
      ----------------------------
      225
      1168
      ----------------------------
      Mapped 2 entries.

      UPDATE: MSAR.pl code now updated to use -w option which replaces on WHOLE WORDS only. Also, map.txt file will be read AND parsed AND used in search and replace in the order it is written (line 1, line 2 ... line n).
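As a rough illustration of what whole-word matching buys you, here is a sketch that wraps each search term in `\b` word-boundary anchors. This is an assumption about how such an option could work, not the actual -w implementation in MSAR.pl, and `replace_whole_words` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only whole-word, case-insensitive matches,
# applying the [search, replace] pairs in the order given.
sub replace_whole_words {
    my ($text, @pairs) = @_;
    for my $pair (@pairs) {
        my ($search, $replace) = @$pair;
        # \b anchors the match at word boundaries, so "il" no longer
        # matches inside "april"
        $text =~ s/\b\Q$search\E\b/$replace/gi;
    }
    return $text;
}

# Deliberately put the short fragments first to show they are now harmless.
my @pairs = (['Il', 3143], ['PR', 9432], ['April', 225], ['barrel', 1168]);
print replace_whole_words('april barrel', @pairs), "\n";   # prints "225 1168"
```

Note that `\b` only behaves as expected when the search term starts and ends with a word character (letter, digit, or underscore). Terms that begin or end with punctuation would need different anchoring.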

Re^2: search and replace
by Anonymous Monk on Mar 17, 2009 at 13:13 UTC
    Thanks ... nice work :)

      I just realized that I have your "mapping file" backwards, so I added a "-r" (reverse) option so you can keep your mapping file the way you have it and still use the program. I just uploaded the new code about 5 mins ago (called version 1.31 dated 17 MAR 2009) so check that out if you haven't already.