find and replace

texuser74 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: find and replace by Corion (Patriarch) on Nov 04, 2005 at 07:43 UTC
The error in your script is that `s///` does not work on arrays. You need to loop over the array and replace the full words with their abbreviations line by line: `... for my $line (@myin) { $line =~ s!$full!$abb!g; }; ...` [download] That code is a bit inefficient as it will loop over the main text many times, and it also has some problems, because it doesn't take care that you only replace full words. For example the word examples would be replaced by egs by your script, which may or may not be what you want. If you only want to replace whole words, use the `\b` boundary in your regular expression: `... for my $line (@myin) { $line =~ s!\b$full\b!$abb!g; }; ...` [download] Also, by doing the replacements step by step, you can get conflicts if one abbreviation is (or creates) a new word that also has an abbreviation - if you don't want that, you need to do all the replacements at once, which means you will need to build a better regular expression. But as I don't know if that is a problem for you, I will skip on that.	[reply] [d/l] [select]
Re^2: find and replace by texuser74 (Monk) on Nov 04, 2005 at 07:54 UTC
Thanks Corion, now the output is correct.	[reply]
Re: find and replace by Zaxo (Archbishop) on Nov 04, 2005 at 09:02 UTC
You can translate jabb.txt into a hash and substitute directly like this: (untested) `my %jabb = do { open my $fh, '<', '/path/to/jabb.txt' or die $!; map { reverse split; } <$fh>; }; my $re = do { local $" = '\|'; qr!\b(@{[map { qr/\Q$_\E/ keys %jabb]})\b!; }; open my $in, '<', '/another/path/to/input.txt' or die $!; open my $out, '>', '/the/path/to/output.txt' or die $!; while (<$in>) { s/$re/$jabb{$1}/g; print $out $_; }` [download] The less tricky part there is reversing the map over split so the full text is the key in %jabb. The really tricky part is combining and escaping the keys of %jabb with '\|' to get a big alternation on the keys of %jabb so we can substitute the hash value for the key we find. If the values to substitute for have overlap at the initial position, you should order them so that the longest match is tried first After Compline, Zaxo	[reply] [d/l]
Re: find and replace by Skeeve (Parson) on Nov 04, 2005 at 08:47 UTC
What I noticed: You keep your input file in memory while you read your abreviations line by line. Depending on the usual size of those files this might be a good choice. OTOH: I'd expect an inputfile to be substantially larger than your abreviations. if so you might want to consider to keep the abreviations in memory and to loop over the input lines: # UNTESTED! use strict; use warnings; open( ABB, '<', 'jabb.txt' ) or die "Couldn't open ABB.\n$!"; open( IN, '<', 'input.txt' ) or die "Couldn't open infile.\n$!"; open( OUT, '>', 'output.out' ) or die "Couldn't open outfile.\n$!"; # prepare abreviations my %abb; while(<ABB>){ # there are shorter ways to do it, but to keep your code... if(/(.?)\t(.?)\n/) { $abb{$2}= $1 } } close ABB; # build a RE for quickly(?) finding abbreviatable words. my $abbre= join('\|', map "\Q$_\E",keys %abb); $abbre= qr/\b($abbre)\b/; # replace my @myin = <IN>; while(<IN>){ s/$abbre/$abb{$1}/g; print OUT; } close IN; close OUT; [download] Update: Thanks to Zaxo's obfu-entry ;-) below I noticed I forgot to escape the list of abbreviatable words. `s$$([},&%#}/&/]+}%&{});#$&&s&&$^X.($'^"%]=\&(\|?{%` `+`.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e	[reply] [d/l] [select]
Re: find and replace by l.frankline (Hermit) on Nov 04, 2005 at 09:41 UTC
`Hi,` `Try this.... it is quite helpful to you.` `here is the code....` `open( ABB, '<', 'jabb.txt' ) or die "Couldn't open ABB.\n$!";` `open( IN, '<', 'input.txt' ) or die "Couldn't open infile.\n$!";` `open( OUT, '>', 'output.out' ) or die "Couldn't open outfile.\n$!";` `my @myin = <IN>;` `while(<ABB>)` `{` `($abb,$full) = split/\t/,$_;` `chomp($abb);` `chomp($full);` `for ($t=0;$t<=$#myin;$t++)` `{` `$myin[$t] =~s/$full/$abb/g;` `}` `}` `print OUT "@myin\n";` `Regards` `Franklin`	[reply] [d/l] [select]
Re: find and replace by radiantmatrix (Parson) on Nov 04, 2005 at 15:06 UTC
You actually have two problems. First, regexes don't work on arrays directly. Second, a substitution (`s///`) returns the number of substitutions made. You don't want `$term = s/$full/$abb/g`, because the `=` will assign that count to `$term`. You want `=~` which means "make this regex act on" (sort of). This could be easily fixed by slurping, rather than reading your `IN` into an array. `# instead of @myin = <IN>; my $myin; { local $/ = undef; $myin = <IN>; #slurp all to scalar } #instead of print "@myin\n"; @myin = s/$full/$abb/g; print "$myin\n"; $myin =~ s/\b$full\b/abb/g` [download] Update:benizi pointed out my foolishness in capturing word boundaries. The sad thing is, I knew that and did it anyway. That's like the fourth really dumb thing I've done today. Sigh. Thanks, benizi! I took the liberty of ~~capturing~~matching word boundaries (\b) ~~and adding them back ($1 and $2)~~, since without that code, you'd replace too much. For example, trying to replace all occurances of 'hello' with 'hi', you'd change 'Othello' into 'Othi'. ;-) Checking for word boundaries should help with that. <-radiant.matrix-> A collection of thoughts and links from the minds of geeks The Code that can be seen is not the true Code "In any sufficiently large group of people, most are idiots" - Kaa's Law	[reply] [d/l] [select]
Re: find and replace by benizi (Hermit) on Nov 04, 2005 at 19:23 UTC
Word boundaries (\b) are zero-width, so you shouldn't capture them. ($1 and $2 will always be empty in your last line.) Thus, `$myin =~ s/\b$full\b/$abb/g;` is sufficient.	[reply] [d/l]