Faster search and replace?

snax has asked for the wisdom of the Perl Monks concerning the following question:

I have a snippet of code I reuse a lot for searching and replacing:

sub replace (\%\@) {
    my ($repl_ref, $text_ref) = @_;

    my $repl_str = join '|', (keys %$repl_ref);

    for (@$text_ref) {
        s/($repl_str)/$$repl_ref{$1}/g;
    }

}

To use it, I set up a replacement hash whose keys are the tags to replace and whose values are the desired replacements. The text containing the tags is typically slurped from a template file into an array.
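A minimal usage sketch of the setup described above. The tags and template lines are hypothetical, and the sub is a prototype-free variant named `replace_tags` (not the poster's name) so it can be called with plain references. The `quotemeta` call is an addition that the original snippet does not have; it keeps tag characters such as `[` from being read as regex syntax.

```perl
use strict;
use warnings;

# Same idea as replace() above, without the prototype.
# quotemeta() is an addition: it escapes regex metacharacters in the tags.
sub replace_tags {
    my ($repl_ref, $text_ref) = @_;
    my $repl_str = join '|', map { quotemeta($_) } keys %$repl_ref;
    s/($repl_str)/$repl_ref->{$1}/g for @$text_ref;
    return;
}

# Hypothetical tags and template lines, for illustration only.
my %repl = (
    '[NAME]' => 'Alice',
    '[DATE]' => '9 November 2000',
);
my @text = (
    "Dear [NAME],\n",
    "Your order shipped on [DATE].\n",
);

replace_tags(\%repl, \@text);
print @text;
```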

Anyway, here's my question: Is this dog slow? Is there a way to make this more efficient?

I've recently run into a situation where I have text from a Mac source with European characters that I need to work with on a Windows machine. I use the same code as above, with a replacement hash that has the single "wrong" characters as keys and the "right" characters (using maps grabbed from unicode.org) as values. As one might imagine, this means some waiting while files are processed.

Any commentary is most appreciated...


Replies are listed 'Best First'.
Re: Faster search and replace? by mirod (Canon) on Nov 09, 2000 at 14:29 UTC
If you never change the expression (the `repl_ref` hash), you can start by adding the `o` modifier to the regexp: `s/($repl_str)/$$repl_ref{$1}/go;` That's the easiest fix, and you might want to stop there if you're happy with the result. If you want more speed, you will have to work on the left side of the substitution, the pattern itself. Is there any way you can match the `repl_str` stuff without a huge `|` alternation? A character class, maybe, if you are looking for characters outside the 0-127 range? Or an escape character followed by an odd character? A mixture of the two? It depends on your data.
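One way the character-class idea might look, as a sketch only. It assumes every "wrong" character is a single byte above 0x7F; the two-entry map and the `fix_lines` name are hypothetical. `qr//` is used here in place of the `o` modifier to compile the pattern once.

```perl
use strict;
use warnings;

# Hypothetical two-entry byte map; a real one would be built from a
# published Mac-to-Windows mapping table.
my %fix = (
    "\x8A" => "\xE4",   # a-umlaut: MacRoman 0x8A -> Windows-1252 0xE4
    "\x8E" => "\xE9",   # e-acute:  MacRoman 0x8E -> Windows-1252 0xE9
);

# Compiled once. The class matches any byte above 0x7F, so the pattern
# does not grow with the number of keys in the map.
my $high_bit = qr/([\x80-\xFF])/;

sub fix_lines {
    my ($map, $lines) = @_;
    for (@$lines) {
        # Bytes with no entry in the map are left unchanged.
        s/$high_bit/exists $map->{$1} ? $map->{$1} : $1/ge;
    }
    return;
}

my @lines = ("caf\x8E\n", "M\x8Adchen\n");
fix_lines(\%fix, \@lines);
```

The hash lookup now happens only for high-bit bytes, and plain ASCII text is skipped by the class without trying each alternative in turn.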
RE: Re: Faster search and replace? by snax (Hermit) on Nov 09, 2000 at 15:12 UTC
Oh -- right. Compile the regex once, right? That's a good idea. Further, you point right at the problem: I'm thinking about this "wrong" and just re-using code in a way that isn't efficient. My snippet is great for little form-letter types of things where I don't know much about the problem beforehand, but in this case I know precisely which characters are "wrong", and for each one I know the "right" replacement. So I should be using an `eval("tr/$wronglist/$rightlist/");` which has got to be better optimized for this kind of application. Interesting: someone else has done this same thing and found (apparently) a memory leak. I guess I'm on the right track now. Thanks for the new perspective! As always, TMTOWTDI, and some are better than others :)
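A sketch of the `eval`/`tr` approach. Because `tr///` does not interpolate variables, the lists have to be spliced in through a string `eval`; here that is done once, to build a reusable sub, rather than once per line. The names `hex_escape` and `make_translator` and the three-byte lists are hypothetical, and the sketch assumes single-byte characters only.

```perl
use strict;
use warnings;

# Spell each byte as a \xNN escape so that none of them can be read
# as tr syntax ('/', '-' or '\') inside the generated code.
sub hex_escape {
    my ($bytes) = @_;
    return join '', map { sprintf '\\x%02X', ord($_) } split //, $bytes;
}

# Build a sub that translates its argument in place.
sub make_translator {
    my ($from, $to) = @_;
    die "lists differ in length" unless length($from) == length($to);
    my ($f, $t) = (hex_escape($from), hex_escape($to));
    my $code = eval "sub { \$_[0] =~ tr/$f/$t/ }";
    die "could not build translator: $@" unless $code;
    return $code;
}

# Hypothetical lists: position N in the first maps to position N in the second.
my $wronglist = "\x8A\x8E\x9F";   # MacRoman a-umlaut, e-acute, u-umlaut
my $rightlist = "\xE4\xE9\xFC";   # Windows-1252 equivalents

my $mac_to_win = make_translator($wronglist, $rightlist);

my @lines = ("M\x8Adchen\n", "caf\x8E\n", "\x9Fber\n");
$mac_to_win->($_) for @lines;     # each element is modified in place
```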
RE: Faster search and replace? by extremely (Priest) on Nov 09, 2000 at 16:17 UTC
You might consider sorting the keys too. The minor slowdown may someday pay off in the regex optimizer. Of course, I took the Regex book to work Monday, so I don't have it here. It said something about refactoring similar words and order dependencies in alternations... Other than `s///go`, as mentioned already, I don't see anything else you can do with this. I'd have to throw Benchmark at alternatives, since this is pretty sexy code =) -- $you = new YOU; honk() if $you->love(perl)
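The order dependency mentioned above can be shown with a small sketch. It illustrates a correctness effect, not the optimizer benefit: Perl tries alternatives left to right, so a key that is a prefix of another key hides the longer one unless the longer key comes first. The keys and the `alternation` helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical tags where one key is a prefix of the other.
my %repl = (foo => 'X', foobar => 'Y');

# Build a capturing alternation from the keys, in the order given.
sub alternation {
    my @keys = @_;
    my $alt = join '|', map { quotemeta($_) } @keys;
    return qr/($alt)/;
}

my $text = "foobar foo";

my $short_first = alternation('foo', 'foobar');
my $long_first  = alternation(
    sort { length($b) <=> length($a) || $a cmp $b } keys %repl
);

(my $naive = $text) =~ s/$short_first/$repl{$1}/g;   # "Xbar X"
(my $fixed = $text) =~ s/$long_first/$repl{$1}/g;    # "Y X"

print "$naive\n$fixed\n";
```

Since `keys` returns the hash keys in no particular order, an unsorted alternation can behave differently from run to run when keys overlap like this.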
RE: RE: Faster search and replace? by snax (Hermit) on Nov 09, 2000 at 17:25 UTC
*blush* I've always been fond of this little piece of code :) Anyway, here's the lowdown: for situations where I want to replace text tags in form templates, this is a great hack. For the task that made me wonder whether it might be inefficient, it proved to be the Wrong Tool. Translating Mac euro-characters to their Windows code page equivalents is definitely a tr/// job. Crude benchmarks suggest an enormous difference: less than a second for a translation that used to take almost a minute. Your humble supplicant is most pleased with the wisdom imparted by the Monks :)
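A crude comparison of this kind could be set up with the core Benchmark module, roughly as below. This is a sketch with a hypothetical three-byte map and synthetic input, not a reconstruction of the benchmark described above; the numbers it prints depend on the data and the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical map and synthetic input lines.
my %map   = ("\x8A" => "\xE4", "\x8E" => "\xE9", "\x9F" => "\xFC");
my @lines = ("M\x8Adchen caf\x8E \x9Fber plain ascii text\n") x 2_000;

my $alt = join '|', map { quotemeta($_) } keys %map;

# Alternation plus hash lookup, as in the original snippet.
sub by_regex {
    my @copy = @lines;
    s/($alt)/$map{$1}/g for @copy;
    return \@copy;
}

# Fixed translation table.
sub by_tr {
    my @copy = @lines;
    tr/\x8A\x8E\x9F/\xE4\xE9\xFC/ for @copy;
    return \@copy;
}

# Run each candidate 50 times and print a comparison table.
cmpthese(50, { regex => \&by_regex, tr => \&by_tr });
```

Both subs work on a copy of the input so that every iteration starts from untranslated text.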
Re: Faster search and replace? by AgentM (Curate) on Nov 10, 2000 at 04:15 UTC
study() is an excellent function to use in regex-intensive situations! Be sure to read its accompanying documentation, though. This is only useful in cases where your regex is constant and well-weathered (used a lot). AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
