in reply to multiple substitution
See perlop on the /e switch for regular expressions. perlretut also covers it.
    my %replace = (
        apples  => 'yummy',
        oranges => 'yummier',
        bananas => 'yummiest',
    );
    $string =~ s!(apples|oranges|bananas)!$replace{$1} || $1!e;
Re^2: multiple substitution
by aaron_baugher (Curate) on Aug 25, 2012 at 16:39 UTC
I answered a similar question recently with a loop:
So I wondered how that would compare to your solution of combining the searches into a single regex. I thought your way might win for a few words, but surely with a lot of words the complexity of the regex would slow it down, right? Well, so much for that theory. The Perl regex engine continues to amaze me. I gave it a pattern combining 676 strings (all two-letter combinations) with pipes like yours, and it blew the for-loop method away (92 times faster). It also beat a regex solution using Regexp::Assemble, but I was using very simple, known search strings, so the hand-made pipe method was safe and simple. With unknown or more complex strings, where it is harder to hand-make a safe and efficient search pattern, I think Regexp::Assemble would probably come out on top eventually. Anyway, my test and results:
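The benchmark listing itself is not reproduced here. As a rough sketch of the hand-made pipe pattern described above, the following builds an alternation from all 676 two-letter strings and applies it in a single pass. The uppercase replacement values and the sample string are assumptions made for illustration, not the data used in the post.

```perl
use strict;
use warnings;

# 'aa' .. 'zz' relies on Perl's string range to yield all 676 letter pairs.
my @keys = ('aa' .. 'zz');

# Hypothetical replacement table: each pair maps to its uppercase form.
my %replace = map { $_ => uc($_) } @keys;

# Hand-made pipe pattern. Joining without escaping is safe only because
# the keys are plain letters with no regex metacharacters.
my $pattern = join '|', @keys;
my $re      = qr/($pattern)/;

my $s = 'hello world';
$s =~ s/$re/$replace{$1}/g;

print scalar(@keys), " keys: $s\n";   # 676 keys: HELLo WORLd
```

A single pass consumes the string left to right, so each character is matched at most once. A loop that runs one substitution per key can instead modify text produced by an earlier iteration, so the two approaches are not guaranteed to give identical output when keys overlap.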
Aaron B.
by AnomalousMonk (Archbishop) on Aug 25, 2012 at 18:11 UTC
The pipes() and regexpa() functions used in the timing loops above both include generation of the matching regexes in each loop execution. I doubt it adds greatly to the overall execution time, but is it proper to include regex generation in the timing of a substitution operation?
On a more critical note, a substitution is done on the $s string in each repetition of each timing loop, but will there be anything left to substitute after the first pass of whichever timing function happens to be executed first? Are not all subsequent passes in all functions just comparing the time it takes for a regex to find no match in a string? (Maybe take the 8MB file content, use the x operator to repeat it into three identical 200 - 500MB strings, and do just one comparison pass of substitutions on each string.)
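One way to address both points is sketched below, using the core Benchmark module: the regex is built once outside the timed code, and each timed call works on a fresh copy of the input so that every run has something to replace. The data and the sub names (with_loop, with_alternation) are hypothetical, and copying the string inside the timed sub adds the same overhead to both candidates.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data; not the data from the original benchmark.
my %replace = (apples => 'yummy', oranges => 'yummier', bananas => 'yummiest');
my $text    = join ' ', ('apples oranges bananas pears') x 1000;

# Build the alternation once, outside the timed code.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %replace;
my $re = qr/($alternation)/;

sub with_loop {
    my $s = $text;                      # fresh copy, so matches always exist
    for my $key (keys %replace) {
        $s =~ s/\Q$key\E/$replace{$key}/g;
    }
    return $s;
}

sub with_alternation {
    my $s = $text;                      # fresh copy here as well
    $s =~ s/$re/$replace{$1}/g;
    return $s;
}

# A negative count asks Benchmark to run each sub for at least that many CPU seconds.
cmpthese(-1, { loop => \&with_loop, alternation => \&with_alternation });
```

The timings depend on the data and the machine, so no figures are given here.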
by Corion (Patriarch) on Aug 25, 2012 at 16:48 UTC
I only (re)used what the OP already had as a regular expression. But your results mesh well with When Perl Isn't Quite Fast Enough: the fewer ops you need, and the more you can do within the RE engine, the faster your Perl code is.
Re^2: multiple substitution
by naturalsciences (Beadle) on Aug 25, 2012 at 10:08 UTC
Could you explain the code for a sec? Should those ! be /? I can understand that $string =~ s/(apples|oranges|bananas)/$replace{$1}/e would take the first match from the string ($1). Then, because of the /e flag, the second part of the substitution would be the value corresponding to the key ($1). What is the deal with the || (or?) operator? (I guess I'm mistaken about the ! elements.) Would this code of my own work? I did not want to use some convoluted regexp patterns because they might be usable this time but not always. I want to learn the technique to do such list/hash substitutions as in the original question.
by AnomalousMonk (Archbishop) on Aug 25, 2012 at 17:28 UTC
"Did not want to use some convoluted regexp patterns ... Want to learn the technique to do such list/hash substitutions ..."
A common approach to handling long search/replace string lists is to generate the search regex automatically from the keys of the search/replace hash. (Then you just have to worry about getting the hash right!)
Note that none of the conversion examples use the /e switch, which will make conversion slightly faster. In all the conversion examples, F99-9 is never converted: it just doesn't appear in the conversion @keys array. In the first conversion example, the F29-2 substring in FF29-22 and -F29-2- is converted even though it is embedded in another string: it appears in the conversion list. This is fixed for FF29-22 in the second example by using \b boundary assertions to allow conversion only if a search string is neither preceded nor followed by a 'word' character ([A-Za-z0-9_]), but this still allows the substring in -F29-2- to be replaced because '-' is not a word character. This problem (if problem it is) is fixed in the third example by using different boundary assertions: (?<! \S) and (?! \S) allow a match (and replacement) only if the potential match substring is neither preceded nor followed by a non-whitespace character.
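The three variants described above can be sketched as follows. The hash contents and the sample string are invented for illustration; only the F29-2 style identifiers echo the discussion. As in the post, none of the substitutions uses /e, since $convert{$1} interpolates directly in the replacement.

```perl
use strict;
use warnings;

# Hypothetical search/replace table and input string.
my %convert = ('F29-2' => 'X1', 'F30-1' => 'X2');
my $input   = 'F29-2 FF29-22 -F29-2- F99-9 F30-1';

# Generate the alternation from the hash keys; quotemeta escapes the '-'.
my $alt = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %convert;

# 1. No boundary assertions: a key matches even inside a longer token.
(my $plain = $input) =~ s/($alt)/$convert{$1}/g;

# 2. \b word boundaries: protects FF29-22, but '-' is not a word character.
(my $word = $input) =~ s/\b($alt)\b/$convert{$1}/g;

# 3. Whitespace boundaries: match only when delimited by whitespace or string ends.
(my $space = $input) =~ s/(?<!\S)($alt)(?!\S)/$convert{$1}/g;

print "$plain\n";   # X1 FX12 -X1- F99-9 X2
print "$word\n";    # X1 FF29-22 -X1- F99-9 X2
print "$space\n";   # X1 FF29-22 -F29-2- F99-9 X2
```

Sorting the keys longest first is a precaution for key sets where one key is a prefix of another; it makes no difference for the two keys used here.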
Note that | (pipe) and not , (comma) is the alternation metacharacter.
Update: aaron_baugher, in a reply already posted, gave an example of the automatic regex generation technique discussed above, but the examples of using boundary conditions to refine a match may still be useful.
by cheekuperl (Monk) on Aug 25, 2012 at 13:07 UTC
"Could you explain the code for a sec."
This part helps you replace the matched string with itself in case %replace does not have a corresponding key. For example,
"Did not want to use some convoluted regexp"
Trust me, this is a simple regex. It can get a lot worse if you delve deeper :)
"Want to learn the technique to do such list/hash substitutions as in original question"
As far as searching and replacing in strings is concerned, I guess regexes would be most helpful.
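A minimal sketch of the || $1 fallback from the original snippet, assuming a %replace hash that deliberately lacks one of the alternatives. The sample string and the /g flag are additions for illustration. The ! characters are simply an alternative delimiter; s/.../.../ge would behave the same way.

```perl
use strict;
use warnings;

# Hypothetical table with no entry for 'bananas'.
my %replace = (apples => 'yummy', oranges => 'yummier');
my $string  = 'apples, oranges and bananas';

# With /e the replacement is Perl code: use the hash value if it is true,
# otherwise put the matched text ($1) back unchanged.
$string =~ s!(apples|oranges|bananas)!$replace{$1} || $1!ge;

print "$string\n";   # yummy, yummier and bananas
```

Because || tests truthiness, a replacement value of '' or 0 would also fall back to $1. On Perl 5.10 or later the defined-or operator // falls back only when the key is missing or its value is undef.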
by Corion (Patriarch) on Aug 25, 2012 at 13:20 UTC
Re^2: multiple substitution
by naturalsciences (Beadle) on Aug 25, 2012 at 09:34 UTC
OK thanks! Quote: "s///e treats the replacement text as Perl code, rather than a double-quoted string." Well, that could be useful!
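The difference in that quote can be seen in a small sketch; the sample string is an assumption chosen only to make the contrast visible.

```perl
use strict;
use warnings;

my $with    = 'price: 21';
my $without = $with;

# With /e the replacement is evaluated as a Perl expression.
$with =~ s/(\d+)/$1 * 2/e;

# Without /e the replacement is interpolated like a double-quoted string.
$without =~ s/(\d+)/$1 * 2/;

print "$with\n";      # price: 42
print "$without\n";   # price: 21 * 2
```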