I answered a similar question recently with a loop:

$s =~ s/$_/$h{$_}/g for keys %h;

So I wondered how that would compare to your solution of combining the searches into a single regex. I thought your way might win for a few words, but surely with a lot of words the complexity of the regex would slow it down, right?

Well, so much for that theory. The Perl regex engine continues to amaze me. I gave it a pattern combining 676 strings (all two-letter combinations) with pipes like yours, and it blew the for-loop method away (92 times faster). It also beat a regex solution using Regexp::Assemble, but I was using very simple, known search strings, so the hand-made pipe method was safe and simple. With unknown or more complex strings, where it is harder to hand-build a safe and efficient search pattern, I think Regexp::Assemble would probably come out on top eventually. Anyway, my test and results:

abaugher@bannor> cat 989705.pl
#!/usr/bin/env perl
use Modern::Perl;
use Benchmark qw(:all);
use Regexp::Assemble;

my %h = map { $_ => uc } ( 'aa' .. 'zz' );
my $s = `cat bigfile`; # 8MB file
say "Testing with @{[-s 'bigfile']} byte file and @{[ scalar keys %h ]} patterns";

cmpthese( 10, {
    'forloop' => \&forloop,
    'pipes'   => \&pipes,
    'regexpa' => \&regexpa,
});

sub forloop {
    $s =~ s/$_/$h{$_}/g for keys %h;
}
sub pipes {
    my $p = join '|', keys %h;
    $s =~ s/($p)/$h{$1}/g;
}
sub regexpa {
    my $p = Regexp::Assemble->new->add(keys %h)->re;
    $s =~ s/($p)/$h{$1}/g;
}

abaugher@bannor> perl 989705.pl
Testing with 8560854 byte file and 676 patterns
              Rate forloop regexpa   pipes
forloop 9.75e-02/s      --    -96%    -99%
regexpa     2.40/s   2364%      --    -74%
pipes       9.08/s   9213%    278%      --
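
For search strings that are not as tame as the two-letter keys above, the hand-made pipe pattern needs a little more care. Here is a minimal sketch of how I might build it, assuming the keys are meant as literal strings: escape each key with quotemeta and put longer keys first, so a short key cannot shadow a longer one that starts the same way. The name multi_replace and the sample data are made up for illustration; this is not part of the benchmark.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every key of %$map found in $text with
# its value, using a single alternation pattern.
sub multi_replace {
    my ($text, $map) = @_;
    return $text unless %$map;    # an empty alternation would match everywhere

    my $alt = join '|',
        map  { quotemeta($_) }                             # escape regex metacharacters
        sort { length($b) <=> length($a) || $a cmp $b }    # longest keys first
        keys %$map;
    my $re = qr/($alt)/;

    (my $out = $text) =~ s/$re/$map->{$1}/g;               # work on a copy
    return $out;
}

my %h = ( 'a.b' => 'X', 'a' => 'Y', 'ab' => 'Z' );
print multi_replace('a.b ab a axb', \%h), "\n";    # prints "X Z Y Yxb"
```

Without the quotemeta, the key 'a.b' would also match 'axb'; without the sort, 'a' could match before 'ab' or 'a.b' got a chance. I have not benchmarked this variant.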

Aaron B.
Available for small or large Perl jobs; see my home node.


In reply to Re^2: multiple substitution by aaron_baugher
in thread multiple substitution by naturalsciences
