I can't remember the name of the CPAN module that does this, but it's possible to construct a relatively optimized regex for matching multiple strings by building a Patricia trie. (There are various discussions of this technique on PM.) Then the problem becomes simply:

my %replace_hash = (foo => 'bar', baz => 'fnord');
my $regex = compile_regex(keys %replace_hash);    # compile_regex() builds the trie-optimized alternation
s/\b($regex)\b/$replace_hash{$1}/g;
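Since compile_regex() is not spelled out above, here is a minimal sketch of what such a routine might look like. The name and behavior are assumptions, and this is not the CPAN module mentioned above. It builds a plain character trie (a hash of hashes) rather than a true Patricia trie, but because single-child chains are simply concatenated, the emitted pattern shares prefixes the same way: foo, baz and bar become (?:ba(?:r|z)|foo). The last three lines reproduce the example above.

```perl
use strict;
use warnings;

# Hypothetical compile_regex(): build a character trie from the words,
# then walk it to emit a regex where words sharing a prefix share a branch.
sub compile_regex {
    my @words = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        $node = $node->{$_} //= {} for split //, $word;
        $node->{''} = 1;    # empty key marks "a word ends here"
    }
    return _node_to_regex(\%trie);
}

sub _node_to_regex {
    my ($node) = @_;
    my @chars = grep { $_ ne '' } keys %$node;
    my @alts;
    for my $char (sort @chars) {    # sorted only for deterministic output
        push @alts, quotemeta($char) . _node_to_regex($node->{$char});
    }
    return '' unless @alts;         # leaf: nothing more to match
    my $body = @alts == 1 ? $alts[0] : '(?:' . join('|', @alts) . ')';
    # If a word can also stop here, the rest is an optional (greedy) suffix,
    # so a longer word like "foobar" is preferred over "foo".
    return exists $node->{''} ? "(?:$body)?" : $body;
}

my %replace_hash = (foo => 'bar', baz => 'fnord');
my $regex = compile_regex(keys %replace_hash);
(my $text = "foo baz food") =~ s/\b($regex)\b/$replace_hash{$1}/g;
print "$text\n";
```

With this input the substitution should leave "food" alone, since \b fails between "foo" and "d".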

It's actually not difficult to construct the optimized regex, but the result scales poorly. Once more than a few dozen words are involved, the time taken by backtracking and the like (with or without lookahead assertions) becomes significant. In that case I've found it's actually faster to use the Patricia trie directly and not bother with the regex. That would not be true, however, if we could choose between a DFA and an NFA regex engine; Perl's engine is a backtracking NFA. The Patricia trie essentially represents (most of) a DFA state transition table, and as such it needs minimal backtracking. In fact it never backtracks past the character where the current match attempt started: on a failure it simply advances the start position by one character, and with further optimization it need not backtrack at all. (DFAs never backtrack, hence the term "deterministic".)
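To illustrate using the trie directly, here is a minimal sketch under stated assumptions: the words consist of \w characters, matching is leftmost-longest, and word boundaries are checked by hand to mimic \b...\b. The helper names build_trie() and trie_replace() are hypothetical. At each start position it walks the trie as far as the input allows; if no complete word was found, it copies one character and advances by one, which is the behavior described above. It does not implement the further no-backtracking optimization (that would need something like Aho-Corasick failure links).

```perl
use strict;
use warnings;

# Hypothetical helper: build a trie mapping each word to its replacement.
sub build_trie {
    my ($map) = @_;                 # word => replacement
    my %trie;
    while (my ($word, $repl) = each %$map) {
        my $node = \%trie;
        $node = $node->{$_} //= {} for split //, $word;
        $node->{''} = $repl;        # empty key holds the replacement
    }
    return \%trie;
}

# Hypothetical helper: scan $text once, replacing whole words found in the trie.
sub trie_replace {
    my ($trie, $text) = @_;
    my ($out, $pos, $len) = ('', 0, length $text);
    while ($pos < $len) {
        my ($best_end, $best_repl);
        # Only try to match where \b would allow a word to start.
        unless ($pos > 0 && substr($text, $pos - 1, 1) =~ /\w/) {
            my $node = $trie;
            for my $i ($pos .. $len - 1) {
                $node = $node->{ substr($text, $i, 1) } or last;
                next unless exists $node->{''};
                # Accept only if the word is not followed by another \w char.
                next if $i + 1 < $len && substr($text, $i + 1, 1) =~ /\w/;
                ($best_end, $best_repl) = ($i + 1, $node->{''});    # longest so far
            }
        }
        if (defined $best_end) {
            $out .= $best_repl;
            $pos  = $best_end;
        } else {
            # Match failure: copy one character and advance by one.
            $out .= substr($text, $pos, 1);
            $pos++;
        }
    }
    return $out;
}

my $trie   = build_trie({ foo => 'bar', baz => 'fnord' });
my $result = trie_replace($trie, "foo baz food");
print "$result\n";
```

Because the inner loop keeps walking after a shorter word ends, a longer entry such as "foobar" wins over "foo" when both match at the same position.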

Update: I wrote a node explaining Patricia tries here: Re:x2 A Regexp Assembler/Compiler (Whats a 'trie'?)


---
demerphq

<Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

In reply to Re: 3 Examples of Multiple-Word Search n Replace by demerphq
in thread 3 Examples of Multiple-Word Search n Replace by chunlou