hv: Your hash lookup implementation runs twice as fast (34 s vs. 1 min 05 s for my here-doc regexes). Another difference is that it runs faster when operating on lines than on words. sed seems unbeatable at 6 seconds.

Glad it's making some progress, at least. :)

It occurs to me now that since you do not need the /.*/ "to end of line" behaviour, you also do not actually need to split the text on newlines: you could work directly on the full text. That would substantially reduce the number of ops you execute, which should give a further speedup.
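To make the difference concrete, here is a minimal sketch. It assumes a hypothetical one-table substitution (`%w1` keyed on lowercased words, with made-up entries, since the real tables aren't shown here). Line-by-line processing pays for a split, a substitution call per line and a join. The whole-text version runs one substitution over the entire string. Because `\b` treats a newline like any other non-word character and the pattern never needs to stop at end of line, both give the same result. Nothing here has been benchmarked.

```perl
use strict;
use warnings;

# Hypothetical replacement table; keys are lowercase.
my %w1 = (colour => 'color', centre => 'center');
my $re1 = qr{\b(@{[ join '|', reverse sort keys %w1 ]})\b}i;

my $text = "The colour of the centre.\nAnother colour here.\n";

# Line by line: split into lines (split /^/ keeps the newlines),
# substitute in each line, then rejoin.
my $by_lines = join '', map { my $line = $_; $line =~ s/$re1/$w1{lc $1}/g; $line }
                        split /^/, $text;

# Whole text: a single substitution over the entire string.
(my $whole = $text) =~ s/$re1/$w1{lc $1}/g;

print $whole;
```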

The next step beyond that would be to combine the three substitutions into a single one, with a single hash. The idea here would be to concatenate the three regexps from the previous iteration, but wrap the whole in (?|...) so the three distinct captures each get saved as $1, and make a single "master" lookup hash combining %w1, %w2 and %w3. If we can combine "was/were" in there as well, I think we'd be starting to get properly competitive with the sed scripts.
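A sketch of the combined substitution. It uses hypothetical stand-ins for `%w1`, `%w2` and `%w3`. The real regexps from the previous iteration may differ from each other in their anchoring, which is why the branch reset keeps each one intact rather than merging the keys into one alternation. This sketch assumes no key appears in more than one table, because the merged hash keeps only the last value for a duplicate key. One behavioural difference from three sequential passes: a later pass can no longer rewrite text produced by an earlier one.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the three tables; keys are lowercase.
my %w1 = (colour => 'color');
my %w2 = (centre => 'center');
my %w3 = (grey   => 'gray');

# The three per-table regexps, each with one capture group.
my $re1 = qr{\b(@{[ join '|', reverse sort keys %w1 ]})\b}i;
my $re2 = qr{\b(@{[ join '|', reverse sort keys %w2 ]})\b}i;
my $re3 = qr{\b(@{[ join '|', reverse sort keys %w3 ]})\b}i;

# Branch reset (?|...): whichever alternative matches, its capture is $1.
my $master_re = qr{(?|$re1|$re2|$re3)};

# One "master" lookup hash merging all three tables.
my %master = (%w1, %w2, %w3);

my $out = "The grey colour of the centre.";
$out =~ s/$master_re/$master{lc $1}/g;
print "$out\n";
```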

It is also worth considering whether you need Unicode support (I have no idea whether your sed supports it). If you do not, you should be able to get further speed by adding the /aa modifier to the regexp flags, which makes \b, \w and friends ASCII-only and stops /i from matching ASCII against non-ASCII characters, like: my $re1 = qr{\b(@{[ join '|', reverse sort keys %w1 ]})\b}iaa;
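A small sketch of what /aa changes, to help decide whether it is safe for your data. It assumes a hypothetical one-entry `%w1`. The comparison regexps use an explicit /u so that Unicode rules apply regardless of how the string is stored internally. On pure-ASCII input the two versions agree. The differences only show up with non-ASCII text, such as the KELVIN SIGN (U+212A), which /i folds to "k" under Unicode rules, or an accented letter next to a word boundary.

```perl
use strict;
use warnings;

# Hypothetical one-entry table, just to show where the flags go.
my %w1 = (kelvin => 'K');

my $re_unicode = qr{\b(@{[ join '|', reverse sort keys %w1 ]})\b}iu;
my $re_ascii   = qr{\b(@{[ join '|', reverse sort keys %w1 ]})\b}iaa;

# On pure-ASCII input both regexps match.
my $plain      = "300 kelvin";
my $both_plain = ($plain =~ $re_unicode) && ($plain =~ $re_ascii);

# U+212A KELVIN SIGN in place of "k": Unicode /i folds it to "k",
# while /aa forbids ASCII/non-ASCII case-insensitive matches.
my $kelvin = "300 \x{212A}elvin";
my $u_hit  = $kelvin =~ $re_unicode ? 1 : 0;
my $a_hit  = $kelvin =~ $re_ascii   ? 1 : 0;

# /a also makes \w (and therefore \b) ASCII-only, so an e-acute
# stops counting as a word character.
my $cafe = "caf\x{e9}";
my $u_b  = $cafe =~ /\bcaf\b/u  ? 1 : 0;
my $a_b  = $cafe =~ /\bcaf\b/aa ? 1 : 0;

printf "%d %d %d %d\n", $u_hit, $a_hit, $u_b, $a_b;
```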


In reply to Re^5: Need to speed up many regex substitutions and somehow make them a here-doc list by hv
in thread Need to speed up many regex substitutions and somehow make them a here-doc list by xnous