Just to clarify before I start: What you want is to find the subset of @array elements which match any regex -- right?

Regex::PreSuf suggested above looks interesting, if your regexes really are just a list of words to match.

Otherwise, no matter which way you nest the loops, you can expect to do roughly (140*20000)/2 = 1,400,000 regex matches on average (the /2 because you get out on the first match; an element that matches nothing still costs a full pass over the regex list). You can optimise the loop itself (e.g., with study, suggested above), but the real payoff has to come from running it a lot fewer times.
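To make the "get out on the first match" part concrete, here is a minimal sketch of that loop with a counter for the regex tests actually performed. The sub name match_any and the sample regexes and data are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Return the elements of @$data matched by at least one regex in @$regexes,
# plus the number of regex tests that were actually performed.
sub match_any {
    my ($regexes, $data) = @_;
    my @hits;
    my $tests = 0;
    ITEM: for my $item (@$data) {
        for my $re (@$regexes) {
            $tests++;
            if ($item =~ $re) {
                push @hits, $item;
                next ITEM;    # stop at the first match for this element
            }
        }
    }
    return (\@hits, $tests);
}

# Hypothetical regex list and data, for illustration only.
my @regexes = (qr/^foo/, qr/bar$/, qr/\d{3}/);
my @array   = qw(foobar bazbar abc123 nomatch foo);

my ($hits, $tests) = match_any(\@regexes, \@array);
print "matched: @$hits\n";
print "regex tests: $tests (worst case ", @regexes * @array, ")\n";
```

The counter is only there so you can watch how the number of tests changes as you reorder the regex list.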

To do that, you might think about ordering the regex list. (Here I'm assuming that the regex list is constant and the data varies from run to run.) If you know something about the structure of the incoming data, you should be able to guess with some accuracy which regexes are likely to match the most data entries. Put those at the front of the list. Depending on how much trouble you're willing to go to over this, you might even run some experiments on sample data to find the best ordering.
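One way to run such an experiment, as a rough sketch: count how many entries of a representative sample each regex matches, then sort the list so the most frequent matchers come first. The sub name order_by_hits and the sample data are hypothetical. Note that this counts every match, not just first matches, so it is a heuristic rather than a guaranteed optimal ordering.

```perl
use strict;
use warnings;

# Count how many sample entries each regex matches, then return the
# regexes sorted so the most frequently matching ones come first.
# Ties keep their original relative order.
sub order_by_hits {
    my ($regexes, $sample) = @_;
    my @counts = map {
        my $re = $_;
        scalar grep { $_ =~ $re } @$sample;
    } @$regexes;
    my @order = sort { $counts[$b] <=> $counts[$a] || $a <=> $b }
                0 .. $#$regexes;
    return @{$regexes}[@order];
}

# Hypothetical regex list and sample data, for illustration only.
my @regex_list = (qr/^zzz/, qr/\d/, qr/^[a-c]/);
my @sample     = qw(a1 b2 c3 d4 apple zzz9);

my @ordered = order_by_hits(\@regex_list, \@sample);
print "$_\n" for @ordered;
```

Since the regex list is assumed constant, you would do this once offline and hard-code the resulting order, not recompute it on every run.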

Also consider ranking the regexes fastest-first, a determination you can probably get pretty close to by eyeball.

--Dinosaur


In reply to Re: Efficiency: Foreach loop and multiple regexs by Dinosaur
in thread Efficiency: Foreach loop and multiple regexs by neilwatson
