It might be helpful if you could post a link to a sample data file to do some testing on. We don't know:

(a) the average length of the lines
(b) whether a line is likely to contain a match or not
(c) whether common substrings (XX1) are likely to occur in a line not containing a match to any of the full strings

For instance, if matches are uncommon, you could read the file in large chunks (x bytes, then read on to the next line boundary), run the regex against each chunk, and use index and rindex from each match position to find the line boundaries around it, which gives you the output boundaries for the lines in between the match lines. This would probably be far more efficient than reading and matching line by line.

If the common substrings aren't likely to occur in non-matching lines, you could change your match algorithm to first check each line for XX1 using index, and only then perform the more complicated (preferably optimized) regex match.

Efficiency in the line-by-line method might also be improved by buffering output and printing in chunks, though depending on how Perl manages output, this might just duplicate internal mechanisms. I'd have to run some tests.
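Here is a rough, untimed sketch of the first two ideas, to make them concrete. Only the substring XX1 comes from the thread; the full strings in $full_re and the sub names line_matches, read_chunk and split_chunk are invented for illustration. It also assumes the regex can never match across a newline.

```perl
use strict;
use warnings;

# Assumption: only 'XX1' is from the thread; the full strings are made up.
my $common  = 'XX1';
my $full_re = qr/XX1(?:23|45|67)/;

# Line-by-line variant: cheap literal check with index() first,
# the regex only runs on lines that contain the common substring.
sub line_matches {
    my ($line) = @_;
    return 0 if index($line, $common) < 0;
    return $line =~ $full_re ? 1 : 0;
}

# Read about $size bytes, then extend to the next line boundary.
# Returns nothing at end of file.
sub read_chunk {
    my ($fh, $size) = @_;
    my $n = read($fh, my $buf, $size);
    return unless $n;
    my $tail = <$fh>;                  # finish the partial last line
    $buf .= $tail if defined $tail;
    return $buf;
}

# Chunk variant: run the regex over the whole chunk, then use
# rindex/index from each match position to find the enclosing line.
# Returns (\@matching_lines, $text_between_the_matching_lines).
sub split_chunk {
    my ($chunk) = @_;
    my @hits;
    my $rest   = '';
    my $cursor = 0;                    # start of text not yet handed out
    while ($chunk =~ /$full_re/g) {
        my $start = rindex($chunk, "\n", $-[0]) + 1;   # 0 if no newline before
        my $end   = index($chunk, "\n", $+[0]);
        $end = length($chunk) - 1 if $end < 0;         # last line, no newline
        $rest .= substr($chunk, $cursor, $start - $cursor);
        push @hits, substr($chunk, $start, $end - $start + 1);
        $cursor = $end + 1;
        pos($chunk) = $cursor;         # skip further matches on the same line
    }
    $rest .= substr($chunk, $cursor);
    return (\@hits, $rest);
}
```

You would call read_chunk in a loop until it returns nothing and pass each chunk to split_chunk, printing whichever of the two return values you are after. Whether this actually beats the line-by-line version depends on the data, so it needs benchmarking against a real file.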


In reply to Re: perl performance vs egrep by TedPride
in thread perl performance vs egrep by dba
