(a) the average length of the lines
(b) whether a line is likely to contain a match or not
(c) whether common substrings (XX1) are likely to occur in a line not containing a match to any of the full strings
For instance, if matches are uncommon, you could read the file in large chunks (x bytes, then read on to the next line boundary), run the regex over each chunk, and then use index and rindex from each match position to locate the boundaries of the line containing that match. That would probably be far more efficient than reading and matching line by line.

If the common substrings aren't likely to occur in non-matching lines, you could instead have the line-by-line version first check each line for XX1 using index, and only run the more complicated (preferably optimized) regex on lines that pass that cheap test.

Efficiency of the line-by-line method might also be improved by buffering output and printing in chunks - though depending on how Perl manages output, that might just duplicate its internal buffering. I'd have to run some tests.
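A rough sketch of what the chunked variant might look like, assuming a hypothetical pattern XX1\d{3} and a 1 MB chunk size (neither value is from the thread):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Rough sketch of the chunked approach described above. The pattern
    # and chunk size are placeholder assumptions, not values from the thread.
    my $pattern    = qr/XX1\d{3}/;   # hypothetical target pattern
    my $chunk_size = 1 << 20;        # read roughly 1 MB at a time

    my $file = shift or die "usage: $0 file\n";
    open my $fh, '<', $file or die "open $file: $!";

    while (read($fh, my $chunk, $chunk_size)) {
        # Extend the chunk to the next line boundary so no line is
        # split across two chunks.
        my $rest = <$fh>;
        $chunk .= $rest if defined $rest;

        # Scan the whole chunk once; for each hit, back up to the
        # previous newline with rindex and run forward to the next one
        # with index to recover the full matching line.
        while ($chunk =~ /$pattern/g) {
            my $at    = $-[0];
            my $start = rindex($chunk, "\n", $at) + 1;  # 0 if no earlier newline
            my $end   = index($chunk, "\n", $at);
            $end = length $chunk if $end < 0;           # last line may lack "\n"
            print substr($chunk, $start, $end - $start), "\n";
            pos($chunk) = $end;                         # don't print a line twice
        }
    }
    close $fh;

The win here is that the regex engine makes one pass over a megabyte of text per chunk instead of being invoked once per line, and rindex/index only run at the (presumably rare) match positions.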
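And a minimal sketch of the index() prefilter for the line-by-line version, using the same hypothetical pattern:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A cheap literal check screens out most lines before the regex
    # engine is invoked at all. Pattern is the same assumed XX1\d{3}.
    my $file = shift or die "usage: $0 file\n";
    open my $fh, '<', $file or die "open $file: $!";

    while (my $line = <$fh>) {
        next if index($line, 'XX1') < 0;     # skip lines without the literal
        print $line if $line =~ /XX1\d{3}/;  # full regex only on candidates
    }
    close $fh;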