Turns out that trying to keep track of submatches simply isn't worth it. If I split my log using a regex that matches the head of each log entry (with the above-mentioned parenthesized subexpressions, whose content I want to retain and use too), it is many times faster to apply the regex again to each individual log entry to get the list of matching subexpressions than it is to store them in an array-of-arrays and pass them around for later reference.
My guess is that it is rather expensive to toss references to lists around, while matching a regex against a small text (a single log entry), where you know it is going to match at the first character position, is cheap.
While this double-regex matching seems redundant, it has the benefit of making the program both fast and easy to read. I'll get back to y'all with the code soon enough. :)
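In the meantime, here is a minimal sketch of the two-pass idea, not the actual program. The log format (a `YYYY-MM-DD LEVEL: ` head, with indented continuation lines) and the helper names `split_log` and `parse_entry` are made up for illustration; only the approach (split on the entry head, then match the head again per entry) comes from the post above.

```perl
use strict;
use warnings;

# Hypothetical log format (an assumption, not from the original thread):
# every entry starts with "YYYY-MM-DD LEVEL: " and may span several lines.
my $log = <<'END';
2009-01-02 INFO: started
2009-01-02 ERROR: disk full
  while writing /tmp/x
2009-01-03 INFO: stopped
END

# Pass 1: split in front of every entry head. The pattern has no capturing
# parentheses, so split() returns only the entries themselves.
sub split_log {
    my ($text) = @_;
    return split /(?=^\d{4}-\d\d-\d\d \w+: )/m, $text;
}

# Pass 2: re-apply the head pattern, this time with captures, anchored at
# the first character of a single entry.
sub parse_entry {
    my ($entry) = @_;
    return $entry =~ /\A(\d{4}-\d\d-\d\d) (\w+): /;
}

for my $entry (split_log($log)) {
    my ($date, $level) = parse_entry($entry) or next;
    print "$date [$level]\n";
}
```

The split pattern deliberately has no capturing groups, since `split` would otherwise mix the captured substrings into the list of entries. The second match is anchored with `\A`, so it either succeeds at the first character or fails right away.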
In reply to Re^3: Efficient log parsing?
by zrajm
in thread Efficient log parsing?
by zrajm