in reply to fast greedy regex
If you just wanted the date and time, you could do ($date, $time) = split / /.
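As a minimal sketch of that suggestion, assuming the line is held in a variable called $line (a name chosen here for illustration), the call could look like this. When split assigns to a list of scalars, Perl already stops splitting after one field more than the number of variables, so the explicit limit below only makes that behaviour visible.

```perl
use strict;
use warnings;

# Start of the sample log line quoted later in this thread (shortened).
my $line = '2004-03-01 22:00:12 2 15.32.17.34 200 TCP_HIT 3140 326 GET http www.wahm.com';

# A limit of 3 means: split off two fields and leave the rest of the
# line unsplit in a third element, which is simply discarded here.
my ($date, $time) = split / /, $line, 3;

print "$date $time\n";
```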
Re^2: fast greedy regex
by js1 (Monk) on Jun 07, 2004 at 22:19 UTC
Just to give an idea of what I'm doing, this is a line from the log:
2004-03-01 22:00:12 2 15.32.17.34 200 TCP_HIT 3140 326 GET http www.wahm.com http://www.wahm.com/images/vote.gif u779479 DEFAULT_PARENT 61.2.249.106 - "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95)" OBSERVED none - 61.2.249.47 SG-HTTP-Service
And my regex is like this:
Can you see whether a more explicit regex would speed the parse up?
Thanks,
js1.
by sfink (Deacon) on Jun 08, 2004 at 04:41 UTC
The problem is that because you are using * everywhere, there are an exponential number of ways for that match to fail. Perhaps Perl is clever enough to avoid it, but it seems to me that if you hit a single malformed line, that expression could hang. I'd recommend avoiding the issue by using + pretty much everywhere you have a *, and anchoring the ends with ^ and $. Also, as someone else mentioned, it would be better to get rid of all those printf's and replace them with:
It also appears that you'd be better off doing a little extra work so that you can use split instead of a regex:
Alternatively, you could try (but remember to cut the parens off the relevant item.)
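The snippets from the post above are not preserved here, so the following is only an illustrative sketch of the two ideas, not sfink's code. It assumes the layout of the sample line quoted earlier: 16 space-free fields, one double-quoted user-agent field, then 5 more space-free fields. The variable names are made up for the example. Because \S+ and the literal space can never match the same character, an anchored pattern like this has only one way to match and fails quickly on a malformed line.

```perl
use strict;
use warnings;

# Sample line from the thread.
my $line = '2004-03-01 22:00:12 2 15.32.17.34 200 TCP_HIT 3140 326 GET http www.wahm.com http://www.wahm.com/images/vote.gif u779479 DEFAULT_PARENT 61.2.249.106 - "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95)" OBSERVED none - 61.2.249.47 SG-HTTP-Service';

# Option 1: an explicit, anchored regex that uses + instead of *.
# Assumed layout: 16 plain fields, one quoted field, 5 plain fields.
my $field   = '(\S+)';
my $pattern = join ' ', ($field) x 16, '"([^"]*)"', ($field) x 5;
my $re      = qr/^$pattern$/;
my @by_regex = $line =~ $re;

# Option 2: a little extra work so that split can do the parsing.
# Cut the line at the double quotes first, then split each side on
# whitespace (the ' ' form also skips leading whitespace).
my ($head, $agent, $tail) = split /"/, $line;
my @by_split = (split(' ', $head), $agent, split(' ', $tail));

# A line that does not fit the layout simply fails to match.
my $bad_matches = 'not a log line' =~ $re ? 1 : 0;

print scalar(@by_regex), " fields by regex, ",
      scalar(@by_split), " fields by split\n";
```

Which of the two is faster depends on the data and the Perl version, so it is worth timing both on a real log before settling on one.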
by js1 (Monk) on Jun 08, 2004 at 20:49 UTC
Many thanks for all the interest and help here. All the replies were useful.
I really liked these constructs:
and
But I found the quickest solution was this:
This processed the following gzip'd log:
in 1 minute 32 sec
on a 2.6GHz AMD processor (500MB).
by Roy Johnson (Monsignor) on Jun 07, 2004 at 22:51 UTC
For speed, I would expect that one call to print, rather than multiple calls to printf, would be something of a speedup.
The PerlMonk tr/// Advocate
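A small sketch of that idea, with made-up field values standing in for whatever the parse produced: build the whole output record with join and write it with a single print. Whether this is measurably faster was not tested here; the core Benchmark module could be used to check on real data.

```perl
use strict;
use warnings;

# Hypothetical values, standing in for fields parsed from a log line.
my ($date, $time, $ip, $status) =
    ('2004-03-01', '22:00:12', '15.32.17.34', '200');

# Instead of one printf call per field, assemble the record once...
my $record = join(' ', $date, $time, $ip, $status) . "\n";

# ...and hand it to print in a single call.
print $record;
```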
by CountZero (Bishop) on Jun 08, 2004 at 06:18 UTC
So don't re-invent the (broken) wheel and use a module such as Regexp::Log or Logfile or their derived classes.
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law