in reply to Re^3: fast greedy regex
in thread fast greedy regex
Many thanks for all the interest and help here. All the replies were useful.
I really liked these constructs:
s/\".*\"// my @F = split(/\s+/);
and
my @F = /(\" .*? \" | \S+)/gx
But I found the quickest solution was this:
while (<>){ $front = substr( $_, 0, index($_, '"' )-1, ""); $back = substr( $_, rindex( $_, '"' )+2); $user_agent = substr ($_, 1, rindex( $_, '"' )); $front=~/^([^#\s]+\s\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+) +\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)$/; print <<END_FORMAT; date time = $1 time taken = $2 c-ip = $3 sc-status = $4 ns-action = $5 sc-bytes = $6 cs-bytes = $7 cs-method = $8 cs-uri-scheme = $9 cs-host = $10 cs-uri-stem = $11 cs-username = $12 s-hierarchy = $13 s-supplier-name = $14 cs(Content-Type) = $15 cs(User-Agent) = $user_agent END_FORMAT $back=~/^(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s*$/; print <<END_FORMAT2 sc-filter-result = $1 sc-filter-category = $2 x-virus-id = $3 s-ip = $4 s-sitename = $5 END_FORMAT2 }
This processed the following gzip'd log:
bash-2.05b$ ls -l SG* -rwxr-xr-x 1 js js 106236830 Mar 7 17:02 SG_CSGL02_mai +n_470302220000.log.gz
in 1 minute 32 sec
bash-2.05b$ time gzip -dc SG* | ./test.pl >/dev/null 6.96user 0.63system 1:32.95elapsed 8%CPU (0avgtext+0avgdata 0maxreside +nt)k 0inputs+0outputs (94major+33minor)pagefaults 0swaps
on a 2.6Ghz AMD processor (500MB).
|
|---|