I implemented a quick debug option that prints non-matching lines to STDERR. In testing I found a pattern bug in the byte count field: 304 log entries record it as "-" rather than a number, so (\d+) fails to match. The following diff adds the debug option and fixes the patterns:
26c26
< GetOptions (\%optctl, "type|t=s", "pattern|p=s");
---
> GetOptions (\%optctl, "type|t=s", "pattern|p=s", "debug|d=i");
30,32c30,32
< 'common' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+)}, [qw(h l u t r c b)] ],
< 'virtual' => [ qr{(\S+) (\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+)}, [qw(v h l u t r c b)] ],
< 'combined' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+) \"([^\"]*)\" \"([^\"]*)\"}, [qw(h l u t r c b R A)] ],
---
> 'common' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+)}, [qw(h l u t r c b)] ],
> 'virtual' => [ qr{(\S+) (\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+)}, [qw(v h l u t r c b)] ],
> 'combined' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+) \"([^\"]*)\" \"([^\"]*)\"}, [qw(h l u t r c b R A)] ],
35,36c35,36
< 'extended' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+) \"([^\"]*)\" \"([^\"]*)\" (\d+) (\d+)}, [qw(h l u t r c b R A P T)] ],
< 'custom' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+) \"([^\"]*)\" \"([^\"]*)\" (\d+)}, [qw(h l u t r c b A R T)] ],
---
> 'extended' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+) \"([^\"]*)\" \"([^\"]*)\" (\d+) (\d+)}, [qw(h l u t r c b R A P T)] ],
> 'custom' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+) \"([^\"]*)\" \"([^\"]*)\" (\d+)}, [qw(h l u t r c b A R T)] ],
102a103,104
> } elsif ($optctl{debug} == 1) {
> print STDERR $_;
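
To make the byte count fix concrete, here is a minimal sketch that runs the old and new 'common' patterns against two sample lines. The patterns are copied from the diff; the log lines are made up for illustration and are not taken from my access log.

```perl
use strict;
use warnings;

# 'common' pattern before and after the patch, copied from the diff.
my $old = qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+)};
my $new = qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+)};

# Made-up sample lines: a 200 with a byte count and a 304 with "-".
my @lines = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326',
    '127.0.0.1 - frank [10/Oct/2000:13:55:40 -0700] "GET /a.gif HTTP/1.0" 304 -',
);

# Record whether each pattern matches each line.
my @results;
for my $line (@lines) {
    my $old_ok = $line =~ $old ? 1 : 0;
    my $new_ok = $line =~ $new ? 1 : 0;
    push @results, [$old_ok, $new_ok];
    print "old=$old_ok new=$new_ok  $line\n";
}
```

Note that [\d\-]+ is a little looser than strictly needed, since it would also accept something like "1-2". If that matters, (\d+|-) is one possible tighter alternative, though it is not what the patch uses.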

With the new patterns, a quick run against 79154 lines from an access log in 'extended' format left 8 lines unmatched. All 8 failed because of double quotes embedded in the request or user agent string, which the \"([^\"]*)\" fields cannot match.

Here's a user agent that didn't match...
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; <HTML><A%20HREF="http://www.pghconnect.com/">www.pghconnect.com</a></HTML>)"
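
A small sketch of why this one fails: the quoted-field pattern from the patch stops at the first embedded double quote. For comparison it also tries a looser, greedy variant. That variant is only an assumption about one possible workaround and is not part of the patch; the two trailing numbers are made-up stand-ins for the last two fields of the 'extended' format.

```perl
use strict;
use warnings;

# The user agent from above, followed by two made-up numeric fields.
my $agent = '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; '
          . '<HTML><A%20HREF="http://www.pghconnect.com/">www.pghconnect.com</a></HTML>)"';
my $tail  = "$agent 1234 5";

# Field pattern as used in the patch: no double quotes allowed inside.
my $strict = qr{^\"([^\"]*)\" (\d+) (\d+)\z};

# Hypothetical looser variant: greedy .* backtracks to the last quote that
# is still followed by the two numeric fields. Not part of the patch.
my $loose = qr{^\"(.*)\" (\d+) (\d+)\z};

my $strict_ok = $tail =~ $strict ? 1 : 0;
my ($ua)      = $tail =~ $loose;

print "strict matched: $strict_ok\n";
print "loose captured: ", (defined $ua ? $ua : '(no match)'), "\n";
```

The greedy version is only safe here because the field is anchored by what follows it. With two adjacent quoted fields (referer and agent) the split between them becomes ambiguous, so treat it as a starting point rather than a fix.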

In reply to Re: Re: Re: Multi-Format Log Parser - Version 2.0 by cjensen
in thread Multi-Format Log Parser - Version 2.0 by cjensen
