At my previous job, I did a *lot* of log processing. As much as I love Perl, for quick and dirty ad hoc log mangling, awk was frequently my go-to tool. For cases exactly like the one you describe, I used the following:

awk '{count[$4]++} END {for (x in count) print count[x], x}' logfile.log | sort -nr

By using an associative array (hash) to track the unique values, you only have to sort one line per distinct value instead of every line in the log, which can reduce the amount of data you sort by orders of magnitude.
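To make the reduction concrete, here is a minimal sketch on made-up data. The six sample lines and the function names `sample_log` and `count_field4` are assumptions for illustration only; the awk program is the same counting idea as above. Six input lines collapse to three lines before `sort` ever sees them.

```shell
# Hypothetical sample data: six log lines whose 4th field is a client address.
sample_log() {
  printf '%s\n' \
    '2024-01-01 00:00:01 GET 10.0.0.1 /index.html' \
    '2024-01-01 00:00:02 GET 10.0.0.2 /index.html' \
    '2024-01-01 00:00:03 GET 10.0.0.1 /about.html' \
    '2024-01-01 00:00:04 POST 10.0.0.3 /login' \
    '2024-01-01 00:00:05 GET 10.0.0.1 /index.html' \
    '2024-01-01 00:00:06 GET 10.0.0.2 /faq.html'
}

# Count occurrences of field 4; only one line per distinct value reaches sort.
count_field4() {
  awk '{count[$4]++} END {for (x in count) print count[x], x}' | sort -nr
}

sample_log | count_field4
# prints:
# 3 10.0.0.1
# 2 10.0.0.2
# 1 10.0.0.3
```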

Note: This is not a "max performance" solution. It is a "usually fast enough" solution. If you want maximum performance, there are lots of additional things you can do to make this faster. One of the easiest things (that often pays quick dividends on modern multi-core or multi-CPU systems) is to compress your log files. This trades some CPU time for less disk IO, and on many systems reading and decompressing the compressed file will be faster than reading the whole uncompressed file from disk.

zcat logfile.log.gz | awk '{count[$4]++} END {for (x in count) print count[x], x}' | sort -nr
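Whether the compressed version actually wins depends on your disks and CPUs, so it is worth checking on your own hardware. The following is only a sketch: the generated sample file, the variable names, and the `count_field4` helper are all made up for illustration. It uses `gzip -dc` rather than `zcat` because on some systems `zcat` expects a `.Z` suffix. It confirms that both pipelines give identical counts; prefix either pipeline with `time` (in bash) to compare them.

```shell
# Build a throwaway sample log (hypothetical data) and a gzip-compressed copy.
tmpdir=$(mktemp -d)
awk 'BEGIN {for (i = 0; i < 100000; i++) print "d", "t", "GET", "10.0.0." (i % 7)}' > "$tmpdir/sample.log"
gzip -c "$tmpdir/sample.log" > "$tmpdir/sample.log.gz"

# Same counting step as in the post, wrapped in a function for reuse.
count_field4() {
  awk '{count[$4]++} END {for (x in count) print count[x], x}' | sort -nr
}

# Run the counting step on the plain file and on the decompressed stream.
plain=$(count_field4 < "$tmpdir/sample.log")
packed=$(gzip -dc "$tmpdir/sample.log.gz" | count_field4)

# The two results should be identical; only the IO/CPU trade-off differs.
[ "$plain" = "$packed" ] && echo "outputs match"

# Clean up the temporary files.
rm -rf "$tmpdir"
```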

Another possible speedup would be to do a Perl equivalent of the awk, but to stop your line split at the number of fields you care about (plus 1 for "the rest"). This will frequently be faster than the awk example, but is slightly less suited to typing in by hand every time you're hitting a log file for ad hoc log queries. Although, looking at them side-by-side, it's really not much more difficult; I think it's just the hundreds of times I typed the awk version that makes it pop quickly from my fingers.

zcat logfile.log.gz | perl -ne '@line = split " ", $_, 5; $count{$line[3]}++; END {print "$count{$_} $_\n" for keys %count}' | sort -nr

In reply to Re: Working with a very large log file (parsing data out) by topher
in thread Working with a very large log file (parsing data out) by calebcall
