in reply to Working with a very large log file (parsing data out)
If the file wasn't so large, I could just do something like:

    cat logfile.log | awk {'print $4'} | sort | uniq -c

However, reading a 1.5TB file into memory just isn't going to work :)
That command chain ought to work as is -- even with a very large file -- because each process in the chain (except sort) handles the data line by line. And although sort does need to see the entire file, it knows how to spill intermediate results to temporary files on disk, avoiding memory exhaustion.
I'm not saying it will be fast. But it should work.
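If you go the pipeline route, GNU sort's spill behaviour can also be tuned. A minimal sketch, assuming GNU coreutils; the buffer size, thread count, and the /bigdisk/tmp spill directory are placeholders to adjust for your box:

    # Same counting pipeline, but telling sort how much RAM it may use (-S),
    # where to write its temporary spill files (-T), and how many threads to
    # use (--parallel). LC_ALL=C forces plain byte comparison, which is much
    # faster than locale-aware collation.
    awk '{print $4}' logfile.log \
        | LC_ALL=C sort -S 4G -T /bigdisk/tmp --parallel=4 \
        | uniq -c > resultsFile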
However, something like this should also do the trick and be substantially faster (~1.25 hours vs. 60 hours):
    perl -anle'++$h{ $F[ 4 ] } }{ print qq[$h{ $_ } $_] for sort keys %h' theLogFile > resultsFile
Update: You will need $F[3] rather than $F[4]: awk's field numbers are one-based ($1 is the first field), while Perl's @F array is zero-based, so awk's $4 corresponds to $F[3].
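For anyone who would rather see the one-liner spelled out, this is roughly the script it expands to (a sketch only; it uses the $F[3] index from the update above and assumes whitespace-separated fields, just as the awk version does):

    #!/usr/bin/perl
    # Expanded equivalent of the one-liner: read the log line by line,
    # count occurrences of the chosen field in a hash, and print the
    # counts once at the end. Only the hash of distinct field values
    # lives in memory, never the whole 1.5TB file.
    use strict;
    use warnings;

    my %h;
    while( <> ) {
        chomp;
        my @F = split ' ';    # what -a does: autosplit on whitespace
        ++$h{ $F[3] };        # awk's $4 is Perl's $F[3]
    }
    print "$h{ $_ } $_\n" for sort keys %h;

Save it as (say) countfield.pl and run it as perl countfield.pl theLogFile > resultsFile.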
Replies are listed 'Best First'.
Re^2: Working with a very large log file (parsing data out)
by tmharish (Friar) on Feb 20, 2013 at 08:18 UTC
    by BrowserUk (Patriarch) on Feb 20, 2013 at 08:49 UTC