in reply to Working with a very large log file (parsing data out)
If the file wasn't so large, I could just do something like:

    cat logfile.log | awk {'print $4'} | sort | uniq -c

However, reading a 1.5TB file into memory just isn't going to work :)
That command chain ought to work as is -- even with a very large file -- because each process in the chain (except sort) handles the data line by line. And although sort does need to see the entire file, it knows how to spill intermediate results to temporary files on disk, avoiding memory exhaustion.
I'm not saying it will be fast. But it should work.
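If you go the pipeline route, GNU sort's spill behaviour can also be tuned. A minimal sketch, assuming GNU coreutils; the buffer size, thread count, and the /bigdisk/tmp spill directory are placeholders to adjust for your box:

    # Same counting pipeline, but telling sort how much RAM it may use (-S),
    # where to write its temporary spill files (-T), and how many threads to
    # use (--parallel). LC_ALL=C forces plain byte comparison, which is much
    # faster than locale-aware collation.
    awk '{print $4}' logfile.log \
        | LC_ALL=C sort -S 4G -T /bigdisk/tmp --parallel=4 \
        | uniq -c > resultsFile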
However, something like this should also do the trick and be substantially faster (~1.25 hours vs. 60 hours):
    perl -anle'++$h{ $F[ 4 ] } }{ print qq[$h{ $_ } $_] for sort keys %h' theLogFile > resultsFile
Update: You will need $F[3] rather than $F[4]: awk's field numbers are one-based ($1 is the first field), while Perl's @F array is zero-based, so awk's $4 corresponds to $F[3].
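For anyone who would rather see the one-liner spelled out, this is roughly the script it expands to (a sketch only; it uses the $F[3] index from the update above and assumes whitespace-separated fields, just as the awk version does):

    #!/usr/bin/perl
    # Expanded equivalent of the one-liner: read the log line by line,
    # count occurrences of the chosen field in a hash, and print the
    # counts once at the end. Only the hash of distinct field values
    # lives in memory, never the whole 1.5TB file.
    use strict;
    use warnings;

    my %h;
    while( <> ) {
        chomp;
        my @F = split ' ';    # what -a does: autosplit on whitespace
        ++$h{ $F[3] };        # awk's $4 is Perl's $F[3]
    }
    print "$h{ $_ } $_\n" for sort keys %h;

Save it as (say) countfield.pl and run it as perl countfield.pl theLogFile > resultsFile.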
Replies are listed 'Best First'.
Re^2: Working with a very large log file (parsing data out)
by tmharish (Friar) on Feb 20, 2013 at 08:18 UTC
    by BrowserUk (Patriarch) on Feb 20, 2013 at 08:49 UTC