gchitte has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to parse through a huge log file and produce statistics for analysis (e.g. number of errors, frequency, etc.). The challenge is to grep or regex-match the errors and return the results to the web interface quickly. Can you please suggest a solution?

Replies are listed 'Best First'.
Re: Dealing with huge log files 100 MB to 1 GB
by moritz (Cardinal) on May 17, 2010 at 08:14 UTC
    Pre-compute the values, and then access the computed values from the web interface.

    Reading 1 GB of log files from disk is already slow, regardless of how fast Perl processes the data; the job is likely I/O bound. So the solution must be to avoid reading the whole log file in response to an action from the web interface.

    Perl 6 - links to (nearly) everything that is Perl 6.

      To further the above: think database. The work process you want is 'Read/compute/store' then 'query/display'. (This assumes you don't want exactly the same statistics every day; if you do, just generate a static web page with them when the log files roll over.)

      Databases work with massive amounts of data quite well: I deal with multiple logs that are about 1GB compressed every day, and once they are parsed and in the database, data return is nearly instant.

      And, luckily enough, Perl has good database support for just about any database you'd ever want to use (and a few you wouldn't).
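
      A minimal sketch of the read/compute/store half, assuming DBD::SQLite and a log format where error lines carry a leading "YYYY-MM-DD HH:MM:SS" timestamp followed by the word ERROR (the file name, table name and regex are placeholders to adapt to your format):

          #!/usr/bin/perl
          use strict;
          use warnings;
          use DBI;

          # Connect to a local SQLite file; any DBI-supported database works.
          my $dbh = DBI->connect("dbi:SQLite:dbname=logstats.db", "", "",
                                 { RaiseError => 1, AutoCommit => 0 });

          $dbh->do('CREATE TABLE IF NOT EXISTS error_counts ('
                 . ' hour TEXT PRIMARY KEY, count INTEGER NOT NULL)');

          # Single pass over the log, bucketing errors per hour.
          my %count;
          open my $log, '<', 'app.log' or die "Cannot open app.log: $!";
          while (my $line = <$log>) {
              $count{$1}++
                  if $line =~ /^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}\s+ERROR\b/;
          }
          close $log;

          # Store the aggregates; the web interface then only runs cheap
          # queries like: SELECT hour, count FROM error_counts ORDER BY hour
          my $sth = $dbh->prepare(
              'INSERT OR REPLACE INTO error_counts (hour, count) VALUES (?, ?)');
          $sth->execute($_, $count{$_}) for keys %count;
          $dbh->commit;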

Re: Dealing with huge log files 100 MB to 1 GB
by ig (Vicar) on May 17, 2010 at 09:22 UTC

    Maybe you can pre-process the log file then quickly display the results when queried. I have processed website access logs that were about 1GB daily. The reports were detailed and could be accessed in a few seconds, despite it taking several hours to produce them.

    Whether or not your analysis is I/O bound, you can prototype it in Perl, then use profiling to identify the critical bits and improve them.
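
    A minimal sketch of such a prototype, again assuming error lines start with a "YYYY-MM-DD HH:MM:SS" timestamp and the word ERROR (script and file names are placeholders). Keeping the per-line work in a subroutine makes the profile easier to read, and Devel::NYTProf does the measuring:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Profile this prototype with Devel::NYTProf:
        #   perl -d:NYTProf prototype.pl app.log
        #   nytprofhtml
        # then open the generated HTML report to see whether the time goes
        # into I/O, the regex, or the bookkeeping.

        my %errors_per_hour;

        # Tally one line; assumed format: "2010-05-17 08:14:02 ERROR message"
        sub tally {
            my ($line) = @_;
            $errors_per_hour{$1}++
                if $line =~ /^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}\s+ERROR\b/;
        }

        my $file = shift @ARGV or die "usage: $0 logfile\n";
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            tally($line);
        }
        close $fh;

        printf "%s:00  %d errors\n", $_, $errors_per_hour{$_}
            for sort keys %errors_per_hour;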

Re: Dealing with huge log files 100 MB to 1 GB
by weismat (Friar) on May 17, 2010 at 09:06 UTC
    Try to read through the log file only once.
    If it is a single log file of manageable size, consider storing it on a RAM disk to speed up the I/O.
Re: Dealing with huge log files 100 MB to 1 GB
by RMGir (Prior) on May 17, 2010 at 11:59 UTC
    This isn't useful for statistics, but if you need to quickly pull up sections of the large files to diagnose things like "what happened around time x?", File::SortedSeek is incredibly useful.
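
    For example, with log lines that begin with an ascending epoch timestamp, a lookup around a given time could look roughly like this (the file name, target time and window size are placeholders, and the numeric() call follows my reading of the File::SortedSeek synopsis, so check its POD before relying on it):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use File::SortedSeek;
        use Time::Local;

        # Assumed line format: "1274083200 ERROR something broke"
        # (ascending epoch timestamps, one event per line).
        my $target = timelocal(0, 0, 8, 17, 4, 2010);   # around 2010-05-17 08:00 local

        open my $fh, '<', 'app.log' or die "Cannot open app.log: $!";

        # Binary search for the first line whose leading number is >= $target,
        # then read a small window of lines from there.
        my $pos = File::SortedSeek::numeric($fh, $target);
        die "time not found in log\n" unless defined $pos;
        seek $fh, $pos, 0;

        my $shown = 0;
        while (my $line = <$fh>) {
            print $line;
            last if ++$shown >= 20;
        }
        close $fh;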

    For your current problem, as everyone else said, "precompute" is probably the best answer.

    You could probably do something complicated to map/reduce the statistics gathering in parallel across sections of the file, but it's probably not worth it.


    Mike