Paraxial has asked for the wisdom of the Perl Monks concerning the following question:

Good morning, Monks!

I'm working with a rather large and ever-changing text log file (think Apache access log). What I wish to do is generate statistics from it, such as requests per hour, HTTPS connections per hour, failed requests per hour, etc.

In order to speed up this process, rather than going through the file linearly, I'd like to do a binary search until I reach the entries from one hour ago, and then parse each line from there to the end of the file.

I've been scratching my head over this one; can anyone help?

Thanks in advance.

Re: Binary Search Timestamps
by CountZero (Bishop) on Jul 16, 2014 at 10:05 UTC
    As you want to generate various statistics on your log files, this means you will have to read these log files again and again and again ...

    Much better, then, to put these log files in a database and run standard SQL queries against it.

    You will only have to enter each line of the log file once into the database, nicely split into its various fields. It won't be terribly difficult to automate that. Then you can have all these statistics updated just by running your SQL queries, which will be much faster than re-reading the log file again and again.
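
    For what it's worth, a minimal sketch of that approach, assuming DBI with DBD::SQLite and hypothetical file and column names:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        # Hypothetical paths and schema; adjust to taste.
        my $db = DBI->connect('dbi:SQLite:dbname=access.db', '', '',
                              { RaiseError => 1 });
        $db->do('CREATE TABLE IF NOT EXISTS requests
                 (ts TEXT, host TEXT, method TEXT, path TEXT, status INTEGER)');
        my $ins = $db->prepare('INSERT INTO requests VALUES (?, ?, ?, ?, ?)');

        open my $log, '<', 'access.log' or die "access.log: $!";
        $db->begin_work;    # one transaction makes the bulk insert fast
        while (my $line = <$log>) {
            # Common log format: host ident user [timestamp] "request" status size
            next unless $line =~ m{^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})};
            $ins->execute($2, $1, $3, $4, $5);    # ts, host, method, path, status
        }
        $db->commit;

    After that, every statistic is a single SELECT away.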

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      In an ideal world this is the method I'd prefer too, but unfortunately we don't have the resources to do it at the moment.

      Thanks for your input however, much appreciated.

        This is false economy. Depending on your skill level, putting it all into SQLite is semi-trivial (Pg or MySQL only slightly harder) and would obviate the need for repetitive, selective, time-consuming, and error-prone reparsing.

        Sometimes taking a time hit up front for extra code, structure, or learning will pay off tenfold down the road.

        I agree with Your Mother that it will take only a small effort (in time, people, computer resources and cost) to put this data in a database. Parsing your log files into an SQLite database will not need more than 50 lines of Perl, and SQLite needs next to no maintenance once it has been set up. A MySQL database, more robust and more scalable, needs more effort to set up, but much of that is a one-time cost which you may already have largely paid if you have a MySQL server running somewhere in the company.
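
        To make that concrete, here is what the hourly statistics could look like as queries, assuming the hypothetical requests table sketched earlier in the thread:

            use strict;
            use warnings;
            use DBI;

            # Hypothetical database and table from the sketch further up.
            my $db = DBI->connect('dbi:SQLite:dbname=access.db', '', '',
                                  { RaiseError => 1 });

            # The first 14 characters of an Apache timestamp ("16/Jul/2014:10")
            # identify the hour, so hourly statistics are simple GROUP BYs.
            my $per_hour = $db->selectall_arrayref(
                'SELECT substr(ts,1,14) AS hour, count(*) FROM requests GROUP BY hour'
            );
            my $failed = $db->selectall_arrayref(
                'SELECT substr(ts,1,14) AS hour, count(*) FROM requests
                   WHERE status >= 400 GROUP BY hour'
            );
            printf "%s  %d\n", @$_ for @$per_hour;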

        And if you are running an Apache web server, there is always mod_log_mysql (although its installation is not for the faint-hearted, it seems), which logs directly into a MySQL database.

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
Re: Binary Search Timestamps
by Athanasius (Archbishop) on Jul 16, 2014 at 09:54 UTC

    Hello Paraxial, and welcome to the Monastery!

    Since you will need the data from all the logs up to an hour old, you don’t really need a binary search. Just read the file backwards until the latest record read is more than an hour old. I haven’t used it, but the module File::ReadBackwards is designed for just this task:

    This module reads a file backwards line by line. It is simple to use, memory efficient and fast. ...
    It is intended for processing log and other similar text files which typically have their newest entries appended to them.
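
    Something like the following might do it (untested; the file name and Apache-style "[16/Jul/2014:10:05:00 +0000]" timestamp format are assumptions, and the timezone offset is ignored for brevity):

        use strict;
        use warnings;
        use File::ReadBackwards;
        use Time::Piece;

        my $cutoff = time() - 3600;    # one hour ago

        my $bw = File::ReadBackwards->new('access.log')
            or die "access.log: $!";

        my @last_hour;
        while (defined(my $line = $bw->readline)) {
            my ($ts) = $line =~ /\[([^\s\]]+)/ or next;
            my $epoch = Time::Piece->strptime($ts, '%d/%b/%Y:%H:%M:%S')->epoch;
            last if $epoch < $cutoff;       # older than an hour: done
            unshift @last_hour, $line;      # keep chronological order
        }
        # @last_hour now holds the last hour's entries, oldest first.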

    Hope that helps,

    Athanasius <°(((>< contra mundum

      I did in fact look at this method yesterday, and it does seem a good way to do this vs. binary searching, especially when working with things like log files.

      With that said, the File::SortedSeek module seems a better fit for what I'm doing, but thanks for your input. As a complete newbie to Perl, it's nice to see I didn't go too far off the mark when looking for a solution to this.

Re: Binary Search Timestamps
by AppleFritter (Vicar) on Jul 16, 2014 at 09:46 UTC

    Howdy Paraxial, welcome to the Monastery!

    Binary Searches on Sorted Text Files has a useful snippet of code for binary-searching text files. As it stands it's written for plain sorted files, but all you'd really need to do to adapt it to your needs is modify the conditions for the recursive calls. In fact, I'd pass a callback function in as an extra parameter instead of hardcoding anything specific.

    One of the comments on that node also points out File::SortedSeek, which looks like it may well be useful.
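
    If you go that route, a sketch of the idea (the log name and timestamp format are assumptions, and the exact munge-callback contract should be checked against the module's documentation):

        use strict;
        use warnings;
        use File::SortedSeek;
        use Time::Piece;

        # Munge callback: extract the epoch time from an Apache-style
        # log line so the lines can be compared numerically.
        sub to_epoch {
            my ($line) = @_;
            my ($ts) = $line =~ /\[([^\s\]]+)/ or return;
            return Time::Piece->strptime($ts, '%d/%b/%Y:%H:%M:%S')->epoch;
        }

        open my $fh, '<', 'access.log' or die "access.log: $!";

        # Binary-search to the first line no older than an hour.
        File::SortedSeek::numeric($fh, time() - 3600, \&to_epoch);

        while (my $line = <$fh>) {
            # ... tally requests, failures, etc. for the last hour ...
        }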

      Thanks for this! I found the link to the page on binary searching sorted text files before I posted this, but failed to see the comment mentioning File::SortedSeek!

      You're right, it does exactly what I need and I've now managed to get it working with the log file as expected, so thank you!

      I've found I'll probably run this script every 5-10 minutes in order to keep down the amount of RAM it needs, as it does get quite hungry.

Re: Binary Search Timestamps
by sundialsvc4 (Abbot) on Jul 16, 2014 at 10:58 UTC
    Arrange to have the file rotated, then process the files that have rotated off, putting the results into a database. Also bear in mind that there are many existing programs out there which already do this job quite completely, fun though it may seem to write yet another one. There must be a hundred already.
Re: Binary Search Timestamps
by Anonymous Monk on Jul 16, 2014 at 20:16 UTC

    Another option is to have a (daemon) process reading the log file (think tail -f); this could wake up, say, every 5 minutes, consume the new lines, update its counter bins, and write out the statistics page or meta log file.
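
    A sketch of that shape using CPAN's File::Tail (the file name, interval, and counter structure are illustrative only):

        use strict;
        use warnings;
        use File::Tail;

        my $tail = File::Tail->new(name => 'access.log', maxinterval => 300);

        my %hits_per_hour;
        while (defined(my $line = $tail->read)) {   # blocks until new lines arrive
            # Bucket by the hour part of an Apache-style timestamp,
            # i.e. the first 14 characters: "16/Jul/2014:10".
            my ($hour) = $line =~ /\[([^\s\]]{14})/ or next;
            $hits_per_hour{$hour}++;
            # ... periodically write %hits_per_hour out to a stats page ...
        }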