longjohnsilver has asked for the wisdom of the Perl Monks concerning the following question:

Good Morning all you Enlightened Monks,

I'm trying to read a set of patterns from a fast-growing logfile (it writes approximately 4 MB of text per minute and rotates every 10 minutes). I am using the File::Tail module to keep a constant tail on the file. Every time the code finds one of a set of patterns, the corresponding value should be stored in a table on an Oracle db. Here is my code snippet, and below it the questions I'd like to be enlightened on:
use File::Tail;

my $file_name = qw( logfile.log );
my $tail_file = File::Tail->new(
    name        => $file_name,
    resetafter  => 3,
    adjustafter => 5,
);
my $tail_line;

my %pattern = (
    "Pattern1" => "Short Pattern to put on DB 1",
    "Pattern2" => "Short Pattern to put on DB 2",
    "Pattern3" => "Short Pattern to put on DB 3",
);

while ( defined( $tail_line = $tail_file->read ) ) {
    foreach my $key ( keys %pattern ) {
        if ( $tail_line =~ /$key/ ) {
            # --->> insert "Short Pattern" into the DB
        }
    }
}
My questions are the following:

1. Is this the best way to grep for the patterns in such a fast-growing file? Is data loss possible when using this module?
2. What would be the best way to put the data into the database? I was thinking of either a single connection that stays open the whole time, or writing the data to a text file and loading it with sqlldr from a cron job every minute.
3. This program should be lightweight and robust on the system it runs on. Are there better approaches to this problem?

Thanks,

Francesco

Replies are listed 'Best First'.
Re: Reading from a fast logfile and storing on Oracle
by BrowserUk (Patriarch) on Dec 19, 2008 at 10:19 UTC

    The tail command is likely to be quicker than the perl module (much more so on my system), and grep -f is likely to be quicker than multiple invocations of the regex engine.

    And by pre-filtering the lines before you give them to Perl, you get some buffering (two levels of it), which gives the perl script more time to do the uploads of the filtered data.

    It's usually quicker to upload medium-sized batches of data rather than small ones, so collect lines into arrays until you have a small batch before uploading.

    Something along these lines might be worth testing:

    #! perl -slw
    use strict;
    use DBI;

    our $DBS ||= 3;    ## number of target DBs (settable via the -s switch, e.g. -DBS=3)

    my $dbi = DBI->connect( ... );

    ## Prepare multiple statements for different DBs
    my @sth = map { $dbi->prepare( ... ) } 1 .. $DBS;

    ## Piped open pre-filters data thru tail and grep -f
    my $pid = open TAIL, "tail -f /path/to/the/log | grep -Pf patterns.file |"
        or die $!;

    my @collected;

    ## Uploading medium sized batches of data is usually quickest
    while( <TAIL> ) {
        ## Decide which DB this line is destined for
        my $dbn = m[some selection criteria];

        ## and put it into that batch
        push @{ $collected[ $dbn ] }, $_;    ## subselect if you don't want the whole line

        ## And when that batch reaches the optimum(?) size, upload it
        if( @{ $collected[ $dbn ] } >= 100 ) {    ## 100 is a guess
            $sth[ $dbn ]->execute( @{ $collected[ $dbn ] } );
            @{ $collected[ $dbn ] } = ();
        }
    }
    close TAIL;

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I might also check to see whether it has been more than some set number of seconds since you last stored the data, to reduce the potential for data loss if your perl program dies.
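
      A minimal sketch of that idea on top of BrowserUk's batching approach, with a single batch for simplicity; the DSN, credentials, table and column names here are made up, so adjust them to your schema:

      use strict;
      use warnings;
      use DBI;

      my $dbh = DBI->connect( 'dbi:Oracle:mydb', 'user', 'pass', { RaiseError => 1 } );
      my $sth = $dbh->prepare( 'INSERT INTO log_events (message) VALUES (?)' );

      my $MAX_BATCH  = 100;    # flush when the batch reaches this size (a guess)
      my $MAX_AGE    = 30;     # ...or when this many seconds have passed since the last flush
      my $last_flush = time;
      my @batch;

      open my $tail, '-|', 'tail -f /path/to/the/log | grep -Pf patterns.file'
          or die "Cannot start tail pipeline: $!";

      while ( my $line = <$tail> ) {
          push @batch, $line;

          # Flush on size *or* on age, so a quiet spell in the log doesn't
          # leave rows sitting in memory where a crash would lose them.
          if ( @batch >= $MAX_BATCH or time() - $last_flush >= $MAX_AGE ) {
              $sth->execute_array( {}, \@batch );
              $dbh->commit unless $dbh->{AutoCommit};
              @batch      = ();
              $last_flush = time;
          }
      }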

      --MidLifeXis

Re: Reading from a fast logfile and storing on Oracle
by moritz (Cardinal) on Dec 19, 2008 at 09:25 UTC
    while ( defined($tail_line = $tail_file->read) ) {
        foreach my $key ( keys %pattern ) {
    That's rather inefficient. If the patterns are just strings (not regexes), you can try something along these lines:
    my $regex = join '|', map quotemeta, keys %pattern;
    while ( defined($tail_line = $tail_file->read) ) {
        if ($tail_line =~ m/($regex)/) {
            print "logging $pattern{$1} for $tail_line";
        }
    }

    If you can find a way to pipe the log directly into your program's STDIN, you can get rid of File::Tail, whose efficiency I can't speak to.

    I'd first try opening a connection with DBI, preparing an insert statement, and executing it each time a match is found. If that turns out to be too slow, you can still look for more elaborate approaches.
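
    For example, a rough sketch that reads the log from STDIN and uses the combined regex from above (the DSN, credentials and table name are just placeholders, substitute your own):

    use strict;
    use warnings;
    use DBI;

    my %pattern = (
        "Pattern1" => "Short Pattern to put on DB 1",
        "Pattern2" => "Short Pattern to put on DB 2",
        "Pattern3" => "Short Pattern to put on DB 3",
    );
    my $regex = join '|', map quotemeta, keys %pattern;

    my $dbh = DBI->connect( 'dbi:Oracle:mydb', 'user', 'pass',
                            { RaiseError => 1, AutoCommit => 1 } );
    my $sth = $dbh->prepare(
        'INSERT INTO log_matches (short_pattern, log_line) VALUES (?, ?)' );

    # Feed the script via something like: tail -f logfile.log | perl script.pl
    while ( my $line = <STDIN> ) {
        if ( $line =~ m/($regex)/ ) {
            $sth->execute( $pattern{$1}, $line );
        }
    }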

      Thanks Moritz,

      Your answer has already helped me gain new insights. I'd also like to tell everyone that I've started studying the great book "Perl Best Practices" (O'Reilly) and have gotten a lot of good ideas out of it.

      F
Re: Reading from a fast logfile and storing on Oracle
by roboticus (Chancellor) on Dec 19, 2008 at 14:05 UTC
    longjohnsilver:

    I'd definitely go with the approach of putting the data into a text file and using sqlldr to bulk-import the data into the database. In fact, I'd split the task into two simple programs.

    One program would just create the file working.txt and write the formatted records to it. Every N lines or M seconds, it would close that file and rename it to ready.<YYYYMMDDhhmmss> (a date/time stamp as the extension).

    Your other program would sit in a loop looking for a file matching ready.*. If it finds one, it would bulk-load it and then delete it. If it doesn't find one, it would sleep for Y seconds before starting the loop again.
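
    A minimal sketch of that second (loader) program, assuming sqlldr is on the PATH and that a control file named loader.ctl already describes the target table (file names, credentials and the sleep interval are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $SLEEP = 10;    # seconds to wait when no ready.* file is found

    while (1) {
        my @ready = glob 'ready.*';
        if (@ready) {
            for my $file (@ready) {
                # sqlldr exits with 0 on success (nonzero indicates errors or warnings)
                my $rc = system 'sqlldr', 'userid=user/pass@mydb',
                                'control=loader.ctl', "data=$file";
                if ( $rc == 0 ) {
                    unlink $file or warn "Could not delete $file: $!";
                }
                else {
                    warn "sqlldr failed on $file (status $rc), keeping it for inspection\n";
                }
            }
        }
        else {
            sleep $SLEEP;
        }
    }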

    ...roboticus
      Hi all Monks,

      I'm back at work on this. Thanks for your advice and tips; at the moment I'm trying to use a hybrid of the code snippets from moritz and BrowserUk. Unfortunately, I couldn't find sqlldr on that external machine, so I have to wait until it is installed before I can run my tests.

      Have yourself a happy new year!

      Francesco