alanonymous has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I seek guidance on what could potentially be a very easy problem to solve. I have a program that writes an output log to a standard text file on a Win32 machine. I'm looking for a way to parse the newest lines added to that log in near-real-time (new lines are added anywhere from every 100 ms to every 20 s). Each log line contains a unique key, so detecting a new and different line should be simple. So I need some kind of continuous loop that waits either for changes to the log file or, specifically, for new lines to be added. My script would then perform an action based on whether or not a couple of pieces of information are found in the new line of the log file.

Simply reading a text file is simple, looking for new lines is simple, and parsing the information out is relatively simple, but the log files tend to grow rather large (>1 MB), and re-reading the entire log file every X ms seems like a huge waste of resources. For some reason the brute-force method of reading the entire log over and over comes to mind first, and it feels wrong :/

Is there a simpler, more elegant, more Perl-ish method of doing this? I am relatively new to Perl but not new to programming.

Thank you for any advice.

-Alan

Re: Parse Log File As Written
by dsheroh (Monsignor) on Sep 24, 2007 at 05:13 UTC
    Your feeling is correct - re-parsing the file every X ms is indeed wrong.

    Although I've never tried it, I assume the File::Tail module should work on Win32. If not, you can use tell to get the position in the file when you finish one read, then seek to that position and continue from there on the next - just be sure that the position you record is one byte past the last newline in the file, since you could hit EOF in mid-line if the file is being written to as you read it.
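    The tell/seek bookkeeping described above can be sketched in a self-contained way. The file name and sample lines below are just illustrations; in a real monitor the second read would sit inside a sleep loop rather than running once:

```perl
use strict;
use warnings;

my $log = 'demo.log';

# Simulate the external program writing its first lines
open my $w, '>', $log or die "write: $!";
print {$w} "line 1\nline 2\n";
close $w;

# First pass: read everything and remember where we stopped
open my $fh, '<', $log or die "read: $!";
my @seen;
while ( my $line = <$fh> ) {
    push @seen, $line;
}
my $pos = tell $fh;    # one byte past the last newline

# The external program appends another line in the meantime
open $w, '>>', $log or die "append: $!";
print {$w} "line 3\n";
close $w;

# Seek back to the remembered position; this also clears the
# filehandle's EOF status, so reading resumes from there
seek $fh, $pos, 0;
while ( my $line = <$fh> ) {
    push @seen, $line;
}
close $fh;

print @seen;    # each line exactly once, in order
```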

Re: Parse Log File As Written
by atemon (Chaplain) on Sep 24, 2007 at 05:15 UTC

    One possible solution is to use Tie::File.

    "Tie::File represents a regular text file as a Perl array. Each element in the array corresponds to a record in the file. The first line of the file is element 0 of the array; the second line is element 1, and so on. The file is not loaded into memory, so this will work even for gigantic files. Changes to the array are reflected in the file immediately."

    You could write a program that runs every few seconds, or one that stays resident in memory. A sample program:

    #!/usr/bin/perl
    use Tie::File;

    my $file = "my-log_file.log";
    my $o = tie my @array, 'Tie::File', $file;
    $o->flock;    # lock the file. See Note 1.

    my $line_count = -1;    # index of the last line parsed
    while ( 1 ) {
        if ( $line_count < $#array ) {    # new line(s) added
            foreach my $line ( $line_count + 1 .. $#array ) {
                #### Your code to parse $array[$line] ####
            }
            $line_count = $#array;
        }
        sleep(1);    # parse every second
    }
    Hope this helps.

    Notes :

    1. Tie::File docs say, "All the usual warnings about file locking apply here. In particular, note that file locking in Perl is advisory, which means that holding a lock will not prevent anyone else from reading, writing, or erasing the file; it only prevents them from getting another lock at the same time. Locks are analogous to green traffic lights: If you have a green light, that does not prevent the idiot coming the other way from plowing into you sideways; it merely guarantees to you that the idiot does not also have a green light at the same time."
    2. Since Tie::File doesn't load the file into memory, it avoids a brute-force read of the entire file.
    3. This is just a possible method and the code is NOT tested.

    Updates :

    • Fixed a few typos
    • Inserted Note 2

    Cheers !

    --VC



    There are three sides to any argument.....
    your side, my side and the right side.

      "Changes to the array are reflected in the file immediately."

      Note that the Tie::File docs do not say here (or anywhere else) that changes to the file will be reflected in the array immediately. Tie::File may be capable of following additions to an ever-growing file, but that really doesn't appear to be one of the tasks it was designed to handle.

      A couple more relevant bits from the locking section of the Tie::File docs are

      When you use flock to lock the file, Tie::File assumes that the read cache is no longer trustworthy, because another process might have modified the file since the last time it was read. Therefore, a successful call to flock discards the contents of the read cache and the internal record offset table.

      and

      The best way to unlock a file is to discard the object and untie the array.

      To me, these indicate that, if the underlying file is changed, Tie::File will have to rebuild its table of record offsets. That means starting over from the beginning of the file and reading every line until it reaches the one you're looking for (i.e., the one that was just added to the end of the file), which is exactly the sort of re-scan we're trying to avoid here.

        "Changes to the array are reflected in the file immediately."

        This is something I copied from CPAN's Tie::File; it is the third line of the description.

        The CPAN doc lists other caveats of Tie::File as well.

        Cheers !

        --VC




Re: Parse Log File As Written
by CountZero (Bishop) on Sep 24, 2007 at 05:47 UTC
    Another way of tackling this problem is to have the program write its log output not to a file but to a Perl script; have that script immediately write the output to a file (so no data gets lost) and then parse the new info and do what needs to be done with it.

    It avoids having to lock the file or try and detect when new info is (completely) added.

    Of course it assumes (big "IF") that your program can write its log-output to STDOUT or STDERR where you can easily capture it. I'm not sure that on Windows you can replace a (log-)file by a script as you can do under Linux or such.
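    A rough sketch of this pipe-based approach, assuming the script can start the logging program itself and the program writes to STDOUT. The command below is a stand-in that just generates sample output, and the list form of a piped open may not be supported on older Win32 perls:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real logging program; $^X is the
# current perl binary, used here only to generate some sample lines.
my @cmd = ( $^X, '-e', 'print "line $_\n" for 1..3' );

# List-form piped open (may not work on older Win32 perls)
open my $prog, '-|', @cmd      or die "Cannot run program: $!";
open my $copy, '>', 'copy.log' or die "Cannot write copy.log: $!";

while ( my $line = <$prog> ) {
    print {$copy} $line;    # keep a full copy on disk so no data is lost
    # ... parse $line and act on it here ...
}
close $prog;
close $copy;
```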

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Parse Log File As Written
by andreas1234567 (Vicar) on Sep 24, 2007 at 06:31 UTC
    I have a program that writes an output log to a standard text file on a Win32 machine.

    Do you have the source code for the "program" in question? If yes, could it be implemented in Perl, or at least, use Perl in any way to detect a new and different line other than parsing a file?

    There are many ways to embed Perl in other languages, e.g. perlembed.

    --
    Andreas
Re: Parse Log File As Written
by alanonymous (Sexton) on Sep 24, 2007 at 12:39 UTC
    VC, that looks to be almost exactly what I was looking for. After work today I'll see what I can do as far as actually getting the code to work for my application and really start to play around with it.

    Andreas, unfortunately I cannot control the program creating the logs in any way. It would have to be some type of script just monitoring the log file itself.

    Thanks a bunch for the responses guys.

    -Alan

      Does your program need to do other work while waiting for the logfile to update? If not, I think you really want esper's suggestion of File::Tail.

        No, the script wouldn't need to do anything while it is waiting for an update to the log file. My only concern is what happens if the script performs an action based on an update to the log, but that action is still executing when the next update occurs. Will File::Tail execute multiple instances of the parsing code if updates occur too frequently? So if two updates occur at almost exactly the same time, but the parsing code and other actions take ~100 ms to execute, will that cause a problem?

        File::Tail does seem ideal IF it can handle fast updates.

        Thanks for all the advice! Much appreciated!

        -Alan
Re: Parse Log File As Written
by alanonymous (Sexton) on Sep 25, 2007 at 03:01 UTC
    I've been playing around and testing for a while now ... What I've learned:

    File::Tail does NOT work for Win32 systems.

    The Tie::File method seems like it could work but I'm having problems with the code:
    #!/perl
    use strict;
    use warnings;
    use Tie::File;

    my $logfile = "blablalogfile.txt";
    #my $logfile = "test.txt";
    my $o = tie my @array, 'Tie::File', $logfile;
    $o->flock;
    my $line_count = $#array;
    while (1) {
        if ($line_count < $#array) {
            $line_count++;
            # foreach my $line (@array) {
            foreach my $line ($line_count .. $#array) {
                print $line;
            }
            $line_count = $#array;
        }
        sleep(1);
    }
    Basically, the program runs, and goes to the continuous loop as expected. The problem is that if the $o->flock; is there, the program won't update the log file, and even if I use a test text file and manually save the file, Windows won't allow it because another program has locked it. If I uncomment the lock (I'll never want to write to the log file, so is locking it even necessary?) then the loop never detects that the file is updated, which it is. I'm not familiar with the context used in the foreach loop in the example ... Will the '$line_count ... $#array' actually count in the @array if not specifically specified? Even going through all of @array upon a newline yields no output.
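    On the foreach question: '$line_count .. $#array' is the range operator, which produces a list of integers, so $line holds an array index, not a line of text; you still need $array[$line] to get the line itself. A small illustration:

```perl
use strict;
use warnings;

my @array = ( 'line zero', 'line one', 'line two', 'line three' );
my $line_count = 1;    # index of the last line already handled

# The range operator yields the indices 2 and 3 here, not the lines
my @new;
foreach my $line ( $line_count + 1 .. $#array ) {
    push @new, "$line: $array[$line]";
}
print "$_\n" for @new;
```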

    Thanks for all the help ... I don't know why but I always seem to make things difficult, and I might be going a bit crazy!

    -Alan

    Edit: Is Tie::File capable of watching changes to the file in real time? I've noticed in the docs that the file updates as soon as the array is modified and written, but does the array update if the file is updated?

    Edit 2: It seems that $#array never updates to reflect the current number of lines in the file without flock. With flock, the file never updates, so I can't determine whether $#array would update.
      That's too bad that File::Tail doesn't work on Win32... This is exactly the sort of thing it's designed to do. I've run across a trick you can do with seek to clear the EOF status of a filehandle while keeping it open. Now if I can just remember where I saw that...

      Found it (p. 779 of Programming Perl, in case you want to play along at home), but it's not actually that robust, in that it has the problem I mentioned earlier with partially written lines.

      So give this a try:

      open my $logfh, '<', $logfile or die "Cannot open $logfile: $!";
      my $logpos = tell $logfh;
      while (1) {
          while (my $line = <$logfh>) {
              last unless substr($line, -1) eq "\n";    # may need tweaking for Win32 CRLF
              print $line;
              $logpos = tell $logfh;
          }
          sleep 1;
          seek $logfh, $logpos, 0;
      }
      The mucking about with $logpos/last is to keep track of the last line break you saw so that the next iteration of the outer while will always pick up from there instead of in the middle of a line, even if the file is being written to at the same time as you read the last line and get a partial line because of it.

      You might also want to store $logpos somewhere nonvolatile (config file, database, etc.) so that you don't have to restart from the beginning of the file when this program eventually exits/dies and has to be restarted for whatever reason.
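      Persisting the offset can be as simple as a one-line side file. The names here ($logfile.pos and so on) are placeholders for illustration; the actual tailing loop is elided:

```perl
use strict;
use warnings;

my $logfile = 'app.log';
my $posfile = "$logfile.pos";    # side file holding the saved offset

# Restore the last-known offset, defaulting to the start of the file
my $logpos = 0;
if ( open my $pf, '<', $posfile ) {
    $logpos = <$pf> // 0;
    chomp $logpos;
    close $pf;
}

# ... tail the log from $logpos with seek/tell as above,
#     updating $logpos after each complete line ...

# Save the offset so a restart can resume where we left off
open my $pf, '>', $posfile or die "Cannot write $posfile: $!";
print {$pf} "$logpos\n";
close $pf;
```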

      And, yeah, I'm not surprised by the locking issues you ran into. Windows has always enforced (IMO ridiculous) restrictions on access to locked files; it's the major reason for all the "you changed something - please reboot to continue" nonsense that people joke about. If the file is locked, Windows won't allow it to be changed. And, based on my reading of the Tie::File docs, Tie::File only updates its record of the file's size and contents when the file is tied or flocked (which is perfectly reasonable, since it shouldn't spend all its time constantly re-reading the file just in case it might have changed).

      (Note that I have not actually run the code posted above, but I recently did something extremely similar, so it should work, as long as Win32 Perl supports seek/tell properly.)