ShaZe has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks!

I have a question related to the read/write operations happening on a server. I currently have a basic script that logs a name and a time in a file each time it runs. If the time is lower than the ones already in the list, the entry is moved higher in the list. I then save the list and close the file.

The code is functional and no error is seen when the function is tested alone. However, in a real-case scenario the file often appears incomplete: it will end at a partially written name or time.

Do you have any idea what could be wrong?

Here's the code I have:

#!/usr/bin/perl --
################################################
use strict;
use warnings;
use Fcntl qw(:flock :seek);

print "Content-type: text/html\n\n";

UpdateInfo("testname1", "240");

sub UpdateInfo {
    my $name = $_[0];
    my $time = $_[1];
    my $file = "file.dat";
    my $lock = "file.sem";

    open SEM, ">$lock" or die "Can't write-open $lock: $!";
    flock SEM, LOCK_EX;

    my $output;
    my @lines;
    if (-e $file) {
        open(LEVELINFO, $file);
        while (defined(my $line = <LEVELINFO>)) {
            chomp $line;
            push(@lines, $line);
        }
        close(LEVELINFO);

        $lines[0] = "header\t0\t0\n";
        my $currentEntry = 0;
        for (my $i = 1; $i < @lines; $i++) {
            (my $currName, my $currTime) = split(/\t/, $lines[$i]);
            if ($currName eq $name) {
                if ($time < $currTime) {
                    $currentEntry = $i;
                }
            }
            $lines[$i] = $lines[$i] . "\n";
        }
        if ($currentEntry != 0) { splice @lines, $currentEntry, 1; }

        my $boolAdded = 0;
        for (my $i = 1; $i < @lines; $i++) {
            (my $currName, my $currTime) = split(/\t/, $lines[$i]);
            if ($boolAdded == 0 and $time < $currTime) {
                splice @lines, $i, 0, "$name\t$time\n";
                $boolAdded = 1;
            }
        }
        if (@lines < 11 and $boolAdded == 0) {
            push @lines, "$name\t$time\n";
        }

        open(LEVELINFO, ">$file");
        for my $i (0..$#lines) {
            print LEVELINFO $lines[$i];
            if ($i > 0) {
                $output = $output . "$lines[$i]";
            }
        }
        close(LEVELINFO);
    }
    else {
        push @lines, "header\t0\t0\n";
        push @lines, "$name\t$time\n";
        open(LEVELINFO, ">$file");
        for my $i (0..$#lines) { print LEVELINFO $lines[$i]; }
        close(LEVELINFO);
    }
    close(SEM);
    unlink($lock);
    return $output;
}
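For what it's worth, one common way to rule out half-written files altogether is to write the complete new contents to a temporary file in the same directory and rename() it over the original; rename is atomic on the same filesystem, so a reader sees either the old version or the new one, never a partial write. A minimal sketch — the filename and tab-separated format are assumed from the script above, and save_scores is an illustrative helper, not part of the original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: write everything to a temp file, then atomically replace the
# original.  The temp file must live in the same directory (and thus on
# the same filesystem) as the target for rename() to be atomic.
sub save_scores {
    my ($file, @lines) = @_;
    my $tmp = "$file.tmp.$$";                  # same directory as $file
    open my $out, '>', $tmp or die "Can't write $tmp: $!";
    print $out @lines;
    close $out or die "Can't close $tmp: $!";  # close() flushes the buffer
    rename $tmp, $file or die "Can't rename $tmp over $file: $!";
}

save_scores("file.dat", "header\t0\t0\n", "testname1\t240\n");
```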
Let me know what you think

Replies are listed 'Best First'.
Re: Read / Write Server
by roboticus (Chancellor) on Mar 04, 2015 at 21:46 UTC

    ShaZe:

    It looks like your program is monitoring a file that a different program generates. I suspect that the other program doesn't flush its buffers after writing, so the data in the file is simply incomplete at the time you read it. You can demonstrate it with these two programs. The first program writes a line of text to a file every second:

    $ cat file_gen.pl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    open my $FH, '>', 'a_file.txt';
    $|=1;
    for (1 .. 1000) {
        print $FH "$_: here's 80 stars: ", "*"x80, "\n";
        my $t = time;
        print "$t: line: $_\n";
        sleep 1;
    }

    The second program reads the file every 5 seconds, and dumps its contents to the console:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    while (1) {
        my $t = time;
        print "$t: Here's what we have so far:\n";
        open my $FH, '<', 'a_file.txt';
        my @lines = <$FH>;
        close $FH;
        print join("", @lines), "\n";
        sleep 5;
    }

    If you run the file generator in one terminal and the file monitor in another, you'll see that many lines are written by the first program while the monitor program doesn't see anything for a while. Eventually the first program will write enough data to fill the output buffer, the file finally gets updated, and the next time the file monitor does its thing, you'll see a large number of lines suddenly appear in the file. It's quite likely that the last line it shows you is only part of a line, though, unless you get lucky and the end of the line happens to coincide with the end of a buffer.

    For details on buffering, read perldoc perlvar in the section "Variables related to filehandles" and also IO::Handle for information on file buffering as well as other operations you can do with files.
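    If the writing program needs its output visible to readers immediately, per-filehandle autoflush via IO::Handle is the usual fix. A minimal sketch — note that `$| = 1` only affects the currently selected filehandle (normally STDOUT), not `$FH`:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Handle;                 # enables ->autoflush on lexical filehandles

open my $FH, '>', 'a_file.txt' or die "Can't open a_file.txt: $!";
$FH->autoflush(1);              # flush $FH after every print, not just STDOUT

for my $n (1 .. 3) {
    print $FH "line $n\n";      # each line reaches the file immediately
}
close $FH or die "close failed: $!";
```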

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      It seems you are right. The file seems to be fine on the server side; however, the other program used for monitoring is not seeing the updated file.

      Taking this new information into account, I assume it would be a good approach to always force a flush after writing sensitive data, to make sure it is saved in case of a sudden interruption? That's assuming the buffer is never fully written to the file if the program is terminated early.

      Thank you very much for the help! I guess I now have to tweak a few things to make sure this kind of data loss never occurs again.

Re: Read / Write Server
by Laurent_R (Canon) on Mar 04, 2015 at 22:54 UTC
Re: Read / Write Server
by cheako (Beadle) on Mar 05, 2015 at 05:12 UTC
    Try this.
    1. There is a file, fully written and readable.
    2. Along come the readers and life is good.
    3. Jill adds something to the end of the file, but Bob only sees half.
      • Bob does not see the trailing EOL, so he is well aware the file is truncated.
    4. Bob keeps reading (IO::Select) until he is content, or gives up, timing out and handling the error, ignoring partial data, etc.
      • In many cases Bob can just ignore the last record, it's likely he will read it later anyway.
    5. Bob needs to edit the file, cleaning up after Jill or anything else that's not appending to the file.
    6. Bob safely creates a new file with a similar name that's always used when updating the file.
    7. Bob fills the file until it's complete, reading and copying a newly opened original.
    8. Bob closes the new file, flushing it to disk.
    9. Bob renames the new file on top of the original.
    10. Bob looks at the original (still open) for any additions until he is satisfied that he can quit.
      • fork and select with a timeout is good for this.
    11. There is a file, fully written and readable...

    Note: Bob will need to set the correct group and group writable permissions as well as have write access to the folder containing the file. The temp file should be in the same folder as the original, because this does not work across mount points.

    The above is completely atomic and works for line-buffered files. By atomic we mean that a reader sees one version of the file or the other. If it's imperative that the reader always has the freshest data, then you need to use file locks. If using file locks, consider using binary data, as you'll incur less buffering overhead that isn't doing you any good.
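    A minimal sketch of the file-locking alternative mentioned above, locking the data file itself rather than a separate semaphore file — the filename and tab-separated format are assumptions carried over from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Writer: take an exclusive lock on the data file itself, then rewrite
# it in place.  '+>>' creates the file if needed without truncating it
# before the lock is actually held.
open my $out, '+>>', 'file.dat' or die "Can't open file.dat: $!";
flock $out, LOCK_EX or die "Can't lock file.dat: $!";
seek $out, 0, 0;                  # rewind ...
truncate $out, 0;                 # ... and discard the old contents
print $out "header\t0\t0\n", "testname1\t240\n";
close $out or die "close failed: $!";   # close() releases the lock

# Reader: a shared lock blocks while a writer holds LOCK_EX, so the
# reader never sees a half-rewritten file.
open my $in, '<', 'file.dat' or die "Can't read file.dat: $!";
flock $in, LOCK_SH or die "Can't lock file.dat: $!";
my @lines = <$in>;
close $in;
```

    Note that flock gives advisory locks: they only protect against other processes that also call flock, which is why every reader and writer has to participate.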

    Keep in mind that after step 9, and especially during step 10, the file created in step 6 is accessible to anyone. During step 10 it's imperative that any new updates can be passed down a chain of several other updates. For this reason it may be a good idea to skip steps 3 and 4; there is a lot of added complexity if there isn't a clear gain from allowing appends outside of a new file.
Re: Read / Write Server
by ShaZe (Novice) on Mar 04, 2015 at 23:16 UTC
    I'm confused now. I just read that closing the filehandle automatically flushes the I/O buffer. Also, the newer ways of flushing the buffer all use a different approach to opening files for writing or reading. My approach was supposed to be the most secure way of doing it; I am unsure whether that is still the case or whether I am just completely outdated.
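    It's true that close() flushes the buffer, but that final flush can itself fail (full disk, dropped network mount), so the return value of close() is worth checking. A small sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

open my $fh, '>', 'out.txt' or die "Can't open out.txt: $!";
print $fh "some data\n" or die "print failed: $!";

# close() flushes the buffered data, but if that flush fails the data
# is silently lost unless the return value is checked.
close $fh or die "Couldn't close out.txt (data may be lost): $!";
```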