Possible issue with read() in (some builds of) 5.8.0?

dsheroh has asked for the wisdom of the Perl Monks concerning the following question:

This is probably a known issue, but I have not been able to locate any information on it.

I've written a set of utilities to do some per-host auditing of inactive user accounts. User activity is determined based on parsing each host's wtmp file¹ using read and unpack². Determining the wtmp formats in use took a little trial and error, but I quickly had it working properly on a test host.

Then I sent it off to some other hosts for further testing and it all fell apart...

Minimal test code to demonstrate how I'm parsing wtmp:

#!/usr/bin/perl -w

use strict;

my $wtmp_fmt = 'x44 A32 x264 l x40';
open(WTMP, './wtmp') || die "Unable to open wtmp file: $!\n";

my $rec;
my $rec_size = length pack ($wtmp_fmt, ());
my $reccount = 0;
while (read(WTMP, $rec, $rec_size)) {
    $reccount++;
    print length $rec, ' - ';
    if (length $rec < $rec_size) {
        print "<incomplete record>\n";
        last;
    }
    my @data = unpack($wtmp_fmt, $rec);
    my $name = $data[0] || '<no name>';
    my $logtime = localtime($data[1]) || '<no time>';
    print "$name\tat $logtime\n";
}
close WTMP;
print "$reccount records read\n";
[download]

And some sample output, generated using identical copies of a single wtmp file:

The original development host, running Red Hat AS4 with perl-5.8.5-12:

384 - root      at Wed May  3 11:42:48 2006
384 - <no name> at Wed May  3 11:42:49 2006
384 - admin_esper       at Mon May  8 13:01:38 2006
384 - <no name> at Mon May  8 13:01:40 2006
384 - root      at Mon May  8 14:45:49 2006
...
384 - root      at Mon May 22 10:15:37 2006
108 records read
[download]

The initial problem host, running Red Hat AS3 with perl-5.8.0-90.4:

384 - root      at Wed May  3 11:42:48 2006
384 - <no name> at Fri Jan  2 00:59:44 1970
384 - sper^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@smspap1      at Tu
+e Oct 26 10:21:36 1999
384 - <no name> at Wed Dec 31 18:00:00 1969
384 - ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@smspap100l.xxxxxx   at Wed Dec 31 
+18:00:00 1969
...
384 - }í        at Wed Dec 31 18:00:00 1969
79 - <incomplete record>
108 records read
[download]

And an HP-UX host, also running Perl 5.8.0, binary build 806 for PA-RISC from ActiveState:

384 - root      at Thu Aug 30 22:55:32 1906
384 - <no name> at Wed Mar 13 03:15:48 1907
384 - admin_esper       at Sat Apr 18 15:48:52 1903
384 - <no name> at Wed May 11 00:29:24 1904
384 - root      at Fri Mar 25 04:51:48 1960
...
384 - root      at Wed Sep  8 00:48:36 1915
108 records read
[download]

(Note that the mangled dates in the HP-UX output are expected, due to PA-RISC having a different endianness than x86.)

Testing on other machines, correct results (modulo endianness problems in the date) are consistently obtained on Red Hat with 5.8.5 and HP-UX with 5.8.0, but consistently mangled on Red Hat with 5.8.0, builds 88.9 and 90.4 (the only RH 5.8.0 builds in use here). The mangled parsing appears to stem from read() consuming more than 384 bytes per read (generally in the range of 387-394), but it does so in a deterministic fashion - the mangled results are always identical across all affected hosts.

So... Are there any known (or suggested) workarounds for this, within the confines of a change management policy which prevents an upgrade to a newer version of perl at this time?

Footnotes:

¹ Collecting data from the output of last proved unsuitable due to log rotation (preventing it from displaying logins past the beginning of the month on some hosts) and because it truncates usernames to 8 characters.

² I am aware of at least one CPAN module which parses wtmp, but it is not usable in this case due to a combination of technical and procedural factors.

Update: Looks like it was caused by data being incorrectly parsed as unicode by read. I've changed the open call to open(WTMP, '<:raw', './wtmp') and it now works correctly.

Comment on Possible issue with read() in (some builds of) 5.8.0? Select or Download Code

Replies are listed 'Best First'.
Re: Possible issue with read() in (some builds of) 5.8.0? by BrowserUk (Patriarch) on May 22, 2006 at 16:29 UTC
As the incomplete records (in your examples), always appear at the end of the file, and read is defined to only read as much as is available (Ie. it won't wait in order to fulfill the requested read size), it suggests that the applications writing to the file are using buffered output and therefore do not always write complete records. The appropriate action would be to check the numbers of byte read (the return value from read), and if it is less than you have asked for, loop (with a short delay) using the optional offset parameter to read until a full record is available. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^2: Possible issue with read() in (some builds of) 5.8.0? by dsheroh (Monsignor) on May 22, 2006 at 18:13 UTC
The example output is from running identical code (the code as posted) against identical data files (verified by comparing file size, cksum output, and md5sum output) which have been copied to a test location, not each host's live wtmp file. The odds of any program writing to them (or even having them open to write) while being parsed are essentially nil, but, since the question was raised, I have verified that they were not written to during my tests by checking that the file sizes are the same now as they were four hours ago when I re-ran the test while composing my post. The sample code provided prints the size of each record read, as determined by running `length` against the data grabbed by `read`. I have just now tested a modified version which also collects the return value of `read` and this value is, in all cases, identical to that obtained via `length`. The problem is not that the last `read` is short. The problem is that the prior `read`s, which are told to grab 384 bytes, place 384 bytes in the buffer, and give 384 as their return value (per your suggested test), are consuming more than 384 bytes from the input data, causing the position in the record to 'drift' to the right with each `read`. This causes the final `read` to be short because 305 bytes of it were already consumed by the second-to-last `read`, but, again, the short `read` is a symptom of the problem, not the problem itself. This 'drift' is also visible in the third record, where the mangled version finds the username as "sper^@^@^@..." rather than "admin_esper" as in the two good versions.	[reply] [d/l] [select]
Re^3: Possible issue with read() in (some builds of) 5.8.0? by BrowserUk (Patriarch) on May 22, 2006 at 20:02 UTC
... identical data files ... which have been copied to a test location, not each host's live wtmp file. Oh. That discounts that idea. My apologies. The other possibility that comes to mind as a result of your expanded description of the problem, is that some parts of the file are being interpreted as unicode with the result that multiple bytes are being read and treated as a single character. The way to verify that possibility is to ensure that the file is being treated as 'raw' using the 3-arg open: `open(WTMP, '<:raw', './wtmp') \|\| die "Unable to open wtmp file: $!\n";` [download] Or binmode `open(WTMP, './wtmp') \|\| die "Unable to open wtmp file: $!\n"; binmode( WTMP, ':raw' )` [download] Either way should tell you if this is the problem. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^4: Possible issue with read() in (some builds of) 5.8.0? by dsheroh (Monsignor) on May 22, 2006 at 20:34 UTC