dsheroh has asked for the wisdom of the Perl Monks concerning the following question:

This is probably a known issue, but I have not been able to locate any information on it.

I've written a set of utilities to do some per-host auditing of inactive user accounts. User activity is determined based on parsing each host's wtmp file1 using read and unpack2. Determining the wtmp formats in use took a little trial and error, but I quickly had it working properly on a test host.

Then I sent it off to some other hosts for further testing and it all fell apart...

Minimal test code to demonstrate how I'm parsing wtmp:

#!/usr/bin/perl -w use strict; my $wtmp_fmt = 'x44 A32 x264 l x40'; open(WTMP, './wtmp') || die "Unable to open wtmp file: $!\n"; my $rec; my $rec_size = length pack ($wtmp_fmt, ()); my $reccount = 0; while (read(WTMP, $rec, $rec_size)) { $reccount++; print length $rec, ' - '; if (length $rec < $rec_size) { print "<incomplete record>\n"; last; } my @data = unpack($wtmp_fmt, $rec); my $name = $data[0] || '<no name>'; my $logtime = localtime($data[1]) || '<no time>'; print "$name\tat $logtime\n"; } close WTMP; print "$reccount records read\n";

And some sample output, generated using identical copies of a single wtmp file:

The original development host, running Red Hat AS4 with perl-5.8.5-12:

384 - root at Wed May 3 11:42:48 2006 384 - <no name> at Wed May 3 11:42:49 2006 384 - admin_esper at Mon May 8 13:01:38 2006 384 - <no name> at Mon May 8 13:01:40 2006 384 - root at Mon May 8 14:45:49 2006 ... 384 - root at Mon May 22 10:15:37 2006 108 records read

The initial problem host, running Red Hat AS3 with perl-5.8.0-90.4:

384 - root at Wed May 3 11:42:48 2006 384 - <no name> at Fri Jan 2 00:59:44 1970 384 - sper^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@smspap1 at Tu +e Oct 26 10:21:36 1999 384 - <no name> at Wed Dec 31 18:00:00 1969 384 - ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@smspap100l.xxxxxx at Wed Dec 31 +18:00:00 1969 ... 384 - }í at Wed Dec 31 18:00:00 1969 79 - <incomplete record> 108 records read

And an HP-UX host, also running Perl 5.8.0, binary build 806 for PA-RISC from ActiveState:

384 - root at Thu Aug 30 22:55:32 1906 384 - <no name> at Wed Mar 13 03:15:48 1907 384 - admin_esper at Sat Apr 18 15:48:52 1903 384 - <no name> at Wed May 11 00:29:24 1904 384 - root at Fri Mar 25 04:51:48 1960 ... 384 - root at Wed Sep 8 00:48:36 1915 108 records read

(Note that the mangled dates in the HP-UX output are expected, due to PA-RISC having a different endianness than x86.)

Testing on other machines, correct results (modulo endianness problems in the date) are consistently obtained on Red Hat with 5.8.5 and HP-UX with 5.8.0, but consistently mangled on Red Hat with 5.8.0, builds 88.9 and 90.4 (the only RH 5.8.0 builds in use here). The mangled parsing appears to stem from read() consuming more than 384 bytes per read (generally in the range of 387-394), but it does so in a deterministic fashion - the mangled results are always identical across all affected hosts.

So... Are there any known (or suggested) workarounds for this, within the confines of a change management policy which prevents an upgrade to a newer version of perl at this time?

Footnotes:

1 Collecting data from the output of last proved unsuitable due to log rotation (preventing it from displaying logins past the beginning of the month on some hosts) and because it truncates usernames to 8 characters.

2 I am aware of at least one CPAN module which parses wtmp, but it is not usable in this case due to a combination of technical and procedural factors.

Update: Looks like it was caused by data being incorrectly parsed as unicode by read. I've changed the open call to open(WTMP, '<:raw', './wtmp') and it now works correctly.

Replies are listed 'Best First'.
Re: Possible issue with read() in (some builds of) 5.8.0?
by BrowserUk (Patriarch) on May 22, 2006 at 16:29 UTC

    As the incomplete records (in your examples), always appear at the end of the file, and read is defined to only read as much as is available (Ie. it won't wait in order to fulfill the requested read size), it suggests that the applications writing to the file are using buffered output and therefore do not always write complete records.

    The appropriate action would be to check the numbers of byte read (the return value from read), and if it is less than you have asked for, loop (with a short delay) using the optional offset parameter to read until a full record is available.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      The example output is from running identical code (the code as posted) against identical data files (verified by comparing file size, cksum output, and md5sum output) which have been copied to a test location, not each host's live wtmp file. The odds of any program writing to them (or even having them open to write) while being parsed are essentially nil, but, since the question was raised, I have verified that they were not written to during my tests by checking that the file sizes are the same now as they were four hours ago when I re-ran the test while composing my post.

      The sample code provided prints the size of each record read, as determined by running length against the data grabbed by read. I have just now tested a modified version which also collects the return value of read and this value is, in all cases, identical to that obtained via length.

      The problem is not that the last read is short. The problem is that the prior reads, which are told to grab 384 bytes, place 384 bytes in the buffer, and give 384 as their return value (per your suggested test), are consuming more than 384 bytes from the input data, causing the position in the record to 'drift' to the right with each read. This causes the final read to be short because 305 bytes of it were already consumed by the second-to-last read, but, again, the short read is a symptom of the problem, not the problem itself. This 'drift' is also visible in the third record, where the mangled version finds the username as "sper^@^@^@..." rather than "admin_esper" as in the two good versions.

        ... identical data files ... which have been copied to a test location, not each host's live wtmp file.

        Oh. That discounts that idea. My apologies.

        The other possibility that comes to mind as a result of your expanded description of the problem, is that some parts of the file are being interpreted as unicode with the result that multiple bytes are being read and treated as a single character.

        The way to verify that possibility is to ensure that the file is being treated as 'raw' using the 3-arg open:

        open(WTMP, '<:raw', './wtmp') || die "Unable to open wtmp file: $!\n";

        Or binmode

        open(WTMP, './wtmp') || die "Unable to open wtmp file: $!\n"; binmode( WTMP, ':raw' )

        Either way should tell you if this is the problem.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.