in reply to Re: Possible issue with read() in (some builds of) 5.8.0?
in thread Possible issue with read() in (some builds of) 5.8.0?

The example output comes from running identical code (the code as posted) against identical data files, verified by comparing file size, cksum output, and md5sum output. The files were copied to a test location, not parsed from each host's live wtmp file, so the odds of any program writing to them (or even holding them open for writing) while being parsed are essentially nil. Since the question was raised, though, I have verified that they were not written to during my tests: the file sizes are the same now as they were four hours ago, when I re-ran the test while composing my post.

The sample code provided prints the size of each record read, as determined by calling length() on the data grabbed by read(). I have just now tested a modified version which also collects the return value of read(), and in all cases this value is identical to the one obtained via length().
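That cross-check can be sketched like so. This is a hypothetical, self-contained version: it builds a throwaway file of three full 384-byte records plus a deliberately short tail, rather than using real wtmp data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a throwaway binary file so the sketch is self-contained:
# three full 384-byte records plus a deliberately short 79-byte tail.
my ($wh, $tmp) = tempfile(UNLINK => 1);
binmode($wh);
print $wh "\x00" x (384 * 3 + 79);
close $wh;

open(my $fh, '<', $tmp) or die "Unable to open $tmp: $!\n";
binmode($fh);

my $recsize = 384;
while (my $got = read($fh, my $buf, $recsize)) {
    # On a byte-oriented handle these two numbers always agree.
    my $len = length $buf;
    print "read() returned $got, length() reports $len\n";
}
close $fh;
```

On a byte-oriented handle this prints matching pairs for every record, including the short final one.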

The problem is not that the last read is short. The problem is that the prior reads, which are asked for 384 bytes, place 384 bytes in the buffer and return 384 (per your suggested test), yet consume more than 384 bytes from the input data, causing the read position relative to the record boundaries to 'drift' to the right with each read. The final read is short because 305 bytes of it were already consumed by the second-to-last read, but, again, the short read is a symptom of the problem, not the problem itself. This 'drift' is also visible in the third record, where the mangled version finds the username as "sper^@^@^@..." rather than "admin_esper" as in the two good versions.

Re^3: Possible issue with read() in (some builds of) 5.8.0?
by BrowserUk (Patriarch) on May 22, 2006 at 20:02 UTC
    ... identical data files ... which have been copied to a test location, not each host's live wtmp file.

    Oh. That discounts that idea. My apologies.

    The other possibility that comes to mind, given your expanded description of the problem, is that some parts of the file are being interpreted as Unicode, with the result that multiple bytes are being read and treated as a single character.

    The way to verify that possibility is to ensure that the file is being treated as 'raw' using the 3-arg open:

    open(WTMP, '<:raw', './wtmp') || die "Unable to open wtmp file: $!\n";

    Or binmode

    open(WTMP, './wtmp') || die "Unable to open wtmp file: $!\n";
    binmode( WTMP, ':raw' );

    Either way should tell you if this is the problem.
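    For illustration only, here is a hypothetical, self-contained sketch of the effect being described: the same eight bytes on disk, read through a :raw handle and then through a :utf8 handle. The file and data are invented; "\xC3\xA9" is the two-byte UTF-8 encoding of e-acute.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write 8 bytes: four copies of "\xC3\xA9" (UTF-8 for e-acute).
my ($wh, $tmp) = tempfile(UNLINK => 1);
binmode($wh);
print $wh "\xC3\xA9" x 4;
close $wh;

# Read 4 "units" through each layer.  Under :raw a unit is a byte;
# under :utf8 it is a character, so more bytes are consumed from the
# file than were asked for -- the 'drift' described above.
for my $layer ('<:raw', '<:utf8') {
    open(my $fh, $layer, $tmp) or die "open: $!\n";
    my $got   = read($fh, my $buf, 4);
    my $bytes = do { use bytes; length $buf };
    print "$layer: read() returned $got, buffer holds $bytes bytes\n";
    close $fh;
}
```

    Both layers report that read() returned 4, but the :utf8 handle has silently pulled 8 bytes from the file.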


      Excellent - worked like a charm! Unicode had crossed my mind: since it was introduced in 5.8.0, bugs in it seemed likely, and it would explain why the number of bytes consumed varied from record to record. But I didn't see anything in the hex dumps of the data that looked likely to trigger interpretation as Unicode.

      I'll have to remember that :raw... I've heard about 3-argument open as a security measure (to protect from user-entered filenames starting with ">", etc.), but this is the first time I've seen it used for anything more than that.
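      A tiny hypothetical illustration of that security point (the file names here are invented): with 2-arg open, a leading ">" in a user-supplied string is parsed as the open mode and would truncate or create a file, whereas 3-arg open treats the whole string as a literal filename.

```perl
use strict;
use warnings;

# Hypothetical user input that smuggles in an open mode.
my $evil = '>pwned';

# 3-arg open: $evil is taken as a literal filename, '>' and all,
# so this simply fails to find a file named '>pwned'.
open(my $fh, '<', $evil)
    or print "3-arg open refused: no file named '$evil'\n";

# With 2-arg open($fh, $evil) the '>' would have been parsed as the
# write mode, truncating/creating a file named 'pwned'.  Confirm
# nothing was created here:
print -e 'pwned' ? "file 'pwned' was created\n"
                 : "no file 'pwned' exists\n";
```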

      Thanks again!