in reply to drifting IO::File offset

I'm afraid I don't have any insight about the problem -- seems like it shouldn't be happening (the most hateful sort of bug).

So, the only thing happening between the "pausing" tell call and the "continuing" tell call is a few fork calls? I remember folks telling me recently that forked processes will share memory with the parent, but what you're seeing still should not happen.
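For what it's worth, my understanding of a POSIX-style fork is that memory is copied (lazily) but the child's file descriptors refer to the same open file description as the parent's, so the two share one OS-level offset. Here is a minimal sketch of that, assuming a POSIX system; the scratch file and the offset 500 are made up for the demo, and it uses sysseek only so it can look at the descriptor directly rather than at Perl's buffered position.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_CUR);
use File::Temp qw(tempfile);
use POSIX ();

# Scratch file with known contents (hypothetical data, just for the demo).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "x" x 1000;
close $tmp or die "close: $!";

open(my $fh, '<', $path) or die "open $path: $!";

# OS-level offset before forking ("0 but true" becomes 0 after + 0).
my $before = sysseek($fh, 0, SEEK_CUR) + 0;

my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    # Child: move the offset through its inherited copy of the descriptor.
    sysseek($fh, 500, SEEK_SET);
    # _exit skips END blocks and PerlIO cleanup in the child.
    POSIX::_exit(0);
}
waitpid($pid, 0);

# Parent: it never moved the handle itself, so any change came from the child.
my $after = sysseek($fh, 0, SEEK_CUR) + 0;
print "before fork: $before, after child seek: $after\n";
```

On a POSIX system I'd expect the second number to be 500 even though the parent never seeked. Under an emulated fork (e.g. Perl on Windows) the behavior may well differ.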

If you conclude that forking is somehow triggering this behavior (and you can't convince the bankers to switch to Linux ;), then IMHO it would not be "klugey" to work around it: call "tell" before forking and close the file, then reopen it and seek to the saved position after forking. Add a comment in your code that mentions the apparent instability of IO::File offset pointers when used in combination with forking.
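A rough sketch of that tell/close/fork/reopen/seek workaround follows. The temp file, the record format, and the do-nothing child are all placeholders I made up; only the IO::File, Fcntl, and POSIX calls are real.

```perl
use strict;
use warnings;
use IO::File;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);
use POSIX ();

# Stand-in data file (hypothetical; substitute the real input).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "record $_\n" for 1 .. 100;
close $tmp or die "close: $!";

my $fh = IO::File->new($path, 'r') or die "open $path: $!";
$fh->getline for 1 .. 10;            # consume the first ten records

# NOTE: IO::File offsets appeared unstable across fork in this app,
# so remember the logical position and let go of the handle entirely.
my $pos = $fh->tell;
$fh->close or die "close: $!";

# Fork while no handle is open on the file.
my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) {
    # ... child work would go here ...
    POSIX::_exit(0);
}
waitpid($pid, 0);

# Reopen and return to the saved position.
$fh = IO::File->new($path, 'r') or die "reopen $path: $!";
$fh->seek($pos, SEEK_SET) or die "seek: $!";
my $next = $fh->getline;
print "resumed at offset $pos, next line: $next";
```

Since the children never inherit a descriptor for the data file, nothing they do (or that happens when they exit) can touch its offset.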

I can understand why you don't post a sample of your code in this case, but something to consider is to create a test-case script that you think might isolate the problem -- remove all "irrelevant" detail, and limit it to open file; while (whatever) {read 10K records; tell; fork...; tell }
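Something along these lines is what I have in mind for the stripped-down test case. Treat it as a sketch: the batch size, the number of children, and the generated data file are invented, and the children exit immediately via POSIX::_exit, which the real ones presumably do not.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);
use POSIX ();

# Hypothetical sizes; the scenario described above reads 10K records per batch.
my $BATCH    = 1_000;
my $CHILDREN = 3;

# Generated stand-in for the real input file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "record $_\n" for 1 .. 5 * $BATCH;
close $tmp or die "close: $!";

my $fh = IO::File->new($path, 'r') or die "open $path: $!";
my $drifts = 0;

until ($fh->eof) {
    # Read one batch of records.
    for (1 .. $BATCH) {
        defined $fh->getline or last;
    }
    my $paused = $fh->tell;              # the "pausing" tell

    # Fork a few children that never touch $fh.
    my @pids;
    for (1 .. $CHILDREN) {
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        POSIX::_exit(0) if $pid == 0;    # child: exit immediately
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;

    my $resumed = $fh->tell;             # the "continuing" tell
    if ($resumed != $paused) {
        $drifts++;
        warn "offset drifted: $paused -> $resumed\n";
    }
}
print "batches with drift: $drifts\n";
```

If this reports no drift, the first details I'd add back are a plain "exit" in the children and whatever real work they do.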

If the most minimal script does not reproduce the problem, start adding in details from the target app. At some point, you'll find the thing in your code that you thought wasn't there or wasn't relevant, etc. (At least, one can hope...)

(update: the only other issue I could imagine being relevant is improper mixing of i/o styles -- e.g. if you're using getline and tell, you should not also be using any i/o function that starts with "sys" on the same filehandle. Of course, if you were, then I'd expect it to break under Linux as well.)
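To illustrate why the two styles don't mix, here is a small sketch that compares the buffered position reported by tell with the descriptor's position reported by sysseek after a single line read. The temp file and its contents are invented for the demo.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);

# Small scratch file (hypothetical data).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "record $_\n" for 1 .. 100;
close $tmp or die "close: $!";

open(my $fh, '<', $path) or die "open $path: $!";

my $line    = <$fh>;                          # buffered read of one line
my $logical = tell($fh);                      # position as the buffered layer sees it
my $actual  = sysseek($fh, 0, SEEK_CUR) + 0;  # position of the underlying descriptor

print "tell: $logical, sysseek: $actual\n";
```

With Perl's default buffered layers the sysseek number will typically be larger, because the buffer was filled well past the end of the first line. A sysread at that point would start from the descriptor's position, not from where tell says you are.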

Re^2: drifting IO::File offset
by ezra (Scribe) on Sep 29, 2004 at 13:59 UTC
    So, the only thing happening between the "pausing" tell call and the "continuing" tell call is a few fork calls? I remember folks telling me recently that forked processes will share memory with the parent, but what you're seeing still should not happen.

    Actually, there's a ton of code between the pause and resume calls, but nothing that should be relevant to this filehandle. Trivial test scripts run fine, and this doesn't happen on identical or comparable hardware in other shops. The only changing variable in this scenario seems to be the client-specific configuration. That would influence things like the number of open file descriptors, number of open database handles, etc.

    I do use sysread/syswrite in the same code suite for some unrelated socket interaction, but no other process or piece of code has any awareness of this object's filehandle. The object itself does get shared across a bunch of processes during the forks. Still, assuming that the forked processes get CoW'd shared memory and each child gets a dupe of the open file table for that FD, and nobody moves the offset pointer, this shouldn't be happening. AArGh.

    Anyway, I'll check for mixing I/O access methods just for sanity's sake. That's definitely a Good Thing to know about even if it doesn't solve this particular issue. Thanks for your time!

    Cheers,
    Ezra