steves has asked for the wisdom of the Perl Monks concerning the following question:

This one is stumping me. I have a very simple driver script that literally looks like this:

    #!/usr/local/bin/perl
    use strict;
    use Our::UI::CommandLine;
    print ">> [PID $$] calling ui_run\n";
    Our::UI::CommandLine->ui_run();

The print statement is sometimes issued twice with the same PID! Any idea how that can be? There's a lot of code under the package being invoked. I checked AUTOLOAD paths and have found nothing there ... but since there's no sub, I would expect to find nothing.

This is on Solaris, Perl 5.6.1. It seems to happen mostly, or maybe only, when I run it from cron. I'm running more definitive tests for that now. It also goes away if I short-circuit big calling paths below, but those are so deep it would probably take days or weeks to narrow it down that way. Stack traces using caller() are identical. Any ideas?

Re: main script invoked twice
by etcshadow (Priest) on Mar 29, 2004 at 00:15 UTC
    Try setting $|=1;.

    My guess is that somewhere inside of Our::UI::CommandLine there is a fork call. For details, look at perldoc -f fork, but the basic idea is that your print statement just moves that text into a buffer, not directly into the log file you're redirecting stdout to. When that fork happens, the buffer is copied to the child process, so later on both the parent and the child flush the buffer, and whatever was in the buffer at the time of the fork gets written twice.

    ------------ :Wq Not an editor command: Wq

      But fork would yield a different process ID. It's not running long enough for PIDs to wrap and the PID is always the same. But I think the exec hint holds water. I know we have at least one piece of code that calls exec to re-start the program after setting environment variables that need to be set before shared libraries are loaded. I bet that's it ... and that would explain why it only happens from cron: those environment variables are normally set in user shells.

        Read etcshadow's reply again. And try his suggestion. If it works, then the child's pid is irrelevant; you have a buffering problem. When running under cron, you'll be sending the output to a pipe, which may buffer more than sending it to a tty.

        If setting $|=1 fixes the problem but you don't want to disable buffering, use { local $| = 1 } (including the curlies) to force a flush at the appropriate point -- probably just before the fork. ($|=1;$|=0 would do the same thing but wouldn't restore the buffering status, which is an unfriendly thing to do, especially if you're mixing in anyone else's code.)
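The one-off flush described above can be sketched as follows. This is only an illustration under stated assumptions: flush_once and demo_flush_once are hypothetical names, a temporary file stands in for the cron pipe or log, and Unix line endings are assumed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Flush $fh once without leaving autoflush switched on.
# Returns the value of $| for that handle after the flush.
sub flush_once {
    my ($fh) = @_;
    my $previous = select($fh);    # $| acts on the currently selected handle
    { local $| = 1 }               # setting it flushes; leaving the block restores it
    my $autoflush_after = $|;
    select($previous);
    return $autoflush_after;
}

# Small demonstration against a temporary file (a stand-in for a log or pipe).
sub demo_flush_once {
    my (undef, $path) = tempfile(UNLINK => 1);
    open(my $fh, '>', $path) or die "cannot open $path: $!";
    print {$fh} "pending\n";                 # held in the handle's buffer for now
    my $size_before = (stat $path)[7];       # size on disk before the flush
    my $autoflush_after = flush_once($fh);
    my $size_after = (stat $path)[7];        # size on disk after the flush
    close $fh or die "close failed: $!";
    return ($size_before, $size_after, $autoflush_after);
}
```

Calling flush_once on the relevant handle just before the fork or exec would push out anything pending while leaving the handle's buffering mode as it was.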

        Yes, fork would yield a different process ID, but that doesn't matter: the child would not be re-executing the print $$. There is hidden behavior (called buffering) underlying the standard print operation which causes something more analogous (not exactly the same) to this:

            print ">> PID is $$";
            fork();
            ...
            # that is (or can be) essentially the same as:
            $buffer = ">> PID is $$";
            fork();
            print $buffer;

        so both the parent and the child process are printing the value as it was computed in the parent process.

        Get it?

        Anyway, after looking at what you just had to say about the script re-exec'ing itself, it seems obvious that what is happening is not caused by the phenomenon I was describing (the pid getting caught in the output buffer at time of fork and thereby getting printed twice). The problem is that you are exec'ing the same script again from within the same process (so it's the same pid, even though it's a fresh run of the program).

        Understand how exec works: it replaces the contents of the current process with a new program, but it retains the same process (hence the same pid). Exec is, essentially, loading the new program into the current process's memory and calling goto(begin). That means that if you exec the same program that you call exec from, you are essentially just pressing the reset button on your program (but, again, from within the same process, hence the same pid).
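To see the "same pid after exec" behaviour in isolation, here is a minimal sketch. It assumes a Unix-like system; pids_across_exec and the DEMO_REEXECED guard variable are hypothetical names made up for this illustration. It writes a tiny script that prints its PID and re-executes itself once, runs it, and collects the PIDs it printed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Runs a throwaway script that execs itself once and returns the PIDs it printed.
sub pids_across_exec {
    # DEMO_REEXECED is a hypothetical guard so the script re-execs only once.
    my $script = <<'END_SCRIPT';
$| = 1;                          # make sure the first line is out before exec
print "PID $$\n";
unless ($ENV{DEMO_REEXECED}) {
    $ENV{DEMO_REEXECED} = 1;     # guard against re-executing forever
    exec($^X, $0) or die "exec failed: $!";
}
END_SCRIPT

    my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print {$fh} $script;
    close $fh or die "close failed: $!";

    local $ENV{DEMO_REEXECED} = 0;   # start with the guard cleared
    open(my $out, '-|', $^X, $path) or die "cannot run $^X: $!";
    my @pids = map { /^PID (\d+)$/ ? $1 : () } <$out>;
    close $out;
    return @pids;
}
```

The script's print line runs twice, once per pass, and both passes report the same process ID, which matches the symptom in the original question.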

        Hope that helps.

        ------------ :Wq Not an editor command: Wq
Re: main script invoked twice
by hawtin (Prior) on Mar 29, 2004 at 00:12 UTC

    There are three obvious cases you need to look out for.

    First: the program could be running the script inside itself. Look carefully in the Our::UI::CommandLine->ui_run() routine and see whether it could be running itself again (for example, by an exec call).

    Second: The process numbers of Solaris processes "wrap round". That is, once enough processes have been created, the next allocated process number starts at the bottom again (omitting numbers that are active at the time, of course).

    Third: The value of the $$ variable could be being modified by the "use Our::UI::CommandLine;" line. For example it could be set in the BEGIN block.

      Third: The value of the $$ variable could be being modified by the "use Our::UI::CommandLine;" line. For example it could be set in the BEGIN block.

      Erm, that's not exactly possible.

      $ perl -le 'print $$; $$ = 1234; print $$'
      11142
      Modification of a read-only value attempted at -e line 1.

      Underneath $$ is a call to getpid(2) and hence the value can't be modified directly.

Re: main script invoked twice
by steves (Curate) on Mar 29, 2004 at 04:21 UTC

    I see your point about the buffering. I initially ignored that not only because the exec fit, but because one of the first things the first method called does is unbuffer output.

    So here's my follow-up: is there another way to deal with the problem the exec's are trying to solve? The issue there is that we want the Perl packages to be as self-contained as possible. If a package relies on an environment setting that gives it information needed to dynamically load shared libraries then the setting occurs too late. Our answer was to put the environment settings in BEGIN blocks, and have any that know they have the dynamic load dependency re-execute the script after the environment is set. This worked until we also introduced some dynamically loaded Perl modules -- at the point those BEGIN blocks are called now, we could be in the middle of processing and not truly at the beginning of the entire program execution. Any ideas there?
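For readers following along, the pattern described above might look roughly like this. It is a sketch under assumptions, not the actual code in question: env_needs_setup and reexec_with_env are hypothetical helper names, and the variable name and path in the commented usage are placeholders.

```perl
use strict;
use warnings;

# True when the variable does not yet hold the wanted value,
# i.e. this is the first pass and a re-exec is still needed.
sub env_needs_setup {
    my ($env, $name, $wanted) = @_;
    return !(defined $env->{$name} && $env->{$name} eq $wanted);
}

# Sets the variable, then restarts the script in the same process (same PID).
sub reexec_with_env {
    my ($name, $wanted, @args) = @_;
    $ENV{$name} = $wanted;
    exec($^X, $0, @args) or die "exec failed: $!";
}

# Hypothetical use, placed after the subs above so they exist at BEGIN time:
# BEGIN {
#     reexec_with_env('LD_LIBRARY_PATH', '/opt/ours/lib', @ARGV)
#         if env_needs_setup(\%ENV, 'LD_LIBRARY_PATH', '/opt/ours/lib');
# }
```

Because the check compares against the wanted value, the second pass sees the variable already set and carries on instead of re-executing again. It does nothing about the problem of a module being loaded dynamically mid-run, where a restart discards work already done.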