kingskot has asked for the wisdom of the Perl Monks concerning the following question:

Let me start by saying that I'm not a programmer. I'm a scientist. Perl is a tool that I can use to do my science, but I don't need to be an expert, write beautiful code, or find the optimal solution to every coding problem. I just want the quick and dirty answers that will allow me to do what I want to do in some semi-reasonable way.

That being said, here's my problem. I have a bunch of perl scripts that work just fine on my old linux box. Now, I have a shiny new desktop with two quad processors. The system install of perl is built with threading enabled by default. This is great for some scripts, but it causes others to die when a line of code that depends on results from a previous line gets executed before the previous line has finished its job (for instance, a system() call creates a file, but the code tries to access the file before the system call is done making it).

I don't want to install my own local version of perl, or have the syshelp people reinstall perl without threading, since I get a benefit in some cases. I also don't want to have to re-write all of my code line by line to make sure it's robust to threading. What is the most elegant and most SIMPLE solution here? Is there a way to run a perl script without threading "turned on" under thread-enabled perl? Do I really have to litter my scripts with statements like "if threading is enabled do this, otherwise do this" if I want them to run under threaded and non-threaded perl?

It seems like this should be simple, but I haven't had any luck so far, so any advice will be appreciated.


Re: Runing "regular" code with threaded perl
by Corion (Patriarch) on Jul 10, 2008 at 06:57 UTC

    The problem is not with threading. If your program is written to use threads, you won't be able to run it on a version of Perl without threads. If your program is not written to use threads, Perl won't use threads, and all the concurrency problems you have come from other sources.

    It seems to me that your program is launching multiple external programs but is not properly synchronizing them. Without seeing the relevant code, it's hard to tell where your program goes wrong. I recommend looking at Parallel::ForkManager or at the simple runN by Dominus. Both approaches are discussed in Parallelization of heterogenous (runs itself Fortran executables) code.

    If you want/need to roll your own parallelisation, I recommend having one or more "queues" into which you put the jobs. Your master program then launches the subprograms to process the jobs in the queues and hopefully has simple enough logic to determine when a job in a queue further down below can be started.
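If you do roll your own, the queue idea can be sketched with nothing but core fork, exec and waitpid. This is a minimal sketch, not Parallel::ForkManager or runN: it assumes a Unix-like system with /bin/sh, and the job list, the job ids and the $max_procs limit are hypothetical placeholders. The master keeps at most $max_procs children running, blocks in waitpid until one finishes, and records each job's exit code.

```perl
use strict;
use warnings;

# Hypothetical job queue: each entry is [job id, shell command].
# These commands just exit with a known code; substitute real
# commands (e.g. xspec runs) in practice.
my @queue     = map { [ $_, "exit " . ($_ % 3) ] } 1 .. 6;
my $max_procs = 2;    # assumed cap on simultaneously running children
my %running;          # pid    => job id
my %status;           # job id => exit code

while (@queue or %running) {
    # Launch jobs while there is spare capacity.
    while (@queue and keys(%running) < $max_procs) {
        my ($id, $cmd) = @{ shift @queue };
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                     # child: become the job
            exec('/bin/sh', '-c', $cmd) or exit 127;
        }
        $running{$pid} = $id;                # parent: remember which job
    }
    # Block until any child exits, then record which job it was.
    my $pid = waitpid(-1, 0);
    last if $pid == -1;                      # no children left to wait for
    my $id = delete $running{$pid};
    $status{$id} = $? >> 8 if defined $id;
}

printf "job %d exited with %d\n", $_, $status{$_}
    for sort { $a <=> $b } keys %status;
```

A job that depends on another one's output should simply not be queued until the earlier job's pid has come back from waitpid.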

      "If your program is not written to use threads, Perl won't use threads and all the concurrency problems you have come from other sources."

      What does this mean? Indeed, my program "was not written to use threads", in that it was developed and tested on a single-processor machine. Yet the same code fails on a multi-processor machine with threading enabled. The actual code is simple: I make a system call to a routine that outputs a file, then read the resulting file. The perl script is trying to access the file before the called routine has finished making it (if I load the code in the debugger and step through line by line, it runs just fine). How does this translate to "perl not using threads"? Are you suggesting that it's a problem with the OS managing threads?

        What Corion means by "was written to use threads" is simply: does your script contain the statement:
        use threads;

        If not, the threading features of perl will not be used.
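A thread-enabled build and a script that actually uses threads are two separate things, and both can be checked from inside a script. A minimal sketch: the build option is recorded in %Config (key useithreads), and threads.pm only shows up in %INC once the script has loaded it. Run without "use threads;", it reports "threads module loaded: no" even on a threaded perl. From the shell, perl -V:useithreads reports the same build setting.

```perl
use strict;
use warnings;
use Config;

# Property of the perl binary: was it compiled with ithreads support?
my $threaded_build = $Config{useithreads} ? 1 : 0;

# Property of this script: has it actually loaded threads.pm
# (which only happens if it says "use threads;" or requires it)?
my $threads_loaded = exists $INC{'threads.pm'} ? 1 : 0;

print "perl built with ithreads: ", ($threaded_build ? "yes" : "no"), "\n";
print "threads module loaded:    ", ($threads_loaded ? "yes" : "no"), "\n";
```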

Re: Runing "regular" code with threaded perl
by ikegami (Patriarch) on Jul 10, 2008 at 07:22 UTC

    Whatever your problem is, threading support has nothing to do with it. It simply allows you to write use threads; to create threads. It has no effect on how system works. If you don't create threads, the only difference with using a threaded build is a performance penalty.

    for instance, a system() call creates a file, but the code tries to access the file before the system call is done making it

    system won't return until the child exits, and therefore not until the child is done writing. Processes that don't exist can't write to files, so what you say makes no sense unless the child spawned a detached child and this detached process is actually doing the writing.
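The difference between a child that writes the file itself and one that detaches a worker can be seen directly. A sketch assuming a Unix-like system with /bin/sh; the file names are throwaway temp files, and the second command is only a stand-in for a program that backgrounds its real work. The first file exists as soon as system() returns; the second does not, because sh exits immediately and leaves the detached subshell still sleeping.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir        = tempdir(CLEANUP => 1);
my $sync_file  = "$dir/sync.dat";
my $async_file = "$dir/async.dat";

# Foreground child: system() waits for /bin/sh to exit, so the file is
# already complete when system() returns.
system('/bin/sh', '-c', "sleep 1; echo done > '$sync_file'") == 0
    or die "foreground command failed: $?";
my $sync_ready = -e $sync_file ? 1 : 0;

# Detached grandchild: the '&' makes sh start a background subshell and
# exit at once, so system() returns while the subshell is still sleeping
# and the file does not exist yet.
system('/bin/sh', '-c',
       "(sleep 2; echo late > '$async_file') >/dev/null 2>&1 &") == 0
    or die "background command failed: $?";
my $async_ready = -e $async_file ? 1 : 0;

print "foreground output present: ", ($sync_ready  ? "yes" : "no"), "\n";
print "background output present: ", ($async_ready ? "yes" : "no"), "\n";
```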

      Whatever your problem is, threading support has nothing to do with it.

      I agree. And I think the bug was present before, and was only revealed by switching to a system with real concurrency (better operating system, multi-core processor, or both).

        I do not "use threads" in my code. Whether it makes sense or not (it certainly doesn't to me), the fact is that this works:

        system("xspec - ${tmp}.xcm");

        open(DAT, "${tmp}xsfit.dat") || die ("Could not open file!");

        whereas this does not:

        bjmsys("xspec - ${tmp}.xcm", $v);

        open(DAT, "${tmp}xsfit.dat") || die ("Could not open file!");

        where bjmsys is defined as:

        sub bjmsys {
            my $arg=shift;
            my $v=shift;
            $v=0 unless defined $v;
            my $status;
            print "$arg\n" if $v>1||$v<0;
            $status=system("$arg");
            print "return status = $status\n" if $v>1;
            return $status;
        }
Re: Runing "regular" code with threaded perl
by zentara (Cardinal) on Jul 10, 2008 at 11:34 UTC
    I have a shiny new desktop with two quad processors.

    As others have said, you are under a misconception about threads. Just because you have 2 processors doesn't mean that all programs will automatically be run in some sort of shared-CPU manner. For that matter, even if you "use threads" in a Perl script, your kernel may decide to run all threads on one CPU. Multi-threading as you envision it is done in the kernel, and it takes specialized C programs to utilize it to its full potential. Your single- or multi-threaded Perl program is at the mercy of the kernel when it runs. Google for "linux multi cpu scheduling" and "linux multi cpu scheduling Perl" and you will see what is happening. If you are not an expert programmer (as is the case with most of us), you are jumping into very deep water, and you will conclude that it isn't worth the time learning to override the kernel's design, unless you are doing some extremely intensive number crunching on a super-computer. Is saving a few milliseconds of execution time worth the many hours of learning required to force dual-CPU usage?


      On the contrary, I expected my perl scripts to run on my multicore system just as they did on my single-processor box, and that I'd have the option to add threads to them down the road if I so chose. I was confused when perl seemed to be automatically doing some kind of internal threading, but as it turns out, it was an unrelated problem (see my post above). I'm happy to see that this works the way I would expect it to. Although I don't consider myself a programmer, I have experience writing massively parallel C code for cluster/supercomputers, so the concept of threading is not entirely foreign to me.
Re: Runing "regular" code with threaded perl
by cdarke (Prior) on Jul 10, 2008 at 08:20 UTC
    OK, let's look at other things that may have changed. The xspec application is an obvious one: does it fire off asynchronous tasks that might still be running after the "main" program completes? I know this is a horrible, ugly hack, but try putting a sleep 4 after the system call, before trying to open the file. This is for testing only; you would not want to leave the sleep there. It may expose a timing issue.

    Also check whether you have the same version of xspec on both machines; different versions may behave differently.
      OK, this was a helpful comment. Apparently, xspec is silently dying, but only sometimes. When I stepped through the debugger, or called xspec outside of my perl script, it seemed to run fine, but this test with the sleep command shows me that it sometimes randomly dies (I can make the same call over and over again at the command line, and most of the time it works, but sometimes it fails). So, as suggested, my blaming of threaded perl was misplaced. Xspec is a pretty standard and well-tested piece of astrophysical data analysis software, so figuring out why it's dying on my new system will be another (non-perl-related) problem.

        I'll bet you anything that the problem you are having with Xspec is what you suspected perl of doing.

        If it works most of the time but only sometimes fails, then the most likely problem is that it has trouble with multiple threads and fast CPUs that can complete their tasks so fast that the original programmer never expected this to be possible and thus didn't bother to check for it.

        One good option would be to find a mailing list for Xspec, ask whether others see the same issue, and check whether there is a newer release.

        For the time being try to do the following in your code:

        until ( -e "${tmp}xsfit.dat" ) {
            bjmsys("xspec - ${tmp}.xcm", $v);
        }
        open(DAT, "${tmp}xsfit.dat") || die ("Could not open file!");
        my @kT=<DAT>;
        close DAT;

        This basically makes sure that the file exists before continuing: if the file does not exist, it simply runs xspec to create it again, and again, until the file is there. It is a very crude way of working, but it will certainly do the trick.

        As long as Xspec does not take for ever and ever before it fails you should be saved until the Xspec problem is resolved.
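If you want the same retry idea without the risk of looping forever, the attempts can be capped. A minimal sketch: run_until_file is a hypothetical helper (not from the thread), the limit of 5 in the comment is arbitrary, and deleting the output file before each attempt assumes any leftover copy from an earlier run is stale. It also requires system() to report success, not just the file to exist.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): rerun $cmd until it exits with
# status 0 AND leaves $file behind, but give up after $max_tries attempts
# instead of looping forever.
sub run_until_file {
    my ($cmd, $file, $max_tries) = @_;
    for my $try (1 .. $max_tries) {
        unlink $file;    # assumption: a leftover file from an earlier run is stale
        my $rc = system($cmd);
        return $try if $rc == 0 && -e $file;
        warn "attempt $try of $max_tries failed (status $rc)\n";
    }
    return 0;            # the file never appeared
}

# Demo with a stand-in command; in the real script this would be something
# like run_until_file("xspec - ${tmp}.xcm", "${tmp}xsfit.dat", 5).
my $dir   = tempdir(CLEANUP => 1);
my $out   = "$dir/xsfit.dat";
my $tries = run_until_file("echo 42 > '$out'", $out, 3)
    or die "command never produced $out\n";
print "output appeared after $tries attempt(s)\n";
```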

Re: Runing "regular" code with threaded perl
by pc88mxer (Vicar) on Jul 10, 2008 at 07:08 UTC
    Can you give us an example of a script which doesn't work on your new box but used to work on your old one?

    Is it possible that your new box is a multicore system, and now things are happening concurrently whereas before they were being performed sequentially? In any case, we'd probably need to see some code in order to tell you what's going on.

      Yes, that's what's happening: the instructions aren't being carried out sequentially. It is a multi-core system, but why does this matter? If I run a C program it runs as a single thread, so it seems to have something to do with how perl is running the script. I don't know if it will be much help, but here's the exact code fragment that fails:

      bjmsys("xspec - ${tmp}.xcm", $v);

      open(DAT, "${tmp}xsfit.dat") || die ("Could not open file!");

      my @kT=<DAT>; close DAT;

      It dies when it tries to open the file because it hasn't been created yet.
        bjmsys("xspec - ${tmp}.xcm", $v);

        Now it would be interesting to know what this mysterious sub bjmsys does.

        I somehow suspect that it launches an external application in the background, while it really should just launch it and wait for it to finish.

        The problem seems to be with the bjmsys() function. What does it do and who wrote it?

        You should ask whoever is responsible for maintaining bjmsys() why it's returning before producing its output file on a multicore system.

        If it's a timing issue, you could just keep trying. Something like
        bjmsys("xspec - ${tmp}.xcm", $v);
        my $tries = 10;
        until ( open(DAT, "${tmp}xsfit.dat") || --$tries <= 0 ) {
            sleep 1;
        }
        die ("Could not open file: $!") if ($tries <= 0);
        my @kT=<DAT>;
        close DAT;
        Not quick, but dirty :)
Re: Runing "regular" code with threaded perl
by djp (Hermit) on Jul 11, 2008 at 01:38 UTC
    Your problem is you're not checking the return value of bjmsys(). Try:
    #bjmsys("xspec - ${tmp}.xcm", $v);
    bjmsys("xspec - ${tmp}.xcm", $v) == 0 or die ("bjmsys failed");
    open(DAT, "${tmp}xsfit.dat") || die ("Could not open file!");
    BTW, for a scientist, your analysis of the problem was decidedly unscientific! :-).
      The return value from system is reported by the bjmsys subroutine, but only printed, and only when the second parameter, $v, is greater than 1.
      Can kingskot confirm that $v is being set above 1, at least for testing?
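Since xspec turned out to be dying only some of the time, it helps to decode the status system() leaves in $? rather than just printing the raw number. A minimal sketch following the decoding described in perldoc -f system: -1 means the command could not be started, the low 7 bits hold the signal that killed the child, and the high byte holds its exit code. describe_status is a hypothetical helper name, and the three commands are stand-ins for the real xspec call.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the status that system() leaves in $? into a
# readable message, following the decoding shown in perldoc -f system.
sub describe_status {
    my ($status) = @_;
    return "could not be started: $!" if $status == -1;   # $! still set by system()
    if (my $sig = $status & 127) {
        return sprintf "killed by signal %d%s", $sig,
            ($status & 128) ? " (core dumped)" : "";
    }
    return sprintf "exited with code %d", $status >> 8;
}

# Three stand-in commands: success, a non-zero exit, and a child that
# dies from a signal (sh sends SIGTERM to itself).
my %result;
for my $cmd ('true', 'exit 3', 'kill -TERM $$') {
    system('/bin/sh', '-c', $cmd);
    $result{$cmd} = describe_status($?);
    print "$cmd: $result{$cmd}\n";
}
```

In bjmsys, printing describe_status($status) instead of the raw return value would show whether a failed xspec run exited with an error code or was killed by a signal.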
Re: Runing "regular" code with threaded perl
by NolanPL (Novice) on Jul 10, 2008 at 15:14 UTC
    As everybody has said, your problem is not threading. An easy solution is to just sleep your program for as long as it takes for the file to be created: something like sleep(3); while the file is being created, where 3 is however many seconds it takes, will fix the problem. Although it's not great coding, it's an easy, simple solution.
      Sleep is a bad hack that fails the moment whatever the program is waiting on runs slower than expected. Please don't encourage sleep. The proper way to do it is to test for some condition.