http://qs1969.pair.com?node_id=662828

Ryszard has asked for the wisdom of the Perl Monks concerning the following question:

I have this script that is using threads, queues and a shared variable. once the queue is empty, i have a few threads that hang, and hence the program never finishes.

the script is actually quite simple, reading from a file (which happens to be IP addresses) queuing said data, then in each thread, dequeuing an ip address and doing an snmpget on it.

i can help but think i'm missing something fundamental here.

#!/usr/bin/perl -w use strict; use threads; use threads::shared; use Thread::Queue; my $q = new Thread::Queue; # create empty threaded queue my $threadcount = 10; # amount of threads my @SNMPthreads; # thread tracker my @firmware : shared; open (FH, 'filename.txt'); while (<FH>) { chomp; $q->enqueue($_); } close FH; foreach my $i (1..$threadcount) { push @SNMPthreads, threads->new(\&SNMPthread, $i ); } foreach my $i (0..$#SNMPthreads) { my $retval = $SNMPthreads[$i]->join(); } open (WH, ">firmware.txt"); print WH "-->$_\n" foreach @firmware; close WH; sub SNMPthread { my $thread = shift; while ( $q->pending && $q->pending > 0) { my $left = $q->pending; my $ip = $q->dequeue; push @firmware, "$$: $thread, $left, $ip"; my $retval = `/usr/bin/snmpget -v1 -c <commstring> $ip sysDesc +r.0`; chomp $retval; $retval =~ s/\>\>//g; my @ele = split(/\;/, $retval); print "$ip, $ele[ ($#ele-1) ], $ele[ $#ele ]\n"; } print "thread $thread finished\n"; }
top complete the picture on the environment i'm in:
Summary of my perl5 (revision 5 version 8 subversion 7) configuration: Platform: osname=linux, osvers=2.6.14.3, archname=i486-linux-gnu-thread-mult +i uname='linux ninsei 2.6.14.3 #1 smp preempt mon nov 28 19:51:50 ps +t 2005 i686 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dccc +dlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/ +share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvend +orlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/ +local -Dsitelib=/usr/local/share/perl/5.8.7 -Dsitearch=/usr/local/lib +/perl/5.8.7 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/ma +n3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man +3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs - +Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.7 -Dd_ +dosuid -des' hint=recommended, useposix=true, d_sigaction=define usethreads=define use5005threads=undef useithreads=define usemulti +plicity=define useperlio=define d_sfio=undef uselargefiles=define usesocks=undef use64bitint=undef use64bitall=undef uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS +-DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE +_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN +-fno-strict-aliasing -pipe -I/usr/local/include' ccversion='', gccversion='4.0.3 20051201 (prerelease) (Debian 4.0. +2-5)', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=1 +2 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', + lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=/lib/libc-2.3.5.so, so=so, useshrplib=true, libperl=libperl.s +o.5.8.7 gnulibc_version='2.3.5' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib' Characteristics of this binary (from libperl): Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT Locally applied patches: SPRINTF0 - fixes for sprintf formatting issues - CVE-2005-3962 Built under linux Compiled at Dec 15 2005 17:19:17 @INC: /etc/perl /usr/local/lib/perl/5.8.7 /usr/local/share/perl/5.8.7 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl
and uname shows:  2.6.11.7-ibm26 #1 SMP The machine is a quadcore intel jobby.

is anyone able to offer advice on this hangage ?

Replies are listed 'Best First'.
Re: threads hang
by moritz (Cardinal) on Jan 17, 2008 at 10:03 UTC
    The dequeue from Thread::Queue blocks until there is an item to return.

    Now if you have a race condition between the query of $q->pending and $q->dequeue (another threads empties the queue in between) you can get a block.

    Try $q->dequeue_nb instead.

    (Disclaimer: I have no experience with threads, so don't be surprised if that's not the problem...)

Re: threads hang
by BrowserUk (Patriarch) on Jan 17, 2008 at 14:16 UTC

    I've managed to reproduce this on a single cpu windows machine with AS5.8.8 (using ping instead of smtp) and it appears to be a bug in the handling of backticks within threads.

    The symptoms I am seeing are that one thread hangs, never returning from the backticks, even though I can see that all instances of the executable (ping.exe) have terminated. Using this simplified version of your thread code with added trace:

    sub SNMPthread { my $thread = shift; while ( my $left = $q->pending ) { warn "$thread: $left"; my $ip = $q->dequeue; warn "$thread: $left, $ip"; my $return = `ping $ip`; warn sprintf "$thread: $left got %d bytes of output\n", length + $return; my @ele = split ' ', $return; warn "$thread: $left, $ip, $ele[ -4 ], $ele[ -1 ]"; } warn "thread $thread finished\n"; }

    The trace for the hanging thread is:

    8: 94 at C:\test\662828.pl line 30. 8: 94, 212.58.227.137 at C:\test\662828.pl line 32. 8: 94 got 471 bytes of output 8: 94, 212.58.227.137, 204ms,, 192ms at C:\test\662828.pl line 36. 8: 88 at C:\test\662828.pl line 30. 8: 88, 30.207.168.74 at C:\test\662828.pl line 32. 8: 88 got 228 bytes of output 8: 88, 30.207.168.74, =, loss), at C:\test\662828.pl line 36. 8: 78 at C:\test\662828.pl line 30. 8: 78, 202.229.106.211 at C:\test\662828.pl line 32.

    As you can see, each IP is producing 4 lines of trace. Two before the backticks and two after. On it's third attempt, the backticks hang. However, the spawned ping instance goes away, so it's internal to Perl rather than the executable where the hang is occuring. The latter also suggested by the fact I can reproduce it using a different executable under a different OS.

    I've tried substituting a piped open for backticks and varied the number of threads with the same result. Always one thread only that hangs. So it appears to be an internal problem that is the culprit.

    However, I do have a fix for you. Upgrade to 5.10.0. The same code completes successfully using it, which also tends to indicate that it's a Perl problem, that's been fixed already.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      dude, nice one!

      I wonder if what you've found tho' is perhaps a different issue, as what i have observed is subtly different to what you see.

      what i was seeing was a different number of threads that would hang each time i ran the script. if i ran the script with 10 threads i would get about 4-7 threads that would hang.

      thanks for your proposed solution and your effort in testing this out in multiple environments! unfort, in a production env with multiple 100's of thousands of customers (and this really being a one off deal) i wont have the opportunity to upgrade to 5.10.0. it is something to keep in mind for the future tho', and perhaps i can initiate a longer term initiative.

        if i ran the script with 10 threads i would get about 4-7 threads that would hang.

        Quite possibly symptoms of the same problem on a multi-cpu system rather than my single cpu system?

        You could try addressing the issue that moritz raised, by changing your loop to:

        while( my $left = $q->dequeue_nb ) {

        And see if that helps any. If not, try generating a standalone demo of the problem (using ping) and post a perlbug report. Maybe whomever there fixed the problem for 5.10 will recognise the symptom and be able to offer a patch against 5.8.8?

        Beyond that, I can't see any way of avoiding the bug, as it also affects piped opens. The alternative might be to use NET::SNMP to do whatever the executable your using is doing. Perhaps too painful to reproduce?

        One thing that route does have going for it is a non-blocking mode that appears to work quite well, which mean that you could avoid using threads altogether.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: threads hang
by zentara (Archbishop) on Jan 17, 2008 at 12:37 UTC
    A thread must reach the end of it's code block OR return, for it to end properly. I would suspect that
    my $retval = `/usr/bin/snmpget -v1 -c <commstring> $ip sysDescr.0`;
    is hanging, and stoping the flow of the code. Is there someway to set a timeout on it, or wrap it in an eval?

    Alarms don't work well in threads, but you could try running a parallel thread to act as a timer, and run the snmpget command thru a piped-open or IPC::Open3 so you get a pid that can be killed when the timer times out.


    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
      actually i guess snmpget has an implicit timeout as if i cant connect to a device, i get a timeout message on STDERR :-)
        Then run the backtick with `command 2>&1` and it shouldn't hang, as eventually it will get the stderr return.

        I'm not really a human, but I play one on earth. Cogito ergo sum a bum