chris212 has asked for the wisdom of the Perl Monks concerning the following question:

Is IO::Uncompress::Gunzip thread safe? I didn't see anything on perldoc to indicate it wasn't, but I'm not sure why else this code would crash on both Linux and Windows.
#!/usr/bin/perl
use threads;
use IO::Uncompress::Gunzip;
use IO::Compress::Gzip;

my $testfile = 'test.gz';

# create compressed file
my $fh = new IO::Compress::Gzip($testfile);
print {$fh} "$_ qwertyuiopasdfghjklzxcvbnm\n" foreach(1..5000);
close($fh);
print "$testfile created\n";

# read compressed file, starting a thread for every 500 lines
$fh = new IO::Uncompress::Gunzip($testfile);
my @chunk = ();
while(my $line = <$fh>) {
    push(@chunk,$line);
    if(scalar(@chunk) == 500) {
        my $th = threads->create(\&test,\@chunk);
        @chunk = ();
        $th->join();
    }
}
my $th = threads->create(\&test,\@chunk);
$th->join();
close($fh);

sub test {
    my ($lines) = @_;
    print foreach(@$lines);
}

UPDATE:

We have restrictions on what software we can install (it must be approved), so using a gzip executable on Windows isn't an option at the moment. The same goes for 7-Zip, assuming it can even compress a stream from STDIN. For now we only run the script on Linux, so I'm just using the gzip command, and compression will fail on Windows, where the command isn't available. If we get gzip approved on Windows, we can bundle the executable with the script, and it will be cross-platform.

Re: IO::Uncompress::Gunzip thread safe?
by BrowserUk (Patriarch) on Nov 21, 2016 at 19:11 UTC
    Is IO::Uncompress::Gunzip thread safe?

    I'm afraid the simple answer is no. And it would take some effort to make it so.

    That said, the way your code is constructed, it does not benefit from being threaded, because you are forcing the threads to run serially, by following the thread creation immediately with a call to join, which blocks until the thread ends.

    And as your posted code doesn't show what you are hoping to achieve by using threads -- just printing lines to the screen will never benefit from threading -- it is impossible to recommend a better approach.
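To illustrate the point about join, here is a minimal sketch contrasting the two patterns. The names `work`, `run_serial` and `run_concurrent` are hypothetical, and the short sleep only stands in for real per-chunk work; nothing here touches the gzip modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(sleep time);

# Hypothetical stand-in for real per-chunk work: pause, then return a value.
sub work {
    my ($id) = @_;
    sleep 0.2;
    return $id * 2;
}

# Create a thread and join it immediately: join() blocks, so only one
# worker thread is ever alive and the work runs one piece at a time.
sub run_serial {
    my @results;
    for my $id (@_) {
        my $thr = threads->create({ context => 'scalar' }, \&work, $id);
        push @results, $thr->join();
    }
    return @results;
}

# Create all threads first, then join them in creation order, so the
# workers can overlap while results still come back in input order.
sub run_concurrent {
    my @threads;
    for my $id (@_) {
        push @threads, threads->create({ context => 'scalar' }, \&work, $id);
    }
    my @results;
    push @results, $_->join() for @threads;
    return @results;
}

# Small demo: same results from both, but different wall-clock time.
for my $runner (\&run_serial, \&run_concurrent) {
    my $start   = time();
    my @results = $runner->(1 .. 4);
    printf "results: %s  elapsed: %.2fs\n", "@results", time() - $start;
}
```

Both runners return the same values in the same order; only the second lets the threads overlap.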



      It was just the simplest way I could reproduce the problem. The actual script starts a thread for each chunk, enqueues the thread handle, then continues reading, starts another thread, enqueues its handle, and so on. The number of threads is limited with a semaphore, which is down'ed with each thread creation. Each thread performs some work on each record in the chunk it was sent. A separate output thread dequeues the thread handle, joins it (receiving the processed output as a returned array reference), ups the semaphore, and writes the output to a file in the same order it was read. It works quite well with uncompressed input.

      As a workaround for the compression, I had started a thread before loading the compression libraries, then used a queue to send the data to that thread from the input thread, but it was MUCH SLOWER.

      Unfortunately I had to disable support for compressed input. Compressed output still works since that thread doesn't start any other threads.
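One possible reading of the workaround described above is sketched below: worker threads are started first, and only afterwards is the decompression module loaded and the gzip handle opened, with Thread::Queue carrying the data. This assumes the crash is tied to a live decompression handle being cloned into new threads, which the thread does not confirm. `process_record`, `start_workers` and `process_gz` are hypothetical names, and the uppercase step is a placeholder for the real work.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical per-record work; replace with the real processing.
sub process_record {
    my ($line) = @_;
    return uc $line;
}

# Start workers that pull [seq, lines] jobs from $in and push
# [seq, processed lines] to $out. Called BEFORE any gzip handle exists.
sub start_workers {
    my ($n_workers, $in, $out) = @_;
    my @workers;
    for (1 .. $n_workers) {
        push @workers, threads->create(sub {
            # An undef job is the signal to stop.
            while (defined(my $job = $in->dequeue())) {
                my ($seq, $lines) = @$job;
                $out->enqueue([ $seq, [ map { process_record($_) } @$lines ] ]);
            }
        });
    }
    return @workers;
}

sub process_gz {
    my ($file, $n_workers, $chunk_size) = @_;
    my ($in, $out) = (Thread::Queue->new(), Thread::Queue->new());
    my @workers = start_workers($n_workers, $in, $out);

    # Load the module and open the handle only after the threads exist.
    require IO::Uncompress::Gunzip;
    my $fh = IO::Uncompress::Gunzip->new($file)
        or die "Cannot open '$file' for decompression";

    # Send whole chunks, not single lines, to keep queue traffic down.
    my ($seq, @chunk) = (0);
    while (my $line = <$fh>) {
        push @chunk, $line;
        if (@chunk == $chunk_size) {
            $in->enqueue([ $seq++, [@chunk] ]);
            @chunk = ();
        }
    }
    $in->enqueue([ $seq++, [@chunk] ]) if @chunk;
    close $fh;

    # One undef per worker tells each of them to finish, then wait.
    $in->enqueue(undef) for @workers;
    $_->join() for @workers;

    # Results arrive in any order; put them back in input order.
    my @done;
    while (defined(my $result = $out->dequeue_nb())) {
        $done[ $result->[0] ] = $result->[1];
    }
    return map { @$_ } @done;
}

# Usage: perl script.pl input.gz
print process_gz($ARGV[0], 4, 500) if @ARGV;
```

Every item passing through a Thread::Queue is copied into shared memory, which may account for some of the slowdown reported above. This sketch also holds all results in memory until the workers finish, so a real script would want to drain the output queue as it goes.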

        If you can make use of multiple CPUs, it might be easier to handle the decompression through an external process, at the cost of more inter-process I/O:

        open my $fh, "gzip -cd $file |"
            or die "Couldn't read from '$file': $! / $?";
        binmode $fh;
        while (<$fh>) { # or whatever loop mechanism is appropriate
            ...
        }

        That way you lose some finer-grained control over the error states. For a zero-byte file, for example, gzip might just exit without outputting anything, and your program might think everything is OK.
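Some of that control can be recovered by checking the result of close on the pipe, which reflects the exit status of the child process. The following is a minimal sketch, assuming a `gzip` executable is on the PATH. The function name `read_gz_lines` is hypothetical, and rejecting empty or missing files up front is an assumed policy, not something the thread prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read all lines of a gzip file through an external `gzip -cd` process,
# dying if gzip cannot be started or exits with a non-zero status.
sub read_gz_lines {
    my ($file) = @_;

    # Assumed policy: treat a missing or zero-byte file as an error,
    # since -s is false in both cases.
    die "'$file' is missing or empty\n" unless -s $file;

    # List-form pipe open: no shell is involved, so characters in
    # $file are not interpreted as shell syntax.
    open(my $fh, '-|', 'gzip', '-cd', $file)
        or die "Couldn't start gzip for '$file': $!\n";
    binmode $fh;

    my @lines = <$fh>;

    # close() on a pipe waits for the child. It returns false if the
    # child exited non-zero; in that case $! is 0 and $? holds the status.
    unless (close $fh) {
        my ($errno, $status) = ($!, $?);
        if ($errno) {
            die "Error closing pipe from gzip: $errno\n";
        }
        die "gzip exited with status " . ($status >> 8) . " for '$file'\n";
    }
    return @lines;
}

# Usage: perl script.pl input.gz
print read_gz_lines($ARGV[0]) if @ARGV;
```

Reading everything into a list keeps the sketch short; a streaming loop works the same way as long as the close check happens after the last read.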