in reply to Re: IO::Uncompress::Gunzip thread safe?
in thread IO::Uncompress::Gunzip thread safe?

It was just the simplest way I could reproduce the problem. The actual script starts a thread for each chunk, enqueues the thread handle, then continues reading, starts another thread, queues that thread handle, and so on. The number of threads is limited with a semaphore, which is down'ed with each thread creation. Each thread performs some work on each record in the chunk it was sent. A separate output thread dequeues each thread handle, joins it (receiving the processed output as a returned array reference), ups the semaphore, and writes the output to a file in the same order it was read. It works quite well with uncompressed input.
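For anyone following along, the flow described above might look roughly like the following. This is a minimal sketch, not the actual script: `process_chunk`, `run_pipeline`, and the upper-casing "work" are placeholders invented for illustration, thread IDs are queued instead of thread handles, and the results are collected into an array rather than written to a file. It assumes a perl built with ithreads.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Thread::Semaphore;

# Placeholder worker: upper-cases every record in one chunk and
# returns the results as an array reference.
sub process_chunk {
    my ($chunk) = @_;
    return [ map { uc $_ } @$chunk ];
}

# Process a list of chunks (array refs of records) with at most
# $max_threads workers alive at once; returns the processed records
# in the same order the chunks were supplied.
sub run_pipeline {
    my ($chunks, $max_threads) = @_;

    my $slots   = Thread::Semaphore->new($max_threads);
    my $pending = Thread::Queue->new;

    # Output thread: joins workers strictly in the order they were
    # started, so output order matches input order.
    my $writer = threads->create(sub {
        my @out;
        while (defined(my $tid = $pending->dequeue)) {
            my $thr = threads->object($tid);
            push @out, @{ $thr->join };
            $slots->up;    # free a slot for the input loop
        }
        return \@out;
    });

    # Input loop: one worker per chunk, throttled by the semaphore.
    for my $chunk (@$chunks) {
        $slots->down;
        my $thr = threads->create(\&process_chunk, $chunk);
        $pending->enqueue($thr->tid);
    }
    $pending->enqueue(undef);    # sentinel: no more workers

    return $writer->join;
}
```

Calling `run_pipeline([ ['a', 'b'], ['c'] ], 2)` returns a reference to the processed records in input order. Joining in start order keeps the output ordered, at the price of a slow early chunk holding up the output of later chunks that have already finished.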

As a workaround for the compression, I had started a thread before loading the compression libraries, then used a queue to send the data to that thread from the input thread, but it was MUCH SLOWER.

Unfortunately I had to disable support for compressed input. Compressed output still works since that thread doesn't start any other threads.

Re^3: IO::Uncompress::Gunzip thread safe?
by Corion (Patriarch) on Nov 21, 2016 at 20:42 UTC

    If you can make use of multiple CPUs, it might be easier to handle the decompression through an external process, at the cost of more inter-process IO:

    open my $fh, "gzip -cd $file |"
        or die "Couldn't read from '$file': $! / $?";
    binmode $fh;
    while (<$fh>) {
        # or whatever loop mechanism is appropriate
        ...
    }

    That way you lose some finer-grained control over the error states. For zero-byte files, for example, gzip might just exit without outputting anything, and your program might think everything is OK.
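Some of that control can be won back by checking the size of the input up front and the status of `close` on the pipe. The following is a minimal sketch under a few assumptions: `read_gzip_lines` is a hypothetical helper name, a `gzip` binary is on the PATH, and the perl is recent enough for the list form of pipe open (on Windows that form is not available in older perls). It slurps all lines into memory, which may not suit large files.

```perl
use strict;
use warnings;

# Hypothetical helper: return all lines that `gzip -cd` produces for
# $file, dying if the input is empty or gzip reports a failure.
sub read_gzip_lines {
    my ($file) = @_;

    # A zero-byte input may not be reported as an error by gzip,
    # so reject it explicitly before starting the child process.
    die "'$file' is empty\n" if -z $file;

    # List-form pipe open: no shell is involved, so unusual
    # characters in $file are not interpreted.
    open my $fh, '-|', 'gzip', '-cd', $file
        or die "Couldn't start gzip for '$file': $!";
    binmode $fh;

    my @lines = <$fh>;

    # For a piped handle, close() waits for the child and returns
    # false if it exited non-zero; $! is 0 in that case and the
    # wait status is in $?.
    unless (close $fh) {
        die $!
            ? "Error closing pipe from gzip: $!\n"
            : "gzip exited with status " . ($? >> 8) . " for '$file'\n";
    }

    return \@lines;
}
```

Wrapping the call in `eval { ... }` lets the caller tell a corrupt or truncated archive apart from one that merely decompresses to very little data.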

      That works pretty well on Linux. It won't work on Windows, but better than nothing. Thanks!
        It won't work on Windows,

        Why not?

        C:\test>dir *.gz
         Volume in drive C is Local Disk
         Volume Serial Number is 8C78-4B42

         Directory of C:\test

        30/12/2014  13:59           228,720 01mailrc.txt.gz
        20/03/2009  15:58        73,773,666 chr1.fa.gz
        20/03/2009  16:02        11,549,785 chr21.fa.gz
        20/03/2009  15:59        54,997,756 chr6.fa.gz
        21/11/2016  19:03            13,203 test.gz
                       5 File(s)    140,563,130 bytes
                       0 Dir(s)  366,120,411,136 bytes free

        C:\test>p1
        [0]{} Perl> open GZ, 'gunzip -c test.gz |' or die $!; print while <GZ>;;
        1 qwertyuiopasdfghjklzxcvbnm
        2 qwertyuiopasdfghjklzxcvbnm
        3 qwertyuiopasdfghjklzxcvbnm
        4 qwertyuiopasdfghjklzxcvbnm
        5 qwertyuiopasdfghjklzxcvbnm
        6 qwertyuiopasdfghjklzxcvbnm
        7 qwertyuiopasdfghjklzxcvbnm
        8 qwertyuiopasdfghjklzxcvbnm
        9 qwertyuiopasdfghjklzxcvbnm
        ...

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
        In the absence of evidence, opinion is indistinguishable from prejudice.

        I've used this approach on both Windows and unixish OSes with great success. Where did it fail for you and how?