in reply to Re: PerlIO file handle dup
in thread PerlIO file handle dup

Greetings,

To decrease the number of trips to and from the shared-manager, one can append a suffix to the 3rd argument of read: 'k' (multiply by 1024) or 'm' (multiply by 1024 * 1024). This enables chunk IO. Not to worry: the shared-manager keeps reading until it reaches the end of a line (or record), so a chunk never stops mid-line. Note that $. holds the chunk_id, not the actual line number. The chunk_id matters when output order is desired; a sketch after the example below shows one way to use it for that.

For comparison, the OP's script using a semaphore plus yield runs in 3.6 seconds; a shared handle without chunking takes 1.1 seconds.

Below, chunking completes in 0.240 seconds, and that is the total running time including the initial gzip step.

use strict;
use threads;
use MCE::Shared;

# Generate the test data: 100k lines, compressed with gzip.
{
    open my $fh, '|-', 'gzip > test.txt.gz';
    foreach (1..100000) {
        print {$fh} sprintf('%04d', $_) . ('abc123' x 10) . "\n";
    }
    close $fh;
}

{
    # Shared input and output handles, serviced by the shared-manager.
    mce_open my $fh,  '-|', 'gzip -cd test.txt.gz' or die "open error: $!\n";
    mce_open my $out, '>',  \*STDOUT               or die "open error: $!\n";

    my @thrs;

    foreach (1..3) {
        push @thrs, threads->create('test');
    }

    $_->join() foreach @thrs;

    close($fh);

    sub test {
        my $tid = threads->tid();

        # Using the shared output handle so lines are not garbled among threads.
        while (1) {
            # Chunk IO: read roughly 4 KiB, then continue to the end of line.
            my $n_chars = read $fh, my($buf), '4k';
            last if (!defined $n_chars || $n_chars <= 0);

            # $. is the chunk_id here, not the line number.
            print {$out} "## thread: $tid, chunkid: $.\n" . $buf;
        }
    }
}
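
To illustrate the earlier point about ordered output, here is a minimal sketch (not from the original post, and not part of the timings above): each worker stores its output into a shared hash keyed by the chunk_id in $., and the main thread prints the chunks sorted numerically after the joins. The MCE::Shared->hash() container, the 'worker' sub name, and the choice to buffer every chunk in memory are my assumptions for illustration; it reuses the test.txt.gz produced by the example above.

use strict;
use threads;
use MCE::Shared;

# Reuses test.txt.gz created by the example above.
mce_open my $fh, '-|', 'gzip -cd test.txt.gz' or die "open error: $!\n";

# Shared hash: chunk_id => chunk data (illustrative container choice).
my $result = MCE::Shared->hash();

sub worker {
    while (1) {
        my $n_chars = read $fh, my($buf), '4k';
        last if (!defined $n_chars || $n_chars <= 0);
        $result->set($., $buf);   # $. holds the chunk_id
    }
}

my @thrs;
push @thrs, threads->create('worker') foreach 1..3;
$_->join() foreach @thrs;

close $fh;

# Emit the chunks in their original input order.
print $result->get($_) foreach sort { $a <=> $b } $result->keys;

Buffering every chunk is fine for a file of this size; for a large stream one would instead flush completed chunks incrementally as the next expected chunk_id becomes available.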

Regards, Mario.