Re: Searching large files a block at a time

JediWombat:

It looks like IO::Uncompress::Bunzip2 handles the $/ variable just fine, so it shouldn't have any problem reading blocks:

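#!/usr/bin/env perl
use strict;
use warnings;
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Records end at a blank line rather than at every newline.
$/ = "\n\n";
my $z = IO::Uncompress::Bunzip2->new("zzzzz.bz2")
    or die "Argh! $Bunzip2Error\n";

my $cnt = 0;
# defined() keeps a final block of just "0" from ending the loop early.
while (defined(my $buff = $z->getline())) {
    ++$cnt;
    my $len = length($buff);
    print "BLOCK: $cnt\nLEN: $len\n$buff\n\n\n\n";
    last if $cnt > 10;    # demo only: stop after the first few blocks
}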

So perhaps I'm misunderstanding your problem....

Never mind what I wrote below. When I saw you asking about reading by blocks, I thought you meant fixed-size blocks, not delimited ones as your code shows. Sorry about that.

You should be able to read fixed-size blocks like this:

#!/usr/bin/env perl
use strict;
use warnings;
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Constructor attempts with BlockSize that didn't work for me:
#my $z = IO::Uncompress::Bunzip2->new("zzzzz.bz2", {BlockSize=>512})
#my $z = IO::Uncompress::Bunzip2->new("zzzzz.bz2", BlockSize=>512)
my $z = IO::Uncompress::Bunzip2->new("zzzzz.bz2")
    or die "Argh! $Bunzip2Error\n";

my $buff;
my $cnt = 0;
# read() returns the byte count: 0 at EOF, negative on error.
while ((my $status = $z->read($buff, 512)) > 0) {
    ++$cnt;
    my $len = length($buff);
    print "BLOCK: $cnt\nLEN: $len\n$buff\n\n\n\n";
    last if $cnt > 10;    # demo only: stop after the first few blocks
}

NOTE: The BlockSize argument in the constructor didn't work; I tried it both as a hashref and as a couple of extra arguments, as above. The read method accepts a length argument, though, so you can still read fixed-size blocks. If anyone sees what I did wrong in the constructor, I'd like to hear what it is.
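
One thing to watch if you do search fixed-size blocks: a match can straddle a block boundary. Here is a rough sketch of one way to handle that, by carrying the last few bytes of each block into the next. The sub name count_in_blocks and its arguments are made up for illustration, and it only handles a literal string, not a regex. It assumes the handle has a read($buf, $len) method that returns the byte count, as $z->read does above.

```perl
use strict;
use warnings;
use IO::Handle;    # gives plain filehandles a ->read method, like $z->read

# Hypothetical helper: count occurrences of the literal string $needle in a
# stream that is read $size bytes at a time. $fh can be any object whose
# read($buf, $len) returns the number of bytes read (0 at EOF, negative or
# undef on error), e.g. an IO::Uncompress::Bunzip2 object.
sub count_in_blocks {
    my ($fh, $needle, $size) = @_;
    die "needle must be non-empty\n" unless length $needle;

    my $keep  = length($needle) - 1;    # bytes carried over a block boundary
    my $tail  = '';
    my $count = 0;
    my $buff;

    while (($fh->read($buff, $size) // 0) > 0) {
        my $window = $tail . $buff;

        # Count every occurrence in the window, overlapping ones included.
        my $pos = 0;
        while (($pos = index($window, $needle, $pos)) >= 0) {
            ++$count;
            ++$pos;
        }

        # Keep the last $keep bytes: too short to hold a whole match, so
        # nothing is counted twice, but long enough to complete one that
        # started in this block.
        $tail = length($window) > $keep
              ? substr($window, length($window) - $keep)
              : $window;
    }
    return $count;
}
```

With the $z object from the script above, the call would look like count_in_blocks($z, "some string", 512).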

...roboticus

When your only tool is a hammer, all problems look like your thumb.


Re^2: Searching large files a block at a time by JediWombat (Novice) on Aug 02, 2017 at 02:04 UTC
Thanks, Roboticus. I think I might have been unclear: what I want to avoid is reading line by line, as my code is very slow. I assume that's because of the `while (my $buff = $z->getline())` loop, but feel free to correct me on this. Using this structure, my program takes a solid minute to run, whereas the shell script that does the same thing completes in a second or two. Maybe I could call bzcat from the system and store its output in a variable? But I'm still not sure how to use `while (<>)` inside a full Perl program when I'm not reading from a pipe. Cheers, JW.
Re^3: Searching large files a block at a time by Anonymous Monk on Aug 02, 2017 at 03:56 UTC
Bunzip2's `getline` works just like `<>`; you can set `$/ = "\n\n"` to read in paragraph mode. It doesn't seem to be all that well optimized, though. You might try this: `open my $BZ, "bzcat $file |"; while (<$BZ>) { ... }`
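
Spelled out a bit more, the pipe approach might look like the sketch below. It assumes bzcat is on your PATH; the sub names open_bzcat and matching_blocks are made up for illustration, and the file name and pattern come from the command line. The list form of open keeps the file name away from the shell.

```perl
use strict;
use warnings;

# Hypothetical helper: return every "\n\n"-delimited block read from $fh
# that matches $pattern. Works on any readable filehandle.
sub matching_blocks {
    my ($fh, $pattern) = @_;
    local $/ = "\n\n";    # blank-line-delimited records, restored on return
    my @hits;
    while (defined(my $block = <$fh>)) {
        push @hits, $block if $block =~ $pattern;
    }
    return @hits;
}

# Hypothetical helper: open a read pipe from bzcat (assumed to be installed).
sub open_bzcat {
    my ($file) = @_;
    open(my $fh, '-|', 'bzcat', $file)
        or die "Cannot run bzcat: $!\n";
    return $fh;
}

# Usage: perl script.pl FILE.bz2 PATTERN
if (@ARGV == 2) {
    my ($file, $needle) = @ARGV;
    my $fh = open_bzcat($file);
    print for matching_blocks($fh, qr/$needle/);
    close($fh) or warn "bzcat exited with status $?\n";
}
```

This collects the matching blocks in memory before printing them; if you expect a lot of matches, print inside the loop instead.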