in reply to Searching large files a block at a time

JediWombat:

It looks like IO::Uncompress::Bunzip2 handles the $/ variable just fine, so it shouldn't have any problem reading blocks:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

    $/ = "\n\n";    # records end at a blank line
    my $z = IO::Uncompress::Bunzip2->new("zzzzz.bz2")
        or die "Argh! $Bunzip2Error\n";
    my $cnt = 0;
    # defined() so a record like "0" doesn't end the loop early
    while (defined(my $buff = $z->getline())) {
        ++$cnt;
        my $len = length($buff);
        print "BLOCK: $cnt\nLEN: $len\n$buff\n\n\n\n";
        last if $cnt > 10;
    }

So perhaps I'm misunderstanding your problem...
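To check the $/ claim without needing a .bz2 file on disk, here's a minimal sketch that compresses some made-up blank-line-separated records in memory with IO::Compress::Bzip2, then reads them back through IO::Uncompress::Bunzip2's getline(). The sample data and variable names are just for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Made-up sample data: three records separated by blank lines.
my $plain = "alpha 1\nalpha 2\n\nbeta 1\n\ngamma 1\ngamma 2\n";

# Compress into an in-memory buffer so no file on disk is needed.
my $compressed;
bzip2(\$plain => \$compressed) or die "bzip2 failed: $Bzip2Error\n";

# IO::Uncompress::Bunzip2 also accepts a scalar reference as its input.
my $z = IO::Uncompress::Bunzip2->new(\$compressed)
    or die "bunzip2 failed: $Bunzip2Error\n";

my @records;
{
    local $/ = "\n\n";    # getline() honors $/ just like <$fh> does
    while (defined(my $rec = $z->getline())) {
        push @records, $rec;
    }
}
$z->close();

printf "%d records\n", scalar @records;
print "[$_]\n" for @records;
```

Each record keeps its trailing "\n\n" separator, the same as a plain filehandle read with that $/ would.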

Never mind what I wrote below. When I saw you asking about reading by blocks, I thought you meant fixed-size blocks, not delimited records as your code shows. Sorry about that.


You should be able to read fixed-size blocks like this:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

    #my $z = new IO::Uncompress::Bunzip2("zzzzz.bz2", {BlockSize=>512})
    #my $z = new IO::Uncompress::Bunzip2("zzzzz.bz2", BlockSize=>512)
    my $z = IO::Uncompress::Bunzip2->new("zzzzz.bz2")
        or die "Argh! $Bunzip2Error\n";
    my $buff;
    my $cnt = 0;
    # read() returns the byte count: > 0 for data, 0 at EOF, < 0 on error
    while ($z->read($buff, 512) > 0) {
        ++$cnt;
        my $len = length($buff);
        print "BLOCK: $cnt\nLEN: $len\n$buff\n\n\n\n";
        last if $cnt > 10;
    }

NOTE: The BlockSize argument in the constructor didn't work; I tried passing it both as a hashref and as a couple of extra arguments, as in the commented-out lines above. But the read method accepts a length argument, so you can still read fixed-size blocks. If anyone sees what I did wrong with the constructor, I'd like to hear it.
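Another way to get fixed-size chunks, sketched here on the assumption that IO::Uncompress::Bunzip2 inherits the record-mode support the IO::Uncompress::Base docs describe for getline(): set $/ to a reference to an integer, and each getline() call returns up to that many bytes, just as <$fh> would. On the BlockSize question, my reading of those same docs is that BlockSize sets how many bytes of compressed input are read per chunk, not the size of the decompressed data handed back to you, which would explain why it seemed to do nothing. The 512-byte record size and the 1300-byte sample are arbitrary, and the data is compressed in memory only so the sketch runs without a file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Illustrative input: 1300 bytes, so we expect chunks of 512, 512 and 276.
my $plain = "x" x 1300;
my $compressed;
bzip2(\$plain => \$compressed) or die "bzip2 failed: $Bzip2Error\n";

my $z = IO::Uncompress::Bunzip2->new(\$compressed)
    or die "bunzip2 failed: $Bunzip2Error\n";

my @sizes;
{
    local $/ = \512;    # record mode: each getline() returns up to 512 bytes
    while (defined(my $block = $z->getline())) {
        push @sizes, length $block;
    }
}
$z->close();

print "block sizes: @sizes\n";    # block sizes: 512 512 276
```

Using local keeps the record-mode $/ from leaking into any other filehandle reads in the program.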

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: Searching large files a block at a time
by JediWombat (Novice) on Aug 02, 2017 at 02:04 UTC
    Thanks, roboticus. I think I was unclear: what I want to avoid is reading line by line, because my code is very slow. I assume that's because of the while (my $buff = $z->getline()) loop, but feel free to correct me on this. With this structure my program takes a solid minute to run, whereas the shell script that does the same thing finishes in a second or two. Maybe I could call bzcat from the system and store its output in a variable? But I'm still not sure how to use while (<>) inside a full Perl program when I'm not reading from a pipe. Cheers, JW.
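      Bunzip2's getline works just like <>: you can set $/ = "\n\n" to read blank-line-separated records (or $/ = "" for Perl's true paragraph mode, which also treats a run of several blank lines as a single separator). It doesn't seem to be all that well optimized, though. You might try this: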
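          # list-form pipe open: runs bzcat directly, no shell involved
          open my $BZ, '-|', 'bzcat', $file or die "Can't run bzcat on $file: $!";
          while (<$BZ>) { ... }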
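On the "store its output in a variable" idea: here's a minimal sketch that decompresses the whole file into one scalar with the bunzip2 function exported by IO::Uncompress::Bunzip2, then splits it on blank lines, so getline() isn't called once per record. It assumes the decompressed data fits comfortably in memory, and I haven't benchmarked it against the bzcat pipe. The temporary file and its contents are made up so the sketch runs on its own; in real use $file would be your own .bz2 path.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Build a throwaway .bz2 file holding made-up blank-line-separated records.
my ($fh, $file) = tempfile(SUFFIX => '.bz2', UNLINK => 1);
close $fh;
my $sample = "rec 1a\nrec 1b\n\nrec 2\n\nrec 3\n";
bzip2(\$sample => $file) or die "bzip2 failed: $Bzip2Error\n";

# Decompress the whole file into one scalar in a single call.
my $text;
bunzip2($file => \$text) or die "bunzip2 failed: $Bunzip2Error\n";

# Split on runs of blank lines; unlike paragraph mode, the separators are dropped.
my @records = split /\n{2,}/, $text;

printf "%d records\n", scalar @records;
print "--\n$_\n" for @records;
```

The trade-off is memory: the whole decompressed file lives in $text at once, whereas the getline and bzcat-pipe approaches only hold one record at a time.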