Welcome to the Monastery.

"This uses the $/ input field separator, and then uses while (<>) to read a block at a time. I'd like to do this in pure perl, but I can't find a way."

Firstly, here's a simple example of how you might do this.

#!/usr/bin/env perl -l

use strict;
use warnings;

{
    local $/ = '';

    while (<DATA>) {
        chomp;
        print '--- One Block ---';
        print;
    }
} 

__DATA__
Block1 Line1
Block1 Line2
Block1 Line3


Block2 Line1
Block2 Line2
Block2 Line3



Block3 Line1
Block3 Line2
Block3 Line3

Block4 Line1
Block4 Line2
Block4 Line3

Notes:

Setting $/ to an empty string puts you in what's called "paragraph mode". This allows reading blocks (groups of lines separated by one or more blank lines). The number of blank lines doesn't matter: note how that differs from '$/ = "\n\n"', which treats exactly two consecutive newlines (i.e. one blank line) as the terminator, so any additional blank lines end up at the start of the following records. See $/ in perlvar for further details.
When you modify '$/', or indeed any special variable, you should localise the change to a limited scope so that the special variable works normally elsewhere in your code. In this instance, I've used an anonymous block (the code is within braces by themselves); subroutine definitions, BEGIN blocks, and so on, could work just as well: just keep the special variable modification separate from other code. See local, and the links that page provides, for more on this.
I'm reading using '<DATA>', which is just a handy way of reading the data after '__DATA__'. You could use '<$filehandle>', where that filehandle may come from open or some other source (see below).
For the purposes of demonstration, I've separated the blocks with a varying number of blank lines (specifically 2, 3, and 1). This is to show that the number of intervening blank lines doesn't matter when in paragraph mode.
See also chomp and -l in perlrun which I've used. Also look at say.

The output looks like this:

--- One Block ---
Block1 Line1
Block1 Line2
Block1 Line3
--- One Block ---
Block2 Line1
Block2 Line2
Block2 Line3
--- One Block ---
Block3 Line1
Block3 Line2
Block3 Line3
--- One Block ---
Block4 Line1
Block4 Line2
Block4 Line3
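To make the difference between paragraph mode and '$/ = "\n\n"' from the notes concrete, here's a minimal sketch that reads the same text both ways through an in-memory filehandle. The read_records() helper and the sample text are my own, purely for illustration; they don't come from any module.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: return every record read from $text
# using the supplied value for $/.
sub read_records {
    my ($separator, $text) = @_;

    # Opening on a scalar reference gives an in-memory filehandle.
    open my $fh, '<', \$text
        or die "Can't open in-memory filehandle: $!";

    # Localised: $/ reverts to its previous value when the sub returns.
    local $/ = $separator;

    my @records = <$fh>;
    close $fh;

    return @records;
}

# Two blocks separated by three blank lines.
my $text = "A1\nA2\n\n\n\nB1\nB2\n";

my @paragraph = read_records('',     $text);
my @two_nl    = read_records("\n\n", $text);

printf "paragraph mode: %d records\n", scalar @paragraph;
printf "two newlines:   %d records\n", scalar @two_nl;
```

Paragraph mode gives two records here. With "\n\n" there are three, because the surplus blank lines form a record of their own (just "\n\n") between the two real blocks.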

I thought ++roboticus had generally covered issues relating to '$/' and IO::Uncompress::Bunzip2; however, your reply seems to suggest you were looking for something else.

I'm not entirely sure what you're looking for. Note in IO::Uncompress::Bunzip2's Constructor section:

... the object, $z, returned from IO::Uncompress::Bunzip2 can be used exactly like an IO::File filehandle. This means that all normal input file operations can be carried out with $z. For example, to read a line from a compressed file/buffer you can use either of these forms

$line = $z->getline();
$line = <$z>;

Try using '<$z>', in a way similar to my example with '<DATA>', and see if that does what you want. Something like this (untested):

use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

my $z = IO::Uncompress::Bunzip2::->new($filename)
    or die "bunzip2 failed: $Bunzip2Error\n";

{
    local $/ = '';

    while (<$z>) {
        ...        
    }
}

Note that the constructor code I've used differs from that shown in the IO::Uncompress::Bunzip2 documentation. This is on purpose and I recommend you use this instead. The IO::Uncompress::Bunzip2 documentation uses "Indirect Object Syntax": if you follow that link, you'll see in bold text

"... this syntax is discouraged ..."

along with a discussion of why that syntax should be avoided.
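If you want to try the '<$z>' approach without a real file to hand, here's a rough sketch that compresses some made-up data into an in-memory buffer with IO::Compress::Bzip2, then reads it back a block at a time in paragraph mode. The sample records and the pattern are assumptions for illustration only; I'm also assuming both modules are installed (they ship with recent Perls).

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Made-up sample data: three blocks separated by blank lines.
my $plain = "id: 1\nname: alpha\n\n"
          . "id: 2\nname: beta\n\n\n"
          . "id: 3\nname: gamma\n\n";

# Compress into a scalar so that nothing is written to disk.
my $compressed;
bzip2(\$plain => \$compressed)
    or die "bzip2 failed: $Bzip2Error\n";

# A scalar reference as input means "read from this buffer";
# a filename would go here when reading a real .bz2 file.
my $z = IO::Uncompress::Bunzip2::->new(\$compressed)
    or die "bunzip2 failed: $Bunzip2Error\n";

my @matches;

{
    local $/ = '';    # paragraph mode, confined to this block

    while (my $block = <$z>) {
        chomp $block; # in paragraph mode, removes all trailing newlines
        push @matches, $block if $block =~ /^name: beta$/m;
    }
}

$z->close;

print "$_\n" for @matches;
```

Only one block is held in memory at a time, which is the point when the uncompressed data is large. For a real search you'd replace the pattern with whatever identifies the blocks you're after.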

— Ken

In reply to Re: Searching large files a block at a time by kcott
in thread Searching large files a block at a time by JediWombat