in thread Read multiple text file from bz2 without extract first

Why don't you compare the times yourself? Which approach wins depends on whether decompressing and reading (CPU-bound) is faster than decompressing, writing, and then reading (I/O-bound). It also depends on whether you need to process the file more than once.

From Perl, you can decompress and read directly by using a pipe open:

open my $fh, "bzip -cd $file |" or die "Couldn't open '$file': $!";

That is efficient if you only need to read the data once. If you need to read it more than once and have enough disk space, decompressing once and then reading the decompressed file is likely faster.
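A slightly fuller sketch of the same pipe-open idea, reading line by line and keeping only the lines that match a pattern. The helper name `grep_pipe` is made up for illustration, and the decompressor command is passed in as a list, since the program name and its location vary between systems. It uses the list form of pipe open, which skips the shell; older Perls on Windows may not support that form, in which case the string form shown above is the fallback.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): run a decompressor
# command, read its output line by line, and return the matching lines.
sub grep_pipe {
    my ($pattern, @cmd) = @_;

    # List-form pipe open: no shell is involved, so unusual characters
    # in the file name cannot break the command line.
    open my $fh, '-|', @cmd
        or die "Couldn't run '@cmd': $!";

    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, $line if $line =~ $pattern;
    }

    # For a pipe, close() fails if the child exited with an error;
    # the exit status is then available in $?.
    close $fh
        or die "'@cmd' failed (exit status " . ($? >> 8) . ")";

    return @hits;
}
```

Assuming the decompressor is installed as bzip2, a call would look like `my @hits = grep_pipe(qr/keyword/, 'bzip2', '-cd', $file);`. Only one line is held in memory at a time, apart from the collected matches.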

Re^6: Read multiple text file from bz2 without extract first
by prescott2006 (Acolyte) on Apr 03, 2012 at 03:23 UTC
    Corion, when I run your code, I get the error "bzip is not an internal command...." What am I missing? What I actually want is to read a text file inside a bz2 archive without extracting it first, match its contents against some keywords, and write the results to an array or a text file.
      I'm never sure whether the program name is bzip or bzip2. Use whatever the program is called on your system (it is usually bzip2). Also, that program needs to be available in your PATH.
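If getting the external program onto the PATH is a hassle, the decompression can also be done inside Perl. A minimal sketch, assuming the IO::Uncompress::Bunzip2 module is available (it ships with recent Perl distributions as part of IO-Compress); the helper name `grep_bz2` is made up for illustration. It reads the compressed file line by line, matches each line against a pattern, and collects the hits in an array:

```perl
use strict;
use warnings;
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Hypothetical helper: return all lines of a .bz2 file that match
# $pattern, without writing a decompressed copy to disk.
sub grep_bz2 {
    my ($file, $pattern) = @_;

    # MultiStream => 1 keeps reading if the file consists of several
    # concatenated bzip2 streams.
    my $z = IO::Uncompress::Bunzip2->new($file, MultiStream => 1)
        or die "Couldn't open '$file': $Bunzip2Error";

    my @hits;
    while (defined(my $line = $z->getline)) {
        chomp $line;
        push @hits, $line if $line =~ $pattern;
    }
    $z->close;

    return @hits;
}
```

A call such as `my @hits = grep_bz2('data.txt.bz2', qr/keyword/);` (the file name is only an example) leaves the matches in an array, which can then be printed to a text file with an ordinary `open` and `print`. Whether this is faster than the external program is something to measure rather than assume.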