in reply to Reading a run length encoded file in a buffering scenario


I still prefer the unpack 'C/a' method, not least for its speed. How about something like this:
#!/usr/bin/perl -wl

my $buffer = "\04perl\03awk\01C";

# Try this incomplete string
# my $buffer = "\04perl\03awk\01C\6pytho";

while ((my $size = ord $buffer) < length $buffer) {
    print unpack 'C/a', $buffer;
    substr $buffer, 0, $size +1, '';
}

print "<$buffer>";

This will leave any incomplete items in the buffer.
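
As a quick check of that claim, here is a minimal sketch that wraps the same loop in a helper and runs it on the incomplete string from the comment above. The name extract_items is hypothetical and only used for illustration; it is not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull every complete length-prefixed item off the
# front of the buffer (passed by reference) and return them as a list.
sub extract_items {
    my ($bufref) = @_;
    my @items;
    while ((my $size = ord($$bufref)) < length($$bufref)) {
        push @items, unpack('C/a', $$bufref);
        substr($$bufref, 0, $size + 1, '');
    }
    return @items;
}

# Three complete items followed by a truncated one (6 promised, 5 present).
my $buffer = "\04perl\03awk\01C\06pytho";
my @items  = extract_items(\$buffer);

print "items: @items\n";
printf "left over: %d bytes\n", length $buffer;
```

The three complete items come out, and the length byte plus the five bytes of 'pytho' stay in the buffer until more data is appended.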

--
John.

Re^2: Reading a run length encoded file in a buffering scenario
by particle (Vicar) on Aug 14, 2002 at 14:34 UTC
    This will leave any incomplete items in the buffer.

    not exactly. more like: if the last item is incomplete, it will remain in the buffer.

    there's an important distinction. if any item other than the last is incomplete, every item from the incomplete item to the last item (inclusive) will be corrupt. there may or may not be data remaining in the buffer as well.

    ~Particle *accelerates*
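
To make that failure mode concrete, here is a small sketch using made-up sample data in which the length byte for 'awk' is wrong (4 instead of 3). The data and the visible() helper are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample data: the length byte for 'awk' says 4 instead of 3.
my $buffer = "\04perl\04awk\01C\03sed";
my @items;
while ((my $size = ord($buffer)) < length($buffer)) {
    push @items, unpack('C/a', $buffer);
    substr($buffer, 0, $size + 1, '');
}

# Hypothetical helper: show non-printable bytes as \xNN so the damage
# is visible on a terminal.
sub visible {
    my ($str) = @_;
    $str =~ s{([^[:print:]])}{sprintf '\\x%02x', ord $1}ge;
    return $str;
}

print "item: <", visible($_), ">\n" for @items;
print "left: <", visible($buffer), ">\n";
```

The second item swallows the next length byte, and the loop then reads 'C' as a length of 67 and stops. What remains in the buffer looks just like an incomplete item, even though the real problem is earlier in the stream.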


      there's an important distinction. if any item other than the last is incomplete, every item from the incomplete item to the last item (inclusive) will be corrupt. there may or may not be data remaining in the buffer as well.

      I think that this comment serves to confuse rather than clarify.

      There is no important distinction to be made here. If the data is corrupt then all decoding schemes will fail.

      If the data isn't corrupt then having the incomplete item remain in the buffer is an advantage. It means that the program can add to the buffer until at least one record is read.

      --
      John.
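
A rough sketch of "adding to the buffer", assuming an in-memory filehandle as a stand-in for the real file and a deliberately tiny read size of 4 bytes so that items straddle read boundaries:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample data, opened as an in-memory file for the demonstration.
my $data = "\04perl\03awk\01C";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
binmode $fh;

my $buffer = '';
my @items;

# The four-argument form of read() writes at the given offset, so any
# incomplete item left over from the previous pass is kept and extended.
while (read($fh, $buffer, 4, length $buffer)) {
    while ((my $size = ord($buffer)) < length($buffer)) {
        push @items, unpack('C/a', $buffer);
        substr($buffer, 0, $size + 1, '');
    }
}
close $fh;

print "items: @items\n";
print "unprocessed bytes: ", length($buffer), "\n";
```

The first read yields only "\04per", which the inner loop leaves alone; the second read completes 'perl', and so on until the buffer is empty.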

Re: Re: Reading a run length encoded file in a buffering scenario
by demerphq (Chancellor) on Aug 15, 2002 at 08:29 UTC
    Well, I used your code, with some modifications, to produce:
    use Digest::SHA1;
    use IO::File;
    use IO::Zlib;

    # $Debug and %Config are globals defined elsewhere in the program.
    sub read_rle_file {    # read run length encoded file
        my $filespec  = shift;
        my $numfields = shift;
        my $sub       = shift;
        my $sha1      = Digest::SHA1->new();
        my $IN_IO;
        print "Reading run length encoded file $filespec, with records of $numfields fields.\n"
            if $Debug;
        if ( $filespec =~ /\.gz/ ) {
            $IN_IO = IO::Zlib->new( $filespec, "rb" )
                or die "Cannot open compressed run length encoded file $filespec : \n";
        } else {
            $IN_IO = IO::File->new($filespec)
                or die "Cannot open run length encoded file $filespec : \n";
            binmode $IN_IO;
        }
        my $buffer  = "";    # The buffer we are using
        my $records = 0;     # Number of records we have read
        my $buffers = 0;     # The number of times we have refilled the buffer
        my $bytes   = 0;     # The number of bytes we have read so far
        my $record  = [];    # The record currently being assembled (may be partial)

        # read until the file is empty
        while ( !$IN_IO->eof ) {
            my $read_buffer;
            my $bytesread = $IN_IO->read( $read_buffer, $Config{buffer_size} );
            die "Read error in read_rle_file($filespec,$numfields)\n"
                unless defined $bytesread;
            $bytes += $bytesread;
            $sha1->add($read_buffer);
            $buffer .= $read_buffer;
            my @records;

            # try to extract as many records as possible from the buffer
            BUFFER:
            while ( ( my $len = ord($buffer) ) < length $buffer ) {
                push @$record, unpack( "C/a", $buffer );
                substr( $buffer, 0, $len + 1, "" );
                if ( @$record == $numfields ) {
                    push @records, $record;
                    $record = [];
                }
            }

            # hand off to the callback the records we have extracted so far
            # we do this in chunks to save the callback overhead
            $sub->( \@records );
            $records += @records;
            print "After buffer " . ( $buffers++ ) . " read $records records from $bytes bytes.\n"
                if $Debug > 1;
        }
        die "Unprocessed data in buffer! read_rle_file($filespec,$numfields) failed!\n[@$record] $buffer\n"
            if @$record || length $buffer;
        return wantarray ? ( $sha1->b64digest, $records ) : $sha1->b64digest;
    }
    And Particle's point is valid, but luckily in my situation I'm not worried about a corrupt file so much as an improperly terminated one. Also note the games with $record to handle the case where a buffer empties before a full record is complete: instead of pushing the incomplete record back into the buffer, I now leave it in $record.

    Anyway, thanks for the feedback.

    Yves / DeMerphq
    ---
    Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)
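
The $record bookkeeping in read_rle_file can be tried in isolation. This is only a sketch under stated assumptions: the @chunks list is a hypothetical stand-in for successive reads from the file, records are assumed to have two fields, and pushing onto @all stands in for the callback.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for successive reads: six fields, split so that
# both an item and a two-field record straddle a chunk boundary.
my @chunks    = ("\04perl\03a", "wk\01C\03s", "ed\02sh\04bash");
my $numfields = 2;

my $buffer = '';
my $record = [];    # partial record, survives from one chunk to the next
my @all;

for my $chunk (@chunks) {
    $buffer .= $chunk;
    my @records;    # complete records found in this pass only
    while ((my $len = ord($buffer)) < length($buffer)) {
        push @$record, unpack('C/a', $buffer);
        substr($buffer, 0, $len + 1, '');
        if (@$record == $numfields) {
            push @records, $record;
            $record = [];
        }
    }
    push @all, @records;    # stands in for $sub->(\@records)
}

print join(',', @$_), "\n" for @all;
die "Unprocessed data left over\n" if @$record || length $buffer;
```

After the first chunk $record holds only 'perl' and $buffer holds the start of 'awk'; both are completed by the second chunk, so neither a half item nor a half record is ever handed to the callback.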