My musings on bit streams were thoroughly covered here and here, but just to bring it back on track:

I need to see a file as a bit stream. It should be opened, and asked for bits - get_bits(howmany), not in chunks of bytes or words, but BITS. I.e. I may ask it for 1 bit, for 17 bits, etc.

One solution (the simplest and fastest) is to represent such a stream as a string of 1s and 0s that is read from the file once and unpack()-ed. But there's a problem with this approach: files may get huge (GBs), and the unpacked string needs one character per bit, i.e. eight times the size of the file. Holding such strings in memory is impossible.
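To make the first approach concrete, here is a minimal sketch of it. The helper name make_bit_reader is made up for this illustration, and a short literal byte string stands in for the contents of a file:

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): turns a byte string into a
# closure that hands out bits as a string of '0'/'1' characters.
sub make_bit_reader {
    my ($bytes) = @_;
    my $bits = unpack("B*", $bytes);    # one character per bit
    my $pos  = 0;
    return sub {
        my $howmany = defined $_[0] ? $_[0] : 1;
        # substr returns fewer characters when the request runs past the end
        my $chunk = substr($bits, $pos, $howmany);
        $pos += length $chunk;
        return $chunk;
    };
}

my $next_bits = make_bit_reader("\xA5\x0F");    # bits: 10100101 00001111
print $next_bits->(1),  "\n";
print $next_bits->(5),  "\n";
print $next_bits->(17), "\n";    # only 10 bits are left at this point
```

Every byte of input becomes eight characters in $bits, which is where the memory problem comes from.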

The other solution is slower, but its memory use is bounded regardless of file size. Keep a buffer of some length (preferably long), and when a request goes beyond the end of the current buffer, fetch the next one and adjust.
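The refill-on-demand idea can be sketched in a few lines. This is only an illustration under assumptions: make_buffered_reader and CHUNK_BYTES are hypothetical names, the chunk size is tiny on purpose, an in-memory filehandle stands in for a real file, and consumed bits are removed from the front of the buffer instead of tracking a position inside it.

```perl
use strict;
use warnings;

use constant CHUNK_BYTES => 4;    # hypothetical; deliberately tiny for the demo

# Hypothetical helper: returns a closure that reads bits from $fh,
# refilling a bitstring buffer whenever a request runs past its end.
sub make_buffered_reader {
    my ($fh) = @_;
    my $buf = "";    # unread bits, as '0'/'1' characters
    return sub {
        my $howmany = defined $_[0] ? $_[0] : 1;
        while (length($buf) < $howmany) {
            my $n = read($fh, my $bytes, CHUNK_BYTES);
            last unless $n;                  # end of file or read error
            $buf .= unpack("B*", $bytes);    # append the freshly read bits
        }
        # 4-argument substr removes the bits from the buffer and returns them;
        # a short (or empty) result means the stream has ended
        return substr($buf, 0, $howmany, "");
    };
}

# In-memory filehandle standing in for a real file
my $data = "\xA5\x0F\xFF\x00\x81";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
binmode($fh);

my $get_bits = make_buffered_reader($fh);
print $get_bits->(3),  "\n";
print $get_bits->(17), "\n";
```

The buffer here never holds more than the requested bits plus one chunk, so memory use does not grow with the file.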

Today I hit the memory problem hard, so I implemented the second solution. I'd like to kindly ask my fellow monks for advice and guidance: can this be made faster? I need the fastest get_bits() function possible. Here are the constructor and the get_bits function of the BitStream object:

# buffer size, in bytes
use constant BUF_SIZE => 2048;

# Constructed with a filename
#
sub new
{
    my $filename = $_[0];

    open(FH, "$filename") or die "$myname error: unable to open $filename: $!\n";
    binmode(FH);
    my $filehandle = *FH;

    my $bytes_buf;
    my $bytes_read = read(*FH, $bytes_buf, BUF_SIZE);
    my $bits_buf = unpack("B*", $bytes_buf);

    # Members:
    #
    # FILENAME
    # FILEHANDLE
    # CUR_BUF - the current buffer held in memory (a bitstring)
    # CUR_BUF_LEN - length of the current buffer (in bits)
    # BUF_NUM - the first buffer in the file is 0, the next 1, and so on
    # BUF_POS - position inside the current buffer
    #
    my $self = {};
    $self->{FILENAME} = $filename;
    $self->{FILEHANDLE} = $filehandle;
    $self->{CUR_BUF} = $bits_buf;
    $self->{CUR_BUF_LEN} = length($bits_buf);
    $self->{BUF_NUM} = 0;
    $self->{BUF_POS} = 0;

    print "$myname ($filename) created\n";
    bless($self, $myname);
}

# Gets a specified amount of bits. Default is 1
# If the request is for more bits than left in the stream, returns the ones
# left; an empty string is returned when the stream ends
#
sub get_bits
{
    my $self = shift;
    my $howmany = (defined $_[0]) ? $_[0] : 1;
    my $ret_str = "";

    ($howmany <= (BUF_SIZE * 8)) or
        die "Please read in chunks no longer than ", BUF_SIZE * 8, " bits\n";

    my $n_bits_left_in_buf = $self->{CUR_BUF_LEN} - $self->{BUF_POS};

    # the request is over the buffer's end ?
    if ($n_bits_left_in_buf < $howmany)
    {
        #~ print "$self->{CUR_BUF} $self->{BUF_POS} $n_bits_left_in_buf\n";

        # take what's left in the buffer
        $ret_str = substr($self->{CUR_BUF}, $self->{BUF_POS}, $n_bits_left_in_buf);
        my $howmany_left = $howmany - $n_bits_left_in_buf;

        # read the next buffer
        my $bytes_buf;
        my $bytes_read = read($self->{FILEHANDLE}, $bytes_buf, BUF_SIZE);

        # was the current buffer the last in the file ?
        if (($self->{CUR_BUF_LEN} < BUF_SIZE * 8) or
            ($bytes_read == 0))
        {
            # then we just read the last bits of the file. returning a string shorter
            # than $howmany signals to the caller that the stream ended
            #
            # mark the buffer as consumed, so that later calls return an
            # empty string instead of the same leftover bits again
            $self->{BUF_POS} = $self->{CUR_BUF_LEN};
            return $ret_str;
        }

        # update buffer info
        $self->{BUF_NUM} += 1;
        $self->{CUR_BUF} = unpack("B*", $bytes_buf);
        $self->{CUR_BUF_LEN} = $bytes_read * 8;

        #~ print "> $self->{CUR_BUF} $self->{CUR_BUF_LEN} $howmany_left\n";

        # complete the read from the new buffer
        $ret_str .= substr($self->{CUR_BUF}, 0, $howmany_left);
        $self->{BUF_POS} = $howmany_left;
    }
    else # the request still fits the current buffer
    {
        $ret_str = substr($self->{CUR_BUF}, $self->{BUF_POS}, $howmany);
        $self->{BUF_POS} += $howmany;
    }

    return $ret_str;
}
Notes:

In reply to BitStream revisited by spurperl
