I need to treat a file as a bit stream: open it, then ask it for bits with get_bits($howmany). Not in chunks of bytes or words, but BITS, i.e. I may ask it for 1 bit, for 17 bits, etc.
One solution (the simplest and fastest) is to represent such a stream as a string of 1s and 0s, read from the file once and unpack()-ed. But there's a problem with this approach: files may get huge (GBs), and because unpack("B*") stores every bit as a one-byte '0' or '1' character, the string takes about eight times the file's size. Holding such strings in memory is impossible.
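For comparison, the simple in-memory approach can be sketched as follows. This is a minimal sketch, assuming the whole file fits in memory; `make_slurping_reader` is a hypothetical name and not part of the BitStream code. It returns a closure that behaves like get_bits: it hands out the requested number of bits and returns a shorter (eventually empty) string at end of stream.

```perl
use strict;
use warnings;

# Hypothetical helper: reads and unpacks the whole file ONCE, then returns a
# closure that hands out bits. Memory use is roughly 8x the file size, since
# every bit becomes a '0' or '1' character.
sub make_slurping_reader {
    my ($filename) = @_;
    open(my $fh, '<:raw', $filename) or die "unable to open $filename: $!\n";
    my $bytes = do { local $/; <$fh> };    # slurp mode: read everything
    close $fh;
    $bytes = '' unless defined $bytes;

    my $bits = unpack 'B*', $bytes;        # e.g. "\xA5" -> "10100101"
    my $pos  = 0;

    return sub {
        my $howmany = defined $_[0] ? $_[0] : 1;
        # substr at the end of $bits returns a shorter (possibly empty)
        # string, which signals end of stream like BitStream::get_bits.
        my $ret = substr($bits, $pos, $howmany);
        $pos += length $ret;
        return $ret;
    };
}
```

For a file holding the two bytes 0xA5 0x0F, successive calls `$next->(1)`, `$next->(7)` and `$next->(17)` return `"1"`, `"0100101"` and `"00001111"`; the last result is shorter than requested because the stream ended.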
The other solution is slower, but its memory use is bounded: keep a buffer of some length (preferably long), and when a request goes beyond the end of the current buffer, fetch the next one and adjust.
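The buffering idea can also be expressed by appending each freshly unpacked chunk to the unread tail of the buffer. That removes the special case of a read split across two buffers and the `BUF_SIZE * 8` limit on request size. This is a hedged sketch of the technique, not the author's BitStream class: `make_buffered_reader` and `CHUNK` are hypothetical names, and whether it is actually faster than the original would have to be measured.

```perl
use strict;
use warnings;

use constant CHUNK => 2048;    # bytes per refill (hypothetical, mirrors BUF_SIZE)

# Hypothetical bounded-memory reader: $buf holds only bits not yet handed out
# (plus at most one freshly unpacked chunk).
sub make_buffered_reader {
    my ($filename) = @_;
    open(my $fh, '<:raw', $filename) or die "unable to open $filename: $!\n";
    my $buf = '';    # '0'/'1' characters
    my $pos = 0;     # read position inside $buf

    return sub {
        my $howmany = defined $_[0] ? $_[0] : 1;

        # Refill until the request fits or the file is exhausted.
        while (length($buf) - $pos < $howmany) {
            my $got = read($fh, my $bytes, CHUNK);
            die "read error on $filename: $!\n" unless defined $got;
            last if $got == 0;
            # Drop the consumed prefix and append the new bits.
            $buf = substr($buf, $pos) . unpack('B*', $bytes);
            $pos = 0;
        }

        # Shorter than $howmany (or empty) means end of stream.
        my $ret = substr($buf, $pos, $howmany);
        $pos += length $ret;
        return $ret;
    };
}
```

Memory stays proportional to `CHUNK` plus the largest single request, so a caller may ask for more than one buffer's worth of bits at once.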
Today I hit the memory problem hard, so I implemented the second solution. I'd like to kindly ask my fellow monks for advice and guidance: can this be made faster? I need the fastest get_bits() function possible. Here are the constructor and the get_bits function of the BitStream object:
```perl
package BitStream;

use strict;
use warnings;

# buffer size, in bytes
use constant BUF_SIZE => 2048;

my $myname = __PACKAGE__;

# Constructed with a filename; call as BitStream->new($filename)
#
sub new {
    my ($class, $filename) = @_;

    # A lexical filehandle (instead of the global FH) lets several
    # BitStream objects be open at the same time.
    open(my $fh, '<', $filename)
        or die "$myname error: unable to open $filename: $!\n";
    binmode($fh);

    my $bytes_buf;
    my $bytes_read = read($fh, $bytes_buf, BUF_SIZE);
    defined $bytes_read
        or die "$myname error: unable to read $filename: $!\n";
    my $bits_buf = unpack("B*", $bytes_buf);

    # Members:
    #
    # FILENAME
    # FILEHANDLE
    # CUR_BUF     - the current buffer held in memory (a bitstring)
    # CUR_BUF_LEN - length of the current buffer (in bits)
    # BUF_NUM     - the first buffer in the file is 0, the next 1, and so on
    # BUF_POS     - position inside the current buffer
    #
    my $self = {
        FILENAME    => $filename,
        FILEHANDLE  => $fh,
        CUR_BUF     => $bits_buf,
        CUR_BUF_LEN => length($bits_buf),
        BUF_NUM     => 0,
        BUF_POS     => 0,
    };

    print "$myname ($filename) created\n";
    return bless($self, $class);
}

# Gets a specified amount of bits. Default is 1.
# If the request is for more bits than are left in the stream, returns the
# ones left; an empty string is returned when the stream ends.
#
sub get_bits {
    my ($self, $howmany) = @_;
    $howmany = 1 unless defined $howmany;
    my $ret_str = "";

    ($howmany <= BUF_SIZE * 8)
        or die "Please read in chunks no longer than ", BUF_SIZE * 8, " bits\n";

    my $n_bits_left_in_buf = $self->{CUR_BUF_LEN} - $self->{BUF_POS};

    # the request is over the buffer's end?
    if ($n_bits_left_in_buf < $howmany) {
        # take what's left in the buffer
        $ret_str = substr($self->{CUR_BUF}, $self->{BUF_POS}, $n_bits_left_in_buf);
        my $howmany_left = $howmany - $n_bits_left_in_buf;

        # read the next buffer
        my $bytes_buf;
        my $bytes_read = read($self->{FILEHANDLE}, $bytes_buf, BUF_SIZE);

        # was the current buffer the last in the file (or did read fail)?
        if ($self->{CUR_BUF_LEN} < BUF_SIZE * 8 or !$bytes_read) {
            # We just took the last bits of the file. Mark the buffer as
            # consumed so the next call returns "" instead of the same tail.
            # Returning a string shorter than $howmany signals to the caller
            # that the stream ended.
            $self->{BUF_POS} = $self->{CUR_BUF_LEN};
            return $ret_str;
        }

        # update buffer info
        $self->{BUF_NUM}    += 1;
        $self->{CUR_BUF}     = unpack("B*", $bytes_buf);
        $self->{CUR_BUF_LEN} = $bytes_read * 8;

        # complete the read from the new buffer; it may be shorter than
        # $howmany_left, so advance by what was actually taken
        my $tail = substr($self->{CUR_BUF}, 0, $howmany_left);
        $ret_str .= $tail;
        $self->{BUF_POS} = length($tail);
    }
    else {
        # the request still fits the current buffer
        $ret_str = substr($self->{CUR_BUF}, $self->{BUF_POS}, $howmany);
        $self->{BUF_POS} += $howmany;
    }

    return $ret_str;
}

1;
```
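Since the question is about speed, it helps to time candidate strategies rather than guess. The sketch below uses the core Benchmark module's `cmpthese` to compare slicing a pre-unpacked bit string (what BitStream does) against unpacking only the bytes that cover each request, which avoids keeping the 8x-expanded buffer around. The two subs and the random 2048-byte test buffer are hypothetical stand-ins, not the author's code, and no particular result is implied.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: one 2048-byte buffer of random bytes.
my $bytes = join '', map { chr(int(rand(256))) } 1 .. 2048;
my $bits  = unpack 'B*', $bytes;    # pre-unpacked '0'/'1' string

# Strategy A: substr on the pre-unpacked bit string.
sub take_from_bitstring {
    my ($pos, $n) = @_;
    return substr $bits, $pos, $n;
}

# Strategy B: unpack only the bytes covering bits $pos .. $pos+$n-1.
sub take_from_bytes {
    my ($pos, $n) = @_;
    my $first = int($pos / 8);
    my $last  = int(($pos + $n - 1) / 8);
    my $chunk = unpack 'B*', substr($bytes, $first, $last - $first + 1);
    return substr $chunk, $pos % 8, $n;
}

# Sanity check: both strategies must agree before timing them.
for my $pos (0, 3, 100, 16000) {
    die "mismatch at $pos\n"
        unless take_from_bitstring($pos, 17) eq take_from_bytes($pos, 17);
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    bitstring => sub {
        for (my $p = 0; $p + 17 <= length $bits; $p += 17) { take_from_bitstring($p, 17) }
    },
    bytes => sub {
        for (my $p = 0; $p + 17 <= length $bits; $p += 17) { take_from_bytes($p, 17) }
    },
});
```

The same harness can be pointed at complete get_bits implementations once they share a common interface.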
In reply to BitStream revisited by spurperl