BitStream revisited

spurperl has asked for the wisdom of the Perl Monks concerning the following question:

My musings on bit streams were thoroughly covered here and here, but just to brink it back on track:

I need to see a file as a bit stream. It should be opened, and asked for bits - get_bits(howmany), not in chunks of bytes or words, but BITS. I.e. I may ask it for 1 bit, for 17 bits, etc.

One solution (the simplest and fastest) to represent such a stream is by a string of 1s and 0s, that is read from the file once and unpack()-ed. But there's a problem with this approach: files may get huge (GBs), and memory usage is a problem. Holding such strings in memory is impossible.

The other solution is slower, but unlimited in memory. Keep a buffer of some length (preferably long), and when a request gets beyond the current buffer, fetch the next one and adjust.

Today I hit the memory problem hard to I implemented the second solution. I'd like to kindly ask my fellow monks for advice and guidance - can this be made faster ? I need the fastest get_bits() function possible. Here is the constructor and the get_bits function of the BitStream object:

# buffer size, in bytes
use constant BUF_SIZE => 2048;


# Constructed with a filename 
#
sub new
{
    my $filename = $_[0];
    
    open(FH, "$filename") or die "$myname error: unable to open $filen
+ame: $!\n";
    binmode(FH);
    my $filehandle = *FH;

    my $bytes_buf;
    my $bytes_read = read(*FH, $bytes_buf, BUF_SIZE);    
    my $bits_buf = unpack("B*", $bytes_buf);
    
    # Members:
    #
    # FILENAME
    # FILEHANDLE
    # CUR_BUF - the current buffer held in memory (a bitstring)
    # CUR_BUF_LEN - length of the current buffer (in bits)
    # BUF_NUM - the first buffer in the file is 0, the next 1, and so 
+on
    # BUF_POS - position inside the current buffer
    #
    my $self = {};
    $self->{FILENAME}        = $filename;
    $self->{FILEHANDLE}     = $filehandle;
    $self->{CUR_BUF}         = $bits_buf;
    $self->{CUR_BUF_LEN}    = length($bits_buf);
    $self->{BUF_NUM}        = 0;
    $self->{BUF_POS}        = 0;
    
    print "$myname ($filename) created\n";
    
    bless($self, $myname);
}

# Gets a specified amount of bits. Default is 1
# If the request is for more bits than left in the stream, returns the
+ ones
# left; an empty string is returned when the stream ends
#
sub get_bits
{
    my $self = shift;
    my $howmany = (defined $_[0]) ? $_[0] : 1;
    my $ret_str = "";
    
    ($howmany <= (BUF_SIZE * 8))
        or die "Please read in chunks no longer than ", BUF_SIZE * 8, 
+" bits\n";
    
    my $n_bits_left_in_buf = $self->{CUR_BUF_LEN} - $self->{BUF_POS};
        
    # the request is over the buffer's end ?
    if ($n_bits_left_in_buf < $howmany)
    {    
        #~ print "$self->{CUR_BUF} $self->{BUF_POS} $n_bits_left_in_bu
+f\n";
        # take what's left in the buffer
        $ret_str = substr($self->{CUR_BUF}, $self->{BUF_POS}, $n_bits_
+left_in_buf);
        my $howmany_left = $howmany - $n_bits_left_in_buf;

        # read the next buffer
        my $bytes_buf;
        my $bytes_read = read($self->{FILEHANDLE}, $bytes_buf, BUF_SIZ
+E);
        
        # was the current buffer the last in the file ?
        if (($self->{CUR_BUF_LEN} < BUF_SIZE * 8) or 
            ($bytes_read == 0))
        {
            # then we just read the last bits of the file. returning a
+ string shorter
            # than $howmany signals to the caller that the stream ende
+d
            #
            return $ret_str;
        }
        
        # update buffer info
        $self->{BUF_NUM} += 1;
        $self->{CUR_BUF} = unpack("B*", $bytes_buf);
        $self->{CUR_BUF_LEN} = $bytes_read * 8;
        
        #~ print "> $self->{CUR_BUF} $self->{CUR_BUF_LEN} $howmany_lef
+t\n";
        
        # complete the read from the new buffer
        $ret_str .= substr($self->{CUR_BUF}, 0, $howmany_left);
        $self->{BUF_POS} = $howmany_left;
    }
    else
    # the request still fits the current buffer
    {
        $ret_str = substr($self->{CUR_BUF}, $self->{BUF_POS}, $howmany
+);
        $self->{BUF_POS} += $howmany;
    }
    
    return $ret_str;
}
[download]

Notes:

I need the common case fast, naturally. Buffer refill happens once in a long time - the average chunk read is 128 bits, and the buffer is 16K bits long (can be longer...), so once in 100+ times it's refilled. The reading when no refill is required must be as fast as possible (AFAP :).
Ignore the test ($howmany <= (BUF_SIZE * 8)). I don't want to handle this special case, but I'll probably forbid it by definition, and won't waste cycles in each call of get_bits() to handle it

Comment on BitStream revisited Download Code

Replies are listed 'Best First'.
Re: BitStream revisited by tachyon (Chancellor) on Dec 30, 2003 at 11:42 UTC
If speed REALLY matters you do C not perl. But assuming you want Perl...... First OO Perl is 30-40% slower (in my unrelated testing) than straight function calls so you don't want OO (nor might I add is it justified for the task given that you have no real object data to encapsulate ie it just provides syntactic sugar and no functional value). Second you spend memory to gain speed or in other words every disk read is much slower than a memory read ie tens of nanoseconds versus several milliseconds so you want to buffer as big as possible. Last you do as little as possible, making as few tests as possible, open files once, use global or vars that scope so they don't get created/destroyed, minimize sub calls (on/off stack) etc. I would do something like: `my $file = 'c:/test.pl'; my $BLOCK_SIZE = 10241024; open my $fh, $file or die $!; END{close $fh} my ( $buffer, $buf ); read( $fh, $buf, $BLOCK_SIZE ); $buffer .= unpack "B", $buf; sub get_bits { my ( $num_bits ) = @_; # faster than shift unless ( length($buffer) > $num_bits ) { read( $fh, $buf, $BLOCK_SIZE ); $buffer .= unpack "B", $buf; die "No more bits left" if length($buffer) < $num_bits; } return substr $buffer, 0, $num_bits, ''; } for ( 1..1000 ) { print get_bits(16), $/; }` [download] * You would play with the BLOCK_SIZE (probably 1-2MB will be optimal as a stab in the dark - see Re: Performance Question for details) to spend as much memory as you can/need to/is optimal and limit disk access. Depends a lot on OS, disks, disk buffering, available memory etc. We make no extra sub calls at all (all that on and off the stack takes time) and just do the minimum. As always YMMV. cheers tachyon	[reply] [d/l]
Re: Re: BitStream revisited by sgifford (Prior) on Dec 31, 2003 at 04:30 UTC
You would play with the BLOCK_SIZE (probably 1-2MB will be optimal as a stab in the dark - see Re: Performance Question for details) to spend as much memory as you can/need to/is optimal and limit disk access. There's actually only a marginal speedup for anything over the system block size, usually 4K or 8K. In fact, when the buffer size gets up to around a MB, things slow down a little bit. All of these tests are with a warm cache on an approximately 1GB file. `#!/usr/bin/perl my $i = 0; while (sysread(STDIN,$buf,$ENV{BLOCKSIZE})) { $i++; } print "Called read $i times.\n";` [download] $ BLOCKSIZE=4096 time perl /tmp/t6 <root_fs Called read 262144 times. 0.55user 8.93system 0:38.36elapsed 24%CPU $ BLOCKSIZE=8192 time perl /tmp/t6 <root_fs Called read 131072 times. 0.47user 8.53system 0:39.10elapsed 23%CPU $ BLOCKSIZE=16384 time perl /tmp/t6 <root_fs Called read 65536 times. 0.24user 7.46system 0:38.04elapsed 20%CPU $ BLOCKSIZE=65536 time perl /tmp/t6 <root_fs Called read 16384 times. 0.17user 9.04system 0:38.16elapsed 24%CPU $ BLOCKSIZE=262144 time perl /tmp/t6 <root_fs Called read 4096 times. 0.13user 11.77system 0:38.53elapsed 30%CPU $ BLOCKSIZE=524288 time perl /tmp/t6 <root_fs Called read 2048 times. 0.06user 12.49system 0:39.15elapsed 32%CPU $ BLOCKSIZE=1048576 time perl /tmp/t6 <root_fs Called read 1024 times. 0.04user 12.94system 0:38.34elapsed 33%CPU	[reply] [d/l]
Re: Re: Re: BitStream revisited by tachyon (Chancellor) on Dec 31, 2003 at 05:11 UTC
Re: Re: Re: Re: BitStream revisited. As noted you need to test the exact code/OS/hardware combo. You will see I got somewhat different results with a 65K sweetspot for throughput speed (but with a defined 4096 read buffer as seen in other node). On my test system a 65k buffer would give me a 40% odd performance boost (over a 4K buffer) whereas on yours it would cost me 20% over your optimum 16k buffer. Just goes to show you can't overgeneralize tuning results. Out of interest the last lot of testing I did on this sort of thing was using IDE disks whereas this is on RAID V SCSI hardware. [root@devel3 root]# cat reader.pl #!/usr/bin/perl open $fh, '/root/big.file' or die $!; 1 while read( $fh, $buf, $ENV{BLOCK_SIZE} ); close $fh; [root@devel3 root]# ll big.file -rw-r--r-- 1 root root 100000000 Dec 31 04:59 big.file [root@devel3 root]# BLOCK_SIZE=1024 time perl /root/reader.pl 0.11user 0.08system 0:00.20elapsed 94%CPU (0avgtext+0avgdata 0maxresid +ent)k 0inputs+0outputs (275major+31minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=2048 time perl /root/reader.pl 0.05user 0.12system 0:00.16elapsed 104%CPU (0avgtext+0avgdata 0maxresi +dent)k 0inputs+0outputs (275major+33minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=4192 time perl /root/reader.pl 0.03user 0.10system 0:00.13elapsed 94%CPU (0avgtext+0avgdata 0maxresid +ent)k 0inputs+0outputs (275major+32minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=4096 time perl /root/reader.pl 0.05user 0.06system 0:00.11elapsed 94%CPU (0avgtext+0avgdata 0maxresid +ent)k 0inputs+0outputs (275major+33minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=1024 time perl /root/reader.pl 0.09user 0.11system 0:00.19elapsed 103%CPU (0avgtext+0avgdata 0maxresi +dent)k 0inputs+0outputs (275major+33minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=2048 time perl /root/reader.pl 0.03user 0.13system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 0maxresi +dent)k 0inputs+0outputs (275major+34minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=4096 time perl /root/reader.pl 0.01user 0.11system 0:00.11elapsed 102%CPU (0avgtext+0avgdata 0maxresi +dent)k 0inputs+0outputs (275major+33minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=8192 time perl /root/reader.pl 0.02user 0.08system 0:00.09elapsed 107%CPU (0avgtext+0avgdata 0maxresi +dent)k 0inputs+0outputs (275major+33minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=16384 time perl /root/reader.pl 0.01user 0.07system 0:00.08elapsed 98%CPU (0avgtext+0avgdata 0maxresid +ent)k 0inputs+0outputs (275major+34minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=65536 time perl /root/reader.pl 0.00user 0.07system 0:00.07elapsed 93%CPU (0avgtext+0avgdata 0maxresid +ent)k 0inputs+0outputs (275major+45minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=262144 time perl /root/reader.pl 0.02user 0.12system 0:00.13elapsed 100%CPU (0avgtext+0avgdata 0maxresi +dent)k 0inputs+0outputs (275major+95minor)pagefaults 0swaps [root@devel3 root]# BLOCK_SIZE=524288 time perl /root/reader.pl 0.01user 0.18system 0:00.25elapsed 74%CPU (0avgtext+0avgdata 0maxresid +ent)k 0inputs+0outputs (275major+159minor)pagefaults 0swaps [root@devel3 root]# [download] cheers tachyon	[reply] [d/l]
Re: Re: BitStream revisited by spurperl (Priest) on Jan 01, 2004 at 08:00 UTC
Tachyon, Performance aside for a moment, why don't you think OO is appropriate here ? I happen to think it is, and my proof is that using this BitStream (I use it extensively in a large application) is a pleasure. As I see it, the BitStream has an internal state - position of the file handle, a current buffer (that is a must, as I said, as keeping the whole thing in memory is unpractical, and read()-ing on each access is slow), etc. I simply call $in->get_bits(number) and get my bits. If I want, I call $in->seek_pos(jump, whence) to go where I want, etc... I find that encapsulating this with an object is very convenient.	[reply]
Re: Re: Re: BitStream revisited by tachyon (Chancellor) on Jan 01, 2004 at 11:02 UTC
There is nothing wrong with OO. I like it and use it. Your initial question concerned speed ie can this be made faster. Answer yes ditch the OO. OO perl is 30-40% slower than using raw function calls - just to do the same call. If you have other requirements that mandate OO then you have other requirements..... If you will only have a single instance of your bitstream object then you can easily just do: `{ # hold state stuff in a closure # this can be identical to an object hash data # but you only get one instance at any one time # with this simple setup my %state; sub get { } sub seek_pos { } # set your %state with init like new sub init { } }` [download] That will be 30-40% faster than OO with the same data encapsulation but limited to one instance. It all depends on the application. Like I said initially if speed really matters I would hack it up in C but that would take time. As always it is a tradeoff between hardware cost/head down coding time/oo convenince etc. A lot of problems can be solved quickly with Perl and made fast enough by throwing more memory and cycles at it to get the required performance. Applying more grunt to relatively inefficient code is a practical real world solution. Just ask M$ and intel. cheers tachyon	[reply] [d/l]
Re: BitStream revisited by !1 (Hermit) on Dec 30, 2003 at 11:22 UTC
How about: package BitStream; use constant BUF_SIZE => 2048; use POSIX qw(ceil); sub new { die "Correct usage is BitStream->new(filename)" unless @_ == 2; my ($class,$filename) = @_; open my $fh,"<",$filename or die "Couldn't open $filename: $!"; read($fh,my $buffer,BUF_SIZE); my $self = { fh => $fh, buffer => $buffer, dead_start => 0 }; bless $self,$class; } sub get_bits { my ($self,$bits) = @_; no warnings "once"; return "" unless $bits+=0; # ok, make the buffer big enough to handle the request my $buf; $self->{'buffer'} .= $buf while length($self->{'buffer'}) * 8 + $sel +f->{'dead_start'} < $bits && read($self->{'fh'},$buf,BUF_SIZE); # figure out our end position within the buffer my $bytes = ceil(($bits + $self->{'dead_start'})/8); my $return_buf = unpack("B*",substr($self->{'buffer'},0,$bytes)); $return_buf = substr($return_buf,$self->{'dead_start'},$bits) unless + $return_buf eq ""; $self->{'dead_start'} = ($bits + $self->{'dead_start'}) % 8; $bytes-- if $self->{'dead_start'}; substr($self->{'buffer'},0,$bytes) = ""; return $return_buf; } "Give me liberty or at least the remote"; [download] Of course, I must admit that I prefer bit twiddling over this.	[reply] [d/l]
Re: BitStream revisited by BrowserUk (Patriarch) on Dec 30, 2003 at 15:27 UTC
If performance is paramount in your application, unpacking every bit to a byte, thereby compounding the memory requirements x8, and then processing the 'bits' one at a time is so not the way to approach the problem. Adding the overhead, both performance and memory, of a perlish OO interface is just compounding this further. Perl's vec builtin allows read/write access to the bits, individually and in groups, of bitstrings of arbitrary (given you have the memory) length. It effectively allows you to treat the string as a huge array of bits, getting and setting them, by index, in-situ. By comparison with your mechanism, it is both fast and efficient, and requires no extra memory beyond the bitstring itself. The only time vec becomes a little awkward to use is if you need to access multiples of bits that are not powers of two and/or cross byte boundaries. But there are very few file formats that do not align the bits on (at least) byte boundaries, so that is very rarely a problem. You don't describe the actual application, but I would highly recommend that you take another look at vec and understand what does and how it works. You will probably find that you can re-cast your problem to use it and greatly improve the efficency by doing so. Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail Hooray!	[reply]
Re: Re: BitStream revisited by ysth (Canon) on Dec 30, 2003 at 19:20 UTC
It sounds like some kind of compression algoritm to me, so I wouldn't be surprised if the lengths were arbitrary. If that is the case, I think avoiding vec is preferable. Update: but I'd love to see someone come up with example code using vec that handled non-power-of-2 lengths.	[reply]
Re: BitStream revisited by ysth (Canon) on Dec 30, 2003 at 19:38 UTC
As others have suggested, doing this with OO is going to have a cost. One way of avoiding that yet preserving the modularization and ability to have multiple bitstreams would be to generate a closure. Pseudo-code: sub BitStream::make_bitstream { my $filename = shift; open my $fh, ... or return my $buf=''; return sub { # if a buffer is available, use it my $ret_str = substr($buf, 0, (defined($_[0]) ? $_[0] : 1), ''); # fill buffer if request may not have been satisfied if (!length($buf)) { my $need = (defined($_[0]) ? $_[0] : 1) - length($ret_Str); return $ret_str if !$need; read from $fh and unpack into $buf $ret_str .= substr($buf, 0, $need, ''); } $ret_str; } } #example package main; $get_bits = BitStream::make_bitstring("foo.dat") or die "open failure: + $!"; $onebit = $get_bits->(); $twobits = $get_bits->(2); [download] It's not clear whether you want failure to read the desired number of bits would be a fatal error or just return what was available. `!defined($_[0])\|\|$_[0]` may be faster than the ?: way; if you are that concerned, you may want to benchmark it (or just use C :). Update: `- $ret_str` was supposed to be `- length($ret_str)`	[reply] [d/l] [select]
Re: Re: BitStream revisited by spurperl (Priest) on Jan 01, 2004 at 08:04 UTC
This is a very pretty solution, thanks ! I'm not sure by how much it will improve performance, but a closure fits here well. It's nice that Perl has this ability (borrowed from Lisp, of course) as in some case it may solve a problem graciously, but so can an object. That is said by me, who adores Lisp and FP in general and looks at OO with suspicion, as a nice solution in some cases, but surely not the uber-panacea hyped about so much.	[reply]
Re: BitStream revisited by iburrell (Chaplain) on Dec 30, 2003 at 21:00 UTC
I would suggest that you let Perl and stdio take care of buffering reads. It is already well-tested and fast. You should consider using vec to manipulate the bitstrings especially if you are reading large amounts. The BitStream module would need to figure out how many bytes to read with read(). If the number of bits read is not a integral number of bytes, then it needs to save the leftover bits. The leftover bits would be used for the next read.	[reply]
Re: Re: BitStream revisited by tachyon (Chancellor) on Dec 31, 2003 at 02:46 UTC
Please read Re: Performance Question and the associated thread. Optimizing (minimizing) the number of disk accesses via read is paramount to speed. There are simple reasons for this ie nanosecond vs milisecond retreival times for memory cf disk and it is trivial to test and prove. cheers tachyon	[reply]
Re: Re: Re: BitStream revisited by sgifford (Prior) on Dec 31, 2003 at 04:08 UTC
`read` already uses buffered input, and each call to it does not make a system call. `#!/usr/bin/perl while (read(STDIN,$buf,$ENV{BLOCKSIZE})) { ; }` [download] produces: $ BLOCKSIZE=1 strace -e read -c perl /tmp/t6 <zelda4rom.zip Called read 26968 times. % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.002888 160 18 read ------ ----------- ----------- --------- --------- ---------------- 100.00 0.002888 18 total $ BLOCKSIZE=1024 strace -e read -c perl /tmp/t6 <zelda4rom.zip Called read 27 times. % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000370 19 19 read ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000370 19 total $ BLOCKSIZE=4096 strace -e read -c perl /tmp/t6 <zelda4rom.zip Called read 7 times. % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000315 17 19 read ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000315 19 total	[reply] [d/l] [select]
Re: Re: Re: Re: BitStream revisited by tachyon (Chancellor) on Dec 31, 2003 at 04:53 UTC