in reply to BitStream revisited

If speed REALLY matters you do C, not Perl. But assuming you want Perl...

First, OO Perl is 30-40% slower (in my unrelated testing) than straight function calls, so you don't want OO - nor, might I add, is it justified for the task, given that you have no real object data to encapsulate; it just provides syntactic sugar and no functional value. Second, you spend memory to gain speed: every disk read is much slower than a memory read (several milliseconds versus tens of nanoseconds), so you want to buffer as big as possible*. Last, you do as little as possible: make as few tests as possible, open files once, use globals or vars scoped so they don't get created/destroyed repeatedly, minimize sub calls (all that pushing on and off the stack), etc. I would do something like:

my $file       = 'c:/test.pl';
my $BLOCK_SIZE = 1024*1024;

open my $fh, '<', $file or die $!;
binmode $fh;                          # raw bytes, no CRLF translation
END { close $fh }

my ( $buffer, $buf );
read( $fh, $buf, $BLOCK_SIZE );
$buffer .= unpack "B*", $buf;

sub get_bits {
    my ( $num_bits ) = @_;            # faster than shift
    unless ( length($buffer) > $num_bits ) {
        read( $fh, $buf, $BLOCK_SIZE );
        $buffer .= unpack "B*", $buf;
        die "No more bits left" if length($buffer) < $num_bits;
    }
    return substr $buffer, 0, $num_bits, '';
}

for ( 1..1000 ) {
    print get_bits(16), $/;
}
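One refinement worth considering: if a caller ever asks for more bits than a single block supplies (or a read comes back short), one extra read may not be enough. A sketch of a looping refill, reusing the same variables as above:

sub get_bits_looped {
    my ( $num_bits ) = @_;
    # keep topping the buffer up until we have enough bits or run dry
    while ( length($buffer) < $num_bits ) {
        read( $fh, $buf, $BLOCK_SIZE ) or die "No more bits left";
        $buffer .= unpack "B*", $buf;
    }
    return substr $buffer, 0, $num_bits, '';
}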

* You would play with the BLOCK_SIZE (probably 1-2MB will be optimal, as a stab in the dark - see Re: Performance Question for details) to spend as much memory as you can afford and limit disk access. The sweet spot depends a lot on OS, disks, disk buffering, available memory etc. We make no extra sub calls at all (all that pushing on and off the stack takes time) and just do the minimum. As always YMMV.
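For what it's worth, the simplest way to run that experiment is to take the block size from an environment variable, much as the benchmarks in the replies below do. A rough harness (file path, default size and output format are just illustrative):

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Illustrative tuning harness: read a file in fixed-size blocks, taking the
# block size from the BLOCK_SIZE environment variable so different sizes can
# be compared run against run.
my $file       = shift or die "usage: $0 <file>\n";
my $block_size = $ENV{BLOCK_SIZE} || 65536;

open my $fh, '<', $file or die "open $file: $!";
binmode $fh;

my ( $buf, $reads ) = ( '', 0 );
my $start = time;
$reads++ while read( $fh, $buf, $block_size );
close $fh;

printf "%d reads of %d bytes in %.2f seconds\n",
    $reads, $block_size, time - $start;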

cheers

tachyon

Re: Re: BitStream revisited
by sgifford (Prior) on Dec 31, 2003 at 04:30 UTC
    You would play with the BLOCK_SIZE (probably 1-2MB will be optimal, as a stab in the dark - see Re: Performance Question for details) to spend as much memory as you can afford and limit disk access.

    There's actually only a marginal speedup for anything over the system block size, usually 4K or 8K. In fact, when the buffer size gets up to around a MB, things slow down a little bit.

    All of these tests are with a warm cache on an approximately 1GB file.

    #!/usr/bin/perl
    my $i = 0;
    while (sysread(STDIN,$buf,$ENV{BLOCKSIZE})) {
        $i++;
    }
    print "Called read $i times.\n";
    $ BLOCKSIZE=4096 time perl /tmp/t6 <root_fs
    Called read 262144 times.
    0.55user 8.93system 0:38.36elapsed 24%CPU
    
    $ BLOCKSIZE=8192 time perl /tmp/t6 <root_fs
    Called read 131072 times.
    0.47user 8.53system 0:39.10elapsed 23%CPU
    
    $ BLOCKSIZE=16384 time perl /tmp/t6 <root_fs
    Called read 65536 times.
    0.24user 7.46system 0:38.04elapsed 20%CPU
    
    $ BLOCKSIZE=65536 time perl /tmp/t6 <root_fs
    Called read 16384 times.
    0.17user 9.04system 0:38.16elapsed 24%CPU
    
    $ BLOCKSIZE=262144 time perl /tmp/t6 <root_fs
    Called read 4096 times.
    0.13user 11.77system 0:38.53elapsed 30%CPU 
    
    $ BLOCKSIZE=524288 time perl /tmp/t6 <root_fs
    Called read 2048 times.
    0.06user 12.49system 0:39.15elapsed 32%CPU 
    
    $ BLOCKSIZE=1048576 time perl /tmp/t6 <root_fs
    Called read 1024 times.
    0.04user 12.94system 0:38.34elapsed 33%CPU
    

      Re: Re: Re: Re: BitStream revisited
      As noted, you need to test the exact code/OS/hardware combination. You will see I got somewhat different results, with a 65K sweet spot for throughput (but with a defined 4096-byte read buffer, as seen in the other node). On my test system a 65K buffer would give me a 40%-odd performance boost over a 4K buffer, whereas on yours it would cost me 20% over your optimum 16K buffer. Just goes to show you can't overgeneralize tuning results.

      Out of interest, the last lot of testing I did on this sort of thing was using IDE disks, whereas this is on RAID 5 SCSI hardware.

      [root@devel3 root]# cat reader.pl
      #!/usr/bin/perl
      open $fh, '/root/big.file' or die $!;
      1 while read( $fh, $buf, $ENV{BLOCK_SIZE} );
      close $fh;
      [root@devel3 root]# ll big.file
      -rw-r--r--    1 root     root     100000000 Dec 31 04:59 big.file
      [root@devel3 root]# BLOCK_SIZE=1024 time perl /root/reader.pl
      0.11user 0.08system 0:00.20elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+31minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=2048 time perl /root/reader.pl
      0.05user 0.12system 0:00.16elapsed 104%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+33minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=4192 time perl /root/reader.pl
      0.03user 0.10system 0:00.13elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+32minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=4096 time perl /root/reader.pl
      0.05user 0.06system 0:00.11elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+33minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=1024 time perl /root/reader.pl
      0.09user 0.11system 0:00.19elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+33minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=2048 time perl /root/reader.pl
      0.03user 0.13system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+34minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=4096 time perl /root/reader.pl
      0.01user 0.11system 0:00.11elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+33minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=8192 time perl /root/reader.pl
      0.02user 0.08system 0:00.09elapsed 107%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+33minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=16384 time perl /root/reader.pl
      0.01user 0.07system 0:00.08elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+34minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=65536 time perl /root/reader.pl
      0.00user 0.07system 0:00.07elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+45minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=262144 time perl /root/reader.pl
      0.02user 0.12system 0:00.13elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+95minor)pagefaults 0swaps
      [root@devel3 root]# BLOCK_SIZE=524288 time perl /root/reader.pl
      0.01user 0.18system 0:00.25elapsed 74%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (275major+159minor)pagefaults 0swaps
      [root@devel3 root]#

      cheers

      tachyon

Re: Re: BitStream revisited
by spurperl (Priest) on Jan 01, 2004 at 08:00 UTC
    Tachyon,

    Performance aside for a moment, why don't you think OO is appropriate here? I happen to think it is, and my proof is that using this BitStream (I use it extensively in a large application) is a pleasure.

    As I see it, the BitStream has internal state - the position of the file handle, a current buffer (which is a *must*, as I said, since keeping the whole thing in memory is impractical and read()-ing on each access is slow), etc. I simply call $in->get_bits(number) and get my bits. If I want, I call $in->seek_pos(jump, whence) to go where I want, and so on. I find that encapsulating this in an object is very convenient.
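    For reference, the shape of the interface is roughly this (a simplified sketch, not the real module; the buffering follows the same idea as the get_bits in the root node):

    package BitStream;
    use strict;
    use warnings;

    sub new {
        my ( $class, $file, $block_size ) = @_;
        open my $fh, '<', $file or die "open $file: $!";
        binmode $fh;
        return bless {
            fh         => $fh,
            buffer     => '',
            block_size => $block_size || 65536,
        }, $class;
    }

    sub get_bits {
        my ( $self, $num_bits ) = @_;
        while ( length( $self->{buffer} ) < $num_bits ) {
            my $buf;
            read( $self->{fh}, $buf, $self->{block_size} )
                or die "No more bits left";
            $self->{buffer} .= unpack "B*", $buf;
        }
        return substr $self->{buffer}, 0, $num_bits, '';
    }

    sub seek_pos {
        my ( $self, $jump, $whence ) = @_;
        # byte-level seek for simplicity; a real bit stream would also
        # track sub-byte offsets
        seek $self->{fh}, $jump, $whence or die "seek: $!";
        $self->{buffer} = '';    # buffered bits are stale after a seek
    }

    1;

    # usage:  my $in = BitStream->new('data.bin');
    #         print $in->get_bits(16), "\n";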

      There is nothing wrong with OO. I like it and use it. Your initial question concerned speed, ie can this be made faster? Answer: yes, ditch the OO. OO Perl is 30-40% slower than using raw function calls, just to make the same call. If you have other requirements that mandate OO then you have other requirements...
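      The call overhead is easy enough to measure for yourself. A rough Benchmark sketch (the exact gap varies with perl version and hardware, so don't treat 30-40% as gospel):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      # Compare a plain sub call against the same work done as a method call.
      # Only the call mechanism differs; the body is identical.
      package Counter;
      sub new  { bless { n => 0 }, shift }
      sub bump { $_[0]{n}++ }

      package main;
      my %state = ( n => 0 );
      sub bump { $state{n}++ }

      my $obj = Counter->new;

      cmpthese( -3, {
          function => sub { bump() },
          method   => sub { $obj->bump },
      });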

      If you will only have a single instance of your bitstream object then you can easily just do:

      {
          # hold state stuff in a closure
          # this can be identical to an object's hash data
          # but you only get one instance at any one time
          # with this simple setup
          my %state;
          sub get      { }
          sub seek_pos { }
          # set your %state with init like new
          sub init     { }
      }
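      Fleshed out with the get_bits logic from my node above, that might look something like this (illustrative only, adapt to taste):

      {
          # all state lives in lexicals shared by the subs below - the same
          # encapsulation as an object's hash, but only one instance
          my ( $fh, $buffer, $block_size );

          sub init {
              my ( $file, $size ) = @_;
              $block_size = $size || 65536;
              $buffer     = '';
              open $fh, '<', $file or die "open $file: $!";
              binmode $fh;
          }

          sub get_bits {
              my ( $num_bits ) = @_;
              while ( length($buffer) < $num_bits ) {
                  my $buf;
                  read( $fh, $buf, $block_size ) or die "No more bits left";
                  $buffer .= unpack "B*", $buf;
              }
              return substr $buffer, 0, $num_bits, '';
          }

          sub seek_pos {
              my ( $jump, $whence ) = @_;
              seek $fh, $jump, $whence or die "seek: $!";
              $buffer = '';    # buffered bits are stale after a seek
          }
      }

      # usage:  init('data.bin');  print get_bits(16), $/;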

      That will be 30-40% faster than OO, with the same data encapsulation, but limited to one instance. It all depends on the application. Like I said initially, if speed really matters I would hack it up in C, but that would take time. As always it is a tradeoff between hardware cost, head-down coding time, OO convenience, etc. A lot of problems can be solved quickly in Perl and made fast enough by throwing more memory and cycles at them. Applying more grunt to relatively inefficient code is a practical real-world solution. Just ask M$ and Intel.

      cheers

      tachyon