Re: BitStream revisited
by tachyon (Chancellor) on Dec 30, 2003 at 11:42 UTC
|
If speed REALLY matters you do C not perl. But assuming you want Perl......
First OO Perl is 30-40% slower (in my unrelated testing) than straight function calls so you don't want OO (nor might I add is it justified for the task given that you have no real object data to encapsulate ie it just provides syntactic sugar and no functional value). Second you spend memory to gain speed or in other words every disk read is much slower than a memory read ie tens of nanoseconds versus several milliseconds so you want to buffer as big as possible*. Last you do as little as possible, making as few tests as possible, open files once, use global or vars that scope so they don't get created/destroyed, minimize sub calls (on/off stack) etc. I would do something like:
my $file = 'c:/test.pl';
my $BLOCK_SIZE = 1024*1024;
open my $fh, $file or die $!;
END{close $fh}
my ( $buffer, $buf );
read( $fh, $buf, $BLOCK_SIZE );
$buffer .= unpack "B*", $buf;
sub get_bits {
my ( $num_bits ) = @_; # faster than shift
unless ( length($buffer) > $num_bits ) {
read( $fh, $buf, $BLOCK_SIZE );
$buffer .= unpack "B*", $buf;
die "No more bits left" if length($buffer) < $num_bits;
}
return substr $buffer, 0, $num_bits, '';
}
for ( 1..1000 ) {
print get_bits(16), $/;
}
* You would play with the BLOCK_SIZE (probably 1-2MB will be optimal as a stab in the dark - see Re: Performance Question for details) to spend as much memory as you can/need to/is optimal and limit disk access. Depends a lot on OS, disks, disk buffering, available memory etc. We make no extra sub calls at all (all that on and off the stack takes time) and just do the minimum. As always YMMV.
| [reply] [d/l] |
|
|
You would play with the BLOCK_SIZE (probably 1-2MB will be optimal as a stab in the dark - see Re: Performance Question for details) to spend as much memory as you can/need to/is optimal and limit disk access.
There's actually only a marginal speedup for anything over the system block size, usually 4K or 8K. In fact, when the buffer size gets up to around a MB, things slow down a little bit.
All of these tests are with a warm cache on an approximately 1GB file.
#!/usr/bin/perl
my $i = 0;
while (sysread(STDIN,$buf,$ENV{BLOCKSIZE}))
{
$i++;
}
print "Called read $i times.\n";
$ BLOCKSIZE=4096 time perl /tmp/t6 <root_fs
Called read 262144 times.
0.55user 8.93system 0:38.36elapsed 24%CPU
$ BLOCKSIZE=8192 time perl /tmp/t6 <root_fs
Called read 131072 times.
0.47user 8.53system 0:39.10elapsed 23%CPU
$ BLOCKSIZE=16384 time perl /tmp/t6 <root_fs
Called read 65536 times.
0.24user 7.46system 0:38.04elapsed 20%CPU
$ BLOCKSIZE=65536 time perl /tmp/t6 <root_fs
Called read 16384 times.
0.17user 9.04system 0:38.16elapsed 24%CPU
$ BLOCKSIZE=262144 time perl /tmp/t6 <root_fs
Called read 4096 times.
0.13user 11.77system 0:38.53elapsed 30%CPU
$ BLOCKSIZE=524288 time perl /tmp/t6 <root_fs
Called read 2048 times.
0.06user 12.49system 0:39.15elapsed 32%CPU
$ BLOCKSIZE=1048576 time perl /tmp/t6 <root_fs
Called read 1024 times.
0.04user 12.94system 0:38.34elapsed 33%CPU
| [reply] [d/l] |
|
|
Re: Re: Re: Re: BitStream revisited. As noted you need to test the exact code/OS/hardware combo. You will see I got somewhat different results with a 65K sweetspot for throughput speed (but with a defined 4096 read buffer as seen in other node). On my test system a 65k buffer would give me a 40% odd performance boost (over a 4K buffer) whereas on yours it would cost me 20% over your optimum 16k buffer. Just goes to show you can't overgeneralize tuning results.
Out of interest the last lot of testing I did on this sort of thing was using IDE disks whereas this is on RAID V SCSI hardware.
[root@devel3 root]# cat reader.pl
#!/usr/bin/perl
open $fh, '/root/big.file' or die $!;
1 while read( $fh, $buf, $ENV{BLOCK_SIZE} );
close $fh;
[root@devel3 root]# ll big.file
-rw-r--r-- 1 root root 100000000 Dec 31 04:59 big.file
[root@devel3 root]# BLOCK_SIZE=1024 time perl /root/reader.pl
0.11user 0.08system 0:00.20elapsed 94%CPU (0avgtext+0avgdata 0maxresid
+ent)k
0inputs+0outputs (275major+31minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=2048 time perl /root/reader.pl
0.05user 0.12system 0:00.16elapsed 104%CPU (0avgtext+0avgdata 0maxresi
+dent)k
0inputs+0outputs (275major+33minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=4192 time perl /root/reader.pl
0.03user 0.10system 0:00.13elapsed 94%CPU (0avgtext+0avgdata 0maxresid
+ent)k
0inputs+0outputs (275major+32minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=4096 time perl /root/reader.pl
0.05user 0.06system 0:00.11elapsed 94%CPU (0avgtext+0avgdata 0maxresid
+ent)k
0inputs+0outputs (275major+33minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=1024 time perl /root/reader.pl
0.09user 0.11system 0:00.19elapsed 103%CPU (0avgtext+0avgdata 0maxresi
+dent)k
0inputs+0outputs (275major+33minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=2048 time perl /root/reader.pl
0.03user 0.13system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 0maxresi
+dent)k
0inputs+0outputs (275major+34minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=4096 time perl /root/reader.pl
0.01user 0.11system 0:00.11elapsed 102%CPU (0avgtext+0avgdata 0maxresi
+dent)k
0inputs+0outputs (275major+33minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=8192 time perl /root/reader.pl
0.02user 0.08system 0:00.09elapsed 107%CPU (0avgtext+0avgdata 0maxresi
+dent)k
0inputs+0outputs (275major+33minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=16384 time perl /root/reader.pl
0.01user 0.07system 0:00.08elapsed 98%CPU (0avgtext+0avgdata 0maxresid
+ent)k
0inputs+0outputs (275major+34minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=65536 time perl /root/reader.pl
0.00user 0.07system 0:00.07elapsed 93%CPU (0avgtext+0avgdata 0maxresid
+ent)k
0inputs+0outputs (275major+45minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=262144 time perl /root/reader.pl
0.02user 0.12system 0:00.13elapsed 100%CPU (0avgtext+0avgdata 0maxresi
+dent)k
0inputs+0outputs (275major+95minor)pagefaults 0swaps
[root@devel3 root]# BLOCK_SIZE=524288 time perl /root/reader.pl
0.01user 0.18system 0:00.25elapsed 74%CPU (0avgtext+0avgdata 0maxresid
+ent)k
0inputs+0outputs (275major+159minor)pagefaults 0swaps
[root@devel3 root]#
| [reply] [d/l] |
|
|
Tachyon,
Performance aside for a moment, why don't you think OO is appropriate here ? I happen to think it is, and my proof is that using this BitStream (I use it extensively in a large application) is a pleasure.
As I see it, the BitStream has an internal state - position of the file handle, a current buffer (that is a *must*, as I said, as keeping the whole thing in memory is unpractical, and read()-ing on each access is slow), etc. I simply call $in->get_bits(number) and get my bits. If I want, I call $in->seek_pos(jump, whence) to go where I want, etc... I find that encapsulating this with an object is very convenient.
| [reply] |
|
|
There is nothing wrong with OO. I like it and use it. Your initial question concerned speed ie can this be made faster. Answer yes ditch the OO. OO perl is 30-40% slower than using raw function calls - just to do the same call. If you have other requirements that mandate OO then you have other requirements.....
If you will only have a single instance of your bitstream object then you can easily just do:
{
# hold state stuff in a closure
# this can be identical to an object hash data
# but you only get one instance at any one time
# with this simple setup
my %state;
sub get {
}
sub seek_pos {
}
# set your %state with init like new
sub init {
}
}
That will be 30-40% faster than OO with the same data encapsulation but limited to one instance. It all depends on the application. Like I said initially if speed really matters I would hack it up in C but that would take time. As always it is a tradeoff between hardware cost/head down coding time/oo convenince etc. A lot of problems can be solved quickly with Perl and made fast enough by throwing more memory and cycles at it to get the required performance. Applying more grunt to relatively inefficient code is a practical real world solution. Just ask M$ and intel.
| [reply] [d/l] |
Re: BitStream revisited
by !1 (Hermit) on Dec 30, 2003 at 11:22 UTC
|
package BitStream;
use constant BUF_SIZE => 2048;
use POSIX qw(ceil);
sub new {
die "Correct usage is BitStream->new(filename)" unless @_ == 2;
my ($class,$filename) = @_;
open my $fh,"<",$filename or die "Couldn't open $filename: $!";
read($fh,my $buffer,BUF_SIZE);
my $self = { fh => $fh, buffer => $buffer, dead_start => 0 };
bless $self,$class;
}
sub get_bits {
my ($self,$bits) = @_;
no warnings "once";
return "" unless $bits+=0;
# ok, make the buffer big enough to handle the request
my $buf;
$self->{'buffer'} .= $buf while length($self->{'buffer'}) * 8 + $sel
+f->{'dead_start'} < $bits && read($self->{'fh'},$buf,BUF_SIZE);
# figure out our end position within the buffer
my $bytes = ceil(($bits + $self->{'dead_start'})/8);
my $return_buf = unpack("B*",substr($self->{'buffer'},0,$bytes));
$return_buf = substr($return_buf,$self->{'dead_start'},$bits) unless
+ $return_buf eq "";
$self->{'dead_start'} = ($bits + $self->{'dead_start'}) % 8;
$bytes-- if $self->{'dead_start'};
substr($self->{'buffer'},0,$bytes) = "";
return $return_buf;
}
"Give me liberty or at least the remote";
Of course, I must admit that I prefer bit twiddling over this. | [reply] [d/l] |
Re: BitStream revisited
by BrowserUk (Patriarch) on Dec 30, 2003 at 15:27 UTC
|
If performance is paramount in your application, unpacking every bit to a byte, thereby compounding the memory requirements x8, and then processing the 'bits' one at a time is so not the way to approach the problem.
Adding the overhead, both performance and memory, of a perlish OO interface is just compounding this further.
Perl's vec builtin allows read/write access to the bits, individually and in groups, of bitstrings of arbitrary (given you have the memory) length. It effectively allows you to treat the string as a huge array of bits, getting and setting them, by index, in-situ.
By comparison with your mechanism, it is both fast and efficient, and requires no extra memory beyond the bitstring itself. The only time vec becomes a little awkward to use is if you need to access multiples of bits that are not powers of two and/or cross byte boundaries. But there are very few file formats that do not align the bits on (at least) byte boundaries, so that is very rarely a problem.
You don't describe the actual application, but I would highly recommend that you take another look at vec and understand what does and how it works. You will probably find that you can re-cast your problem to use it and greatly improve the efficency by doing so.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!
| [reply] |
|
|
| [reply] |
Re: BitStream revisited
by ysth (Canon) on Dec 30, 2003 at 19:38 UTC
|
As others have suggested, doing this with OO is going to have a cost. One way of avoiding that yet preserving the modularization and ability to have multiple bitstreams would be to generate a closure. Pseudo-code:
sub BitStream::make_bitstream {
my $filename = shift;
open my $fh, ... or return
my $buf='';
return sub {
# if a buffer is available, use it
my $ret_str = substr($buf, 0, (defined($_[0]) ? $_[0] : 1), '');
# fill buffer if request may not have been satisfied
if (!length($buf)) {
my $need = (defined($_[0]) ? $_[0] : 1) - length($ret_Str);
return $ret_str if !$need;
read from $fh and unpack into $buf
$ret_str .= substr($buf, 0, $need, '');
}
$ret_str;
}
}
#example
package main;
$get_bits = BitStream::make_bitstring("foo.dat") or die "open failure:
+ $!";
$onebit = $get_bits->();
$twobits = $get_bits->(2);
It's not clear whether you want failure to read the desired number of bits would be a fatal error or just return what was available.
!defined($_[0])||$_[0] may be faster than the ?: way; if you are that concerned, you may want to benchmark it (or just use C :).
Update: - $ret_str was supposed to be - length($ret_str)
| [reply] [d/l] [select] |
|
|
This is a very pretty solution, thanks ! I'm not sure by how much it will improve performance, but a closure fits here well. It's nice that Perl has this ability (borrowed from Lisp, of course) as in some case it may solve a problem graciously, but so can an object. That is said by me, who adores Lisp and FP in general and looks at OO with suspicion, as a nice solution in *some* cases, but surely not the uber-panacea hyped about so much.
| [reply] |
Re: BitStream revisited
by iburrell (Chaplain) on Dec 30, 2003 at 21:00 UTC
|
I would suggest that you let Perl and stdio take care of buffering reads. It is already well-tested and fast. You should consider using vec to manipulate the bitstrings especially if you are reading large amounts.
The BitStream module would need to figure out how many bytes to read with read(). If the number of bits read is not a integral number of bytes, then it needs to save the leftover bits. The leftover bits would be used for the next read. | [reply] |
|
|
| [reply] |
|
|
read already uses buffered input, and each call to it does not make a system call.
#!/usr/bin/perl
while (read(STDIN,$buf,$ENV{BLOCKSIZE}))
{
;
}
produces:
$ BLOCKSIZE=1 strace -e read -c perl /tmp/t6 <zelda4rom.zip
Called read 26968 times.
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.002888 160 18 read
------ ----------- ----------- --------- --------- ----------------
100.00 0.002888 18 total
$ BLOCKSIZE=1024 strace -e read -c perl /tmp/t6 <zelda4rom.zip
Called read 27 times.
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000370 19 19 read
------ ----------- ----------- --------- --------- ----------------
100.00 0.000370 19 total
$ BLOCKSIZE=4096 strace -e read -c perl /tmp/t6 <zelda4rom.zip
Called read 7 times.
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000315 17 19 read
------ ----------- ----------- --------- --------- ----------------
100.00 0.000315 19 total
| [reply] [d/l] [select] |
|
|