Re: How to process each byte in a binary file?
by particle (Vicar) on Aug 12, 2002 at 19:54 UTC
my $file = 'myverylargefile.bin';
{
    local *INPUT;
    local $/ = \1;
    open INPUT, '<', $file or die $!;
    while (<INPUT>)
    {
        ## process here...
    }
}
From perlvar:
Setting $/ to a reference to an integer, scalar containing an integer, or scalar that's convertible to an integer will attempt to read records instead of lines, with the maximum record size being the referenced integer.
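A minimal runnable sketch of this approach: it reads from an in-memory filehandle (core since perl 5.8) instead of an external file, so the example is self-contained, and uses ord() to turn each one-byte record into its numeric value.

```perl
use strict;
use warnings;

my $data = "\x00\x01\xFF";
open my $fh, '<', \$data or die $!;   # in-memory filehandle stands in for the real file
binmode $fh;

my @bytes;
{
    local $/ = \1;                    # each <$fh> now returns a single byte
    while (my $byte = <$fh>) {
        push @bytes, ord $byte;       # numeric value of the byte
    }
}
close $fh;
print "@bytes\n";                     # prints: 0 1 255
```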
~Particle *accelerates*
It looks like $/ doesn't have any effect on IO::Scalar. I see that if the input is already in a file, and really is a primitive file handle, this saves the trouble of reading it in first. But I wonder whether the overhead of one read at a time is still high compared to reading a chunk at a time and processing the chunks with one of the other methods.
Re: How to process each byte in a binary file?
by kschwab (Vicar) on Aug 12, 2002 at 20:58 UTC
How about IO::Scalar or IO::String? You could then seek() and tell() around the string, or read() in 1-byte increments. I'm not sure what's under the covers, but both seem elegant from the outside.
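A sketch of the seek()/tell()/read() style described here. IO::Scalar and IO::String wrap a string in a filehandle object; a plain in-memory filehandle (core since perl 5.8) offers the same interface, so that is used below to keep the example dependency-free.

```perl
use strict;
use warnings;

my $string = "ABC";
open my $fh, '<', \$string or die $!;   # filehandle over an in-memory string

# read() in 1-byte increments, collecting each byte's numeric value
my @bytes;
while (read $fh, my $byte, 1) {
    push @bytes, ord $byte;
}
print "@bytes\n";                       # prints: 65 66 67

# seek() and tell() work on the string exactly as on a real file
seek $fh, 1, 0;                         # jump to offset 1
read $fh, my $b, 1;
printf "%s at offset %d\n", $b, tell($fh) - 1;   # prints: B at offset 1
```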
Re: How to process each byte in a binary file?
by jmcnamara (Monsignor) on Aug 12, 2002 at 22:42 UTC
I'd guess that unpack is the fastest, but if you are looking for alternatives to benchmark, you could try this:
for (split //, $str, length $str) { ... }
Regardless of the method you choose, it would probably be best to read and process the file in chunks. Playing around with the buffer size might let you find a good balance between the size of each read and the size of the data to process:
#!/usr/bin/perl -w
use strict;

open FILE, 'reload.xls' or die "Error message here: $!";
binmode FILE;    # as required

my $buffer = 4096;
my $str;

while (read FILE, $str, $buffer) {
    for (split //, $str, $buffer) {
        # Your code here
    }
}
--
John.
Benchmark Results
by John M. Dlugosz (Monsignor) on Aug 12, 2002 at 21:48 UTC
Thus far, unpack "C" is the fastest. vec is 9% slower on a small input, 2% on a larger input, so there may be setup overhead there?
substr is about 10% slower than vec.
The regex //g approach is 1/3 to 1/2 the speed of substr. And using IO::Scalar is ten times slower than that!
—John
Re: How to process each byte in a binary file?
by kschwab (Vicar) on Aug 12, 2002 at 22:32 UTC
#!/usr/bin/perl
use Benchmark;

my $string = "X" x 102400;

timethese(100, {
    'split' => sub {
        for (split(//, $string)) { }
    },
    'unpack' => sub {
        for (unpack("C*", $string)) { }
    },
    'regex' => sub {
        while ($string =~ /./sg) { }
    },
    'substr' => sub {
        for (my $i = 0; $i < length($string); $i++) {
            substr($string, $i, 1);
        }
    },
});
Gives me:
$ perl foo
Benchmark: timing 100 iterations of regex, split, substr, unpack...
regex: 44 wallclock secs (43.13 usr + 0.00 sys = 43.13 CPU)
split: 49 wallclock secs (47.90 usr + 0.04 sys = 47.94 CPU)
substr: 58 wallclock secs (55.70 usr + 0.00 sys = 55.70 CPU)
unpack: 27 wallclock secs (26.48 usr + 0.00 sys = 26.48 CPU)
Update: Reposted results after correcting typo.
I get similar results: split falls between regex and substr. Makes me wonder, though: since split // is a "special case" that splits on every character, why isn't it simply as fast as unpack?
—John
Re: How to process each byte in a binary file?
by Anonymous Monk on Aug 12, 2002 at 20:23 UTC
vec EXPR,OFFSET,BITS
Treats the string in EXPR as a bit vector made up of elements of width BITS, and returns the value of the element specified by OFFSET as an unsigned integer. BITS therefore specifies the number of bits that are reserved for each element in the bit vector. This must be a power of two from 1 to 32 (or 64, if your platform supports that).
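A short sketch of vec() used this way: with BITS = 8, the string is treated as a vector of unsigned 8-bit elements, i.e. one element per byte, so OFFSET indexes bytes directly.

```perl
use strict;
use warnings;

my $str = "\x10\x20\xFF";
my @bytes;
for my $i (0 .. length($str) - 1) {
    push @bytes, vec($str, $i, 8);    # i-th byte as an unsigned integer
}
print "@bytes\n";                     # prints: 16 32 255
```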
Maybe this is what you mean?