If most of your time is spent in pack/unpack itself, that's a good thing! You're probably using them very effectively.
Depending on whether you actually need to unpack each record or not, you can maybe hardcode the offsets and reject rows before actually unpacking them. index or substr can quickly look at a position in a string without needing pack or unpack.
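For instance (a hypothetical sketch; the offset and the tag value are invented, not your format), you can peek at a fixed position with substr and skip a record before paying for a full unpack:

    # Hypothetical: assume a 2-byte type tag at offset 2 of each record;
    # reject uninteresting records without unpacking them.
    my $type = substr($rec, 2, 2);           # cheap byte peek, no unpack
    next unless $type eq "\x0A\x00";         # keep only type-10 records
    my @fields = unpack($TEMPLATE, $rec);    # unpack only the survivors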
Depending on how you unpack things, it might be quicker to build one large unpack template instead of unpacking items in a loop.
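A before/after sketch of that idea (field count and templates invented for illustration):

    # Slower: one unpack call per field
    my @fields;
    push @fields, unpack('x' . ($_ * 4) . ' V', $rec) for 0 .. 9;

    # Usually faster: one call with a combined template
    my @all = unpack('V10', $rec);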
I have to unpack all fields of the records.
Yes, I'm already grouping fields into one large template as much as possible, but there are optional fields in the records that have to be handled as well.
Consider also writing an XS module for unpacking the data. It should be relatively easy if you know how to program in C.
Show some code; there might be some things that can be optimised.
If you have, or can install, Inline::C, it can speed up processing of binary records considerably, especially if you move the optional-field logic into C.
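A minimal Inline::C sketch of the idea (the layout here, two leading little-endian 16-bit fields, is invented for illustration); the same technique applies if you write a full XS module as suggested above:

    use strict;
    use warnings;
    use Inline C => <<'END_C';
    /* Pull two leading little-endian 16-bit fields out of a buffer
       without calling unpack. */
    void rec_head(SV *buf) {
        Inline_Stack_Vars;
        STRLEN len;
        unsigned char *p = (unsigned char *)SvPV(buf, len);
        Inline_Stack_Reset;
        if (len >= 4) {
            Inline_Stack_Push(sv_2mortal(newSVuv(p[0] | (p[1] << 8))));
            Inline_Stack_Push(sv_2mortal(newSVuv(p[2] | (p[3] << 8))));
        }
        Inline_Stack_Done;
    }
    END_C

    my ($rec_len, $rec_type) = rec_head("\x2a\x00\x07\x00");  # 42, 7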
Here is pseudocode.
The record fields can be either fixed-size or variable-size.
Fixed data types: Character, integral types (unsigned short, unsigned long), Float.
Variable data types: String (Pascal-style, C/a), array of unsigned short, etc.
Optional fields at the end of a record can be omitted.
So if a record ends with optional fields (Byte, String (C/a), Float), that needs to be handled somehow.
read_file_header
determine endian from file_header data
# set up unpack templates for the file's byte order
# ("v"/"V"/"f<" read little-endian, "n"/"N"/"f>" big-endian;
#  the < and > modifiers on "f" need perl 5.10+)
if($endian eq 'little') {
    $U2       = "v";
    $U4       = "V";
    $FLOAT    = "f<";
    $REC_HEAD = ...
} else {
    $U2       = "n";
    $U4       = "N";
    $FLOAT    = "f>";
    $REC_HEAD = ...
    ...etc
}
while(1) {
    read (REC_HEAD) size data from file
    ($rec_len, $rec_type, ...) = unpack($REC_HEAD, $head)
    my $rec_body = read($rec_len)
    # a big switch on rec_type
    if($rec_type == FOO) {
        # unpack the record body for THIS record type. A FOO body is:
        #   mandatory: uSHORT, uLONG, uLONG, String (C/a)
        #   optional from here on (may be truncated): Byte, Float, String
        my @data = unpack("$U2 $U4 $U4 C/a", $rec_body);
        my $consumed_length = 10 + 1 + length($data[-1]); # ushort + 2*ulong + count byte + string
        if($consumed_length < $rec_len) {
            # optional Byte present
            push @data, unpack("x${consumed_length} C", $rec_body);
            $consumed_length += 1;
        }
        if($consumed_length < $rec_len) {
            # optional Float present
            push @data, unpack("x${consumed_length} $FLOAT", $rec_body);
            $consumed_length += 4; # float is 4 bytes
        }
        # ...next optional field, etc.
    }
    elsif($rec_type == BAR) {
        ...
    }
}
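One simplification worth benchmarking here (a sketch only; it relies on unpack quietly returning fewer values when the data runs short, which is worth verifying on your perl): since the optional fields only ever truncate from the end, a single unpack of the full template can replace the offset bookkeeping, with the number of values returned telling you which optional fields were present:

    # $U2/$U4/$FLOAT as set up above; FOO layout from the pseudocode.
    my @data = unpack("$U2 $U4 $U4 C/a C $FLOAT C/a", $rec_body);
    # 4 values => no optional fields
    # 5 values => optional Byte present
    # 6 values => Byte and Float present, etc.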
It would be helpful if you gave more specs about this file. The last time I worked with a binary file in Perl, it was to concatenate some .WAV files together. A .WAV file has a header followed by some number of binary bytes of data, with the number of data bytes specified in the header. It was not necessary for me to unpack all of the data, just the parts of the header relevant to the size of the data that followed, amongst other params. I selected the key parts of the binary header via substr to get ranges of bytes and used pack/unpack upon them.
I think BrowserUk is on the right track here in Re: Optimizing binary file parser (pack/unpack).
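For what it's worth, that kind of selective header peek looks something like this (a sketch using the standard RIFF/WAVE layout; the filename is invented):

    # Read just the 12-byte RIFF header and pull out the one field we need.
    open my $fh, '<:raw', 'sound.wav' or die "open: $!";
    read($fh, my $hdr, 12) == 12 or die "short read";
    die "not a RIFF/WAVE file"
        unless substr($hdr, 0, 4) eq 'RIFF' and substr($hdr, 8, 4) eq 'WAVE';
    my $riff_size = unpack 'V', substr($hdr, 4, 4);  # little-endian u32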