If most of your time is spent in pack/unpack itself, that's a good thing! You're probably using them very effectively.
Depending on whether you actually need to unpack each record or not, you can maybe hardcode the offsets and reject rows before actually unpacking them. index or substr can quickly look at a position in a string without needing pack or unpack.
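For instance (a hypothetical sketch; the offset and the tag value are invented, not your format), you can peek at a fixed position with substr and skip a record before paying for a full unpack:

    # Hypothetical: assume a 2-byte type tag at offset 2 of each record;
    # reject uninteresting records without unpacking them.
    my $type = substr($rec, 2, 2);           # cheap byte peek, no unpack
    next unless $type eq "\x0A\x00";         # keep only type-10 records
    my @fields = unpack($TEMPLATE, $rec);    # unpack only the survivors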
Depending on how you unpack things, it might be quicker to build one large unpack template instead of unpacking items in a loop.
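A before/after sketch of that idea (field count and templates invented for illustration):

    # Slower: one unpack call per field
    my @fields;
    push @fields, unpack('x' . ($_ * 4) . ' V', $rec) for 0 .. 9;

    # Usually faster: one call with a combined template
    my @all = unpack('V10', $rec);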
I have to unpack all fields of the records.
Yes, I'm already grouping fields into one large template as much as possible, but there are optional fields in the records that have to be handled as well.
Consider also writing an XS module for unpacking the data. It should be relatively easy if you know how to program in C.
Show some code; there might be some things that can be optimised.
If you have, or can install, Inline::C, it can speed up processing of binary records considerably, especially if you move the optional-field logic into C.
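A minimal Inline::C sketch of the idea (the layout here, two leading little-endian 16-bit fields, is invented for illustration); the same technique applies if you write a full XS module as suggested above:

    use strict;
    use warnings;
    use Inline C => <<'END_C';
    /* Pull two leading little-endian 16-bit fields out of a buffer
       without calling unpack. */
    void rec_head(SV *buf) {
        Inline_Stack_Vars;
        STRLEN len;
        unsigned char *p = (unsigned char *)SvPV(buf, len);
        Inline_Stack_Reset;
        if (len >= 4) {
            Inline_Stack_Push(sv_2mortal(newSVuv(p[0] | (p[1] << 8))));
            Inline_Stack_Push(sv_2mortal(newSVuv(p[2] | (p[3] << 8))));
        }
        Inline_Stack_Done;
    }
    END_C

    my ($rec_len, $rec_type) = rec_head("\x2a\x00\x07\x00");  # 42, 7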
Here is pseudocode.
The record fields can be either fixed-size or variable-size.
Fixed data types: Character, integral types (unsigned short, unsigned long), Float.
Variable data types: String (Pascal-style, C/a), array of unsigned short, etc.
Optional fields at the end of a record can be omitted.
So if a record ends with optional fields (Byte, String (C/a), Float), that needs to be handled somehow.
read_file_header
determine endian from file_header data
# set up unpack templates for the file's byte order
# ("v"/"V"/"f<" read little-endian, "n"/"N"/"f>" big-endian;
#  the < and > modifiers on "f" need perl 5.10+)
if($endian eq 'little') {
    $U2       = "v";
    $U4       = "V";
    $FLOAT    = "f<";
    $REC_HEAD = ...
} else {
    $U2       = "n";
    $U4       = "N";
    $FLOAT    = "f>";
    $REC_HEAD = ...
    ...etc
}
while(1) {
    read (REC_HEAD) size data from file
    ($rec_len, $rec_type, ...) = unpack($REC_HEAD, $head)
    my $rec_body = read($rec_len)
    # a big switch on rec_type
    if($rec_type == FOO) {
        # unpack the record body for THIS record type. A FOO body is:
        #   mandatory: uSHORT, uLONG, uLONG, String (C/a)
        #   optional from here on (may be truncated): Byte, Float, String
        my @data = unpack("$U2 $U4 $U4 C/a", $rec_body);
        my $consumed_length = 10 + 1 + length($data[-1]); # ushort + 2*ulong + count byte + string
        if($consumed_length < $rec_len) {
            # optional Byte present
            push @data, unpack("x${consumed_length} C", $rec_body);
            $consumed_length += 1;
        }
        if($consumed_length < $rec_len) {
            # optional Float present
            push @data, unpack("x${consumed_length} $FLOAT", $rec_body);
            $consumed_length += 4; # float is 4 bytes
        }
        # ...next optional field, etc.
    }
    elsif($rec_type == BAR) {
        ...
    }
}
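One simplification worth benchmarking here (a sketch only; it relies on unpack quietly returning fewer values when the data runs short, which is worth verifying on your perl): since the optional fields only ever truncate from the end, a single unpack of the full template can replace the offset bookkeeping, with the number of values returned telling you which optional fields were present:

    # $U2/$U4/$FLOAT as set up above; FOO layout from the pseudocode.
    my @data = unpack("$U2 $U4 $U4 C/a C $FLOAT C/a", $rec_body);
    # 4 values => no optional fields
    # 5 values => optional Byte present
    # 6 values => Byte and Float present, etc.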
It would be helpful if you gave more specs about this file. The last time I worked with a binary file in Perl, it was to concatenate some .WAV files together. A .WAV file has a header followed by some number of binary bytes of data, with the number of data bytes specified in the header. It was not necessary for me to unpack all of the data, just the parts of the header relevant to the size of the data that followed, amongst other params. I selected the key parts of the binary header via substr to get ranges of bytes and used pack/unpack upon them.
I think BrowserUk is on the right track here in Re: Optimizing binary file parser (pack/unpack).
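For what it's worth, that kind of selective header peek looks something like this (a sketch using the standard RIFF/WAVE layout; the filename is invented):

    # Read just the 12-byte RIFF header and pull out the one field we need.
    open my $fh, '<:raw', 'sound.wav' or die "open: $!";
    read($fh, my $hdr, 12) == 12 or die "short read";
    die "not a RIFF/WAVE file"
        unless substr($hdr, 0, 4) eq 'RIFF' and substr($hdr, 8, 4) eq 'WAVE';
    my $riff_size = unpack 'V', substr($hdr, 4, 4);  # little-endian u32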