Re^3: Reading binary file performance

Have I fundamentally misunderstood anything?

Yes, the three main tricks, but they're not exactly easy to spot unless you're familiar with them :)
1) a single correct unpack call with a clever template is faster than everything else
2) slurping the entire file into one large string is faster than reading byte by byte (it's how hard disks work)
3) substr-ing across a string is faster than chopping a string (or copying a string and then chopping it ...)
aliasing and/or pass-by-reference is faster than copying
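
To illustrate trick 3, here is a minimal sketch comparing the two styles. The buffer contents and the variable names are made up for the example, and which style wins on your data is something to benchmark rather than assume:

```perl
use strict;
use warnings;

# Sample buffer (an assumption for illustration): five 32-bit big-endian integers.
my $buf = pack('N*', 1 .. 5);

# Chopping: work on a copy and remove the first four bytes on every pass.
my $copy = $buf;
my @chopped;
while (length $copy) {
    # 4-argument substr returns the removed bytes and shortens $copy.
    push @chopped, unpack('N', substr($copy, 0, 4, ''));
}

# Walking: leave the buffer alone and just advance an offset.
my @walked;
for (my $pos = 0; $pos < length $buf; $pos += 4) {
    push @walked, unpack('N', substr($buf, $pos, 4));
}

print "@walked\n";
```

Both loops produce the same list; the second one never modifies or copies the buffer.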

What you did is replace unpack with substr+unpack, two operations instead of one, which will be slower.
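
A rough sketch of the difference, assuming a hypothetical record layout (2-byte type, 4-byte length, 8-byte space-padded name) that is not taken from your program:

```perl
use strict;
use warnings;

# Hypothetical record: 16-bit type, 32-bit length, 8-byte space-padded name.
my $record = pack('n N A8', 7, 1024, 'example');

# Several calls: one substr plus one unpack per field.
my $type_a = unpack('n',  substr($record, 0, 2));
my $len_a  = unpack('N',  substr($record, 2, 4));
my $name_a = unpack('A8', substr($record, 6, 8));

# One call: a single template extracts every field at once.
my ($type_b, $len_b, $name_b) = unpack('n N A8', $record);

print "$type_b $len_b $name_b\n";
```

The single-template version makes one call where the other makes six, which is the whole point of trick 1.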

One of the slow things about your original program is using oct/hex+unpack. unpack can do most things by itself; see Re: ID3v2 TAG Footer Reading goes wrong (more subs) and Re: hex to binary ( UInt32 / Int32 ).
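
As a small illustration (the four sample bytes are invented, and I am assuming a big-endian unsigned 32-bit field), both of these decode the same integer, but the second needs no hex step:

```perl
use strict;
use warnings;

# Four sample bytes (an assumption): a big-endian unsigned 32-bit integer.
my $bytes = "\x00\x00\x01\x00";

# Two steps: unpack to a hex string, then convert the string with hex().
my $via_hex = hex(unpack('H8', $bytes));

# One step: let unpack decode the integer directly.
my $direct = unpack('N', $bytes);

print "$via_hex $direct\n";
```

For little-endian data the template would be 'V' instead of 'N'.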

reducing the number of calls speeds things up

Replies are listed 'Best First'.

Re^4: Reading binary file performance
by oneill (Initiate) on Mar 27, 2014 at 13:44 UTC

Thank you superdoc, just a few comments regarding your answers:

1) It won't be possible for me to provide a single template, as the messages are dynamic and require different handling depending on what is specified.
2) Unfortunately I can't read an entire file, since these files can be gigabytes in size and I'll also be reading from a pipe.
3) Yes - I did some further research on this and it seems perl just points to the string rather than copying it into memory, so pass by reference actually takes longer. I also don't believe that I chopped it at all; I just increased a counter.

I'll try and see if I can make a dynamic template to reduce the number of calls and take a look at these other threads to see what I can do to remove unnecessary logic.
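
Something along these lines is what I have in mind, as a rough sketch only. The 4-byte header (type, body length), the two message types, and the %body_template table are all invented for illustration, and an in-memory filehandle stands in for the pipe:

```perl
use strict;
use warnings;

# Hypothetical message types mapped to the unpack template for their body.
my %body_template = (
    1 => 'N n',    # type 1: 32-bit value followed by a 16-bit value
    2 => 'n/a',    # type 2: 16-bit length followed by that many bytes
);

# Sample stream: each message is a header (type, body length) plus a body.
my $stream = join '',
    pack('n n', 1, 6) . pack('N n', 123456, 42),
    pack('n n', 2, 7) . pack('n/a', 'hello');

# In real use $fh would be the pipe; here it reads from the string above.
open my $fh, '<:raw', \$stream or die "Cannot open in-memory stream: $!";

my @messages;
while (1) {
    my $got = read($fh, my $header, 4);
    last unless $got;                       # 0 at end of file, undef on error
    die "Truncated header" if $got != 4;

    my ($type, $len) = unpack('n n', $header);
    my $tmpl = $body_template{$type} // die "Unknown message type $type";

    read($fh, my $body, $len) == $len or die "Truncated body";

    # One unpack call per message body, with the template chosen at run time.
    push @messages, [ $type, unpack($tmpl, $body) ];
}
close $fh;

print join(' ', @$_), "\n" for @messages;
```

This reads one message at a time, so memory use stays small no matter how large the input is, and each body still costs only one unpack call.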

Will provide an update...