Re^4: Reading binary file in perl having records of different length

I agree. I think that you should code this routine to be suspicious of the data, even though it has to rely on that data completely. For example, I presume the first two bytes of the file should be an eyecatcher: die if they’re not. The next two bytes should decode to a plausible length ... die if they don’t. Read the specified number of bytes ... die if you can’t. The next thing that you read should be either “nothing” (end of file) or another eyecatcher, at which point you rinse and repeat.
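Here is a minimal sketch of that loop, not a definitive implementation. The original post does not give the actual format, so the eyecatcher bytes (0xAB 0xCD), the big-endian 16-bit length that counts payload bytes only, the 1..4096 "plausible" range, and the names read_exact and read_records are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumptions (not from the original post): the eyecatcher is the two bytes
# 0xAB 0xCD, the length is a big-endian unsigned 16-bit integer counting
# payload bytes only, and a plausible payload is 1..4096 bytes long.
my $EYECATCHER = "\xAB\xCD";
my $MAX_LEN    = 4096;

# Read exactly $want bytes from $fh. Returns undef if the handle is already
# at end of file; dies on a read error or if only part of the data arrives.
sub read_exact {
    my ($fh, $want, $what) = @_;
    my $buf = '';
    while (length($buf) < $want) {
        my $got = read($fh, $buf, $want - length($buf), length($buf));
        die "read error in $what: $!\n" unless defined $got;
        last if $got == 0;    # end of file
    }
    return if length($buf) == 0;
    die "truncated $what: wanted $want bytes, got " . length($buf) . "\n"
        if length($buf) < $want;
    return $buf;
}

# Read every record from $fh, dying at the first structural problem.
# Returns a reference to an array of payload strings.
sub read_records {
    my ($fh) = @_;
    binmode $fh;
    my @records;
    while (defined(my $eye = read_exact($fh, 2, 'eyecatcher'))) {
        my $n = @records + 1;
        die "bad eyecatcher at record $n\n" unless $eye eq $EYECATCHER;

        my $raw = read_exact($fh, 2, 'length');
        die "missing length at record $n\n" unless defined $raw;
        my $len = unpack 'n', $raw;
        die "implausible length $len at record $n\n"
            if $len < 1 || $len > $MAX_LEN;

        my $payload = read_exact($fh, $len, 'payload');
        die "missing payload at record $n\n" unless defined $payload;
        push @records, $payload;
    }
    return \@records;
}
```

With a real file you would open it with something like open my $fh, '<:raw', $path and pass $fh to read_records. If the call returns at all, every record passed every check.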

Notice that, in this way, if the program runs successfully, then you can indeed assert that the file’s structure must be good. Since big files can and do become corrupt sometimes (and often come from other people’s software systems), this amount of caution is not paranoia. Not at all. (In fact, in a production setting, I would have a series of .t test-files that prove, and re-prove, that all of these die calls actually work.)
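A .t file of that kind might look roughly like the sketch below, which uses the core Test::More module. The helper check_header is hypothetical and exists only to give the tests something to exercise. Its eyecatcher value, big-endian length, and 1..4096 range are assumptions, not details from the original post.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper for illustration: validates a 4-byte record header and
# returns the payload length. The eyecatcher value, the big-endian length,
# and the 1..4096 range are assumptions, not taken from the original post.
sub check_header {
    my ($header) = @_;
    die "short header\n" unless defined $header && length($header) == 4;
    my ($eye, $len) = unpack 'a2 n', $header;
    die "bad eyecatcher\n" unless $eye eq "\xAB\xCD";
    die "implausible length $len\n" if $len < 1 || $len > 4096;
    return $len;
}

# A well-formed header must be accepted ...
is( check_header("\xAB\xCD" . pack('n', 12)), 12, 'valid header accepted' );

# ... and each kind of damage must trigger its own die.
my %bad = (
    'short header'       => "\xAB\xCD\x00",
    'bad eyecatcher'     => "\x00\x00" . pack('n', 12),
    'implausible length' => "\xAB\xCD" . pack('n', 0),
);
for my $why (sort keys %bad) {
    my $lived = eval { check_header($bad{$why}); 1 };
    my $err   = $@;    # capture before anything else can overwrite $@
    ok( !$lived, "$why: died as expected" );
    like( $err, qr/^\Q$why\E/, "$why: message names the problem" );
}

done_testing;
```

The point of testing the failure paths is that a die which never fires looks exactly like a file that is always good.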

There will be no harm in simply reading two bytes, then two bytes, then n bytes, and so on, letting Perl and the filesystem handle all of the buffering for you. It really doesn’t matter how big the file is.

Re^5: Reading binary file in perl having records of different length by jaypal (Beadle) on Jun 19, 2014 at 01:32 UTC
Thanks, great suggestion. I just wanted to digress and say that I work in QA, and the purpose of this script was to automate 70-odd test cases, but all your suggestions point towards best practice and I really do appreciate that. I will add the necessary error checking, and if I run into any issues I will report back for your guidance.