Dwood has asked for the wisdom of the Perl Monks concerning the following question:

This is a basic newbie question with Perl, but how do I open a file when I know the C++ structs that define its contents?

I have all of the data types that the file contains, but to my knowledge there is no way in Perl to define structs based on a value's size; that's just not how Perl works...

I need some perl wisdom!

Replies are listed 'Best First'.
Re: Opening certain files
by Corion (Patriarch) on Nov 08, 2010 at 21:45 UTC

    Basically, you read a number of bytes from the file, and then unpack the values from the data read. Knowing how many bytes to read involves manual computation, as Perl doesn't know what C compiler or struct alignment was used. If you want total overkill, there is Convert::Binary::C, which can talk to your C compiler and ask it.
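    As a sketch of the read-then-unpack approach, assuming a hypothetical record layout (not from the thread) of struct { int id; double value; char name[16]; } with a 4-byte int, 4 pad bytes before the 8-byte double, and a 16-byte char array, giving 32 bytes per record:

```perl
use strict;
use warnings;

# Assumed layout: int (4) + pad (4) + double (8) + char[16] = 32 bytes
my $record_size = 32;
my $template    = 'l x4 d A16';   # int, skip padding, double, space-padded string

# Write two sample records so the sketch is self-contained.
open my $out, '>:raw', 'data.bin' or die "Can't write data.bin: $!";
print $out pack($template, 1, 1.5, 'alpha');
print $out pack($template, 2, 2.5, 'beta');
close $out;

# Read the file back one fixed-size record at a time.
open my $in, '<:raw', 'data.bin' or die "Can't read data.bin: $!";
my @records;
while (read($in, my $buf, $record_size) == $record_size) {
    push @records, [ unpack $template, $buf ];
}
close $in;

printf "%d %s %s\n", @$_ for @records;
```

    The $record_size and template here are assumptions; they must be computed by hand (or asked of the compiler) for the real struct.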

Re: Opening certain files
by ikegami (Patriarch) on Nov 08, 2010 at 21:54 UTC

    It's odd that you talk of the file being defined in terms of C++ structs, since I think C++ doesn't define the memory layout of structs.

    There is Convert::Binary::C, but it's usually simpler just to use unpack. You can read blocks of the appropriate size using read or by setting $/ appropriately with readline (aka <>).

    (Too slow!)
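    The $/ trick ikegami mentions: setting $/ to a reference to an integer makes readline return fixed-size blocks instead of lines. A minimal sketch, assuming records of two native 32-bit ints (8 bytes each):

```perl
use strict;
use warnings;

my $record_size = 8;   # assumed: two native 32-bit ints per record

# Sample data so the sketch is self-contained.
open my $out, '>:raw', 'records.bin' or die "write: $!";
print $out pack('l l', $_, $_ * 10) for 1 .. 3;
close $out;

open my $in, '<:raw', 'records.bin' or die "read: $!";
local $/ = \$record_size;   # ref-to-integer: readline returns fixed-size blocks
my @pairs;
while (my $rec = <$in>) {
    push @pairs, [ unpack 'l l', $rec ];
}
close $in;

print "@$_\n" for @pairs;
```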

      Would the readline approach still work if it was a packed binary file, with fixed size records ("structs") with no end-of-record indicator?

      In this case, wouldn't it be necessary to binmode the filehandle and read the required number of bytes, before unpacking?

        Would the readline approach still work if it was a packed binary file, with fixed size records ("structs") with no end-of-record indicator?

        I said it would. Did you read the documentation to which I linked?

        wouldn't it be necessary to binmode the filehandle

        Yes, no matter which method you use.
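    To illustrate why binmode matters: on platforms that translate line endings, bytes such as CR/LF inside binary data could be corrupted or resized without it. A small sketch (binmode has the same effect as opening with the :raw layer):

```perl
use strict;
use warnings;

# Eight bytes that include CR and LF, which CRLF translation could mangle.
my $bytes = "\x01\x0d\x0a\x1a\xff\x00\x0d\x0a";

open my $out, '>', 'raw.bin' or die "write: $!";
binmode $out;               # same effect as the :raw layer on open
print $out $bytes;
close $out;

open my $in, '<', 'raw.bin' or die "read: $!";
binmode $in;
read($in, my $buf, 8) == 8 or die "short read";
close $in;
```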

Re: Opening certain files
by cdarke (Prior) on Nov 09, 2010 at 08:59 UTC
    To expand on ikegami's comment, you should be careful of packing. You might know this, and apologies if you do, but members of a struct might have pad bytes added by the compiler to force those members onto boundaries, depending on the type. For example:
    struct mystruct { char a; int b; char c; };
    would be 12 bytes on many 32-bit compilers, adding 3 pad bytes after each char to align the int on a word boundary and to make the struct a whole number of words. You have to be aware of the padding, which can vary between compilers, command-line options, and pragmas (many compilers support #pragma pack).
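    In an unpack template, the pad bytes of that struct are skipped with "x". A sketch, assuming a 32-bit int aligned on a 4-byte boundary:

```perl
use strict;
use warnings;

# struct mystruct { char a; int b; char c; };
# Assumed layout: a (1) + 3 pad + b (4) + c (1) + 3 pad = 12 bytes
my $template = 'a x3 l a x3';   # x skips pad bytes

my $packed = pack $template, 'X', 42, 'Y';
my ($a, $b, $c) = unpack $template, $packed;
print length($packed), ": $a $b $c\n";
```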

    If it was me then I would write a wrapper in XS to read the C++ output, but then I am a sucker for making work for myself.

      I certainly would consider that approach in, for example, a module that might be deployed on different computers. If you know that the file layout is (and will forever remain) stable, as will the source of that data file, then you might reasonably get away with using pack/unpack, which of course is designed expressly for this sort of thing.

      Any such program should aggressively check for errors in the input, particularly any which imply that the unpack template does not match the data. Realistically, only the input program can guard against “garbage in, garbage out” here: it is the entire system’s first line of defense, and so it should treat every record as “Fubar until proven otherwise.” (The “inefficiency” (sic) of doing this should be of no concern whatever.)
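    One way to sketch that defensive stance, reusing the hypothetical 12-byte layout from cdarke's struct above and treating every record as suspect (the range check on $b is an invented example of a sanity test, not part of any real format):

```perl
use strict;
use warnings;

my $record_size = 12;
my $template    = 'a x3 l a x3';   # assumed layout of the 12-byte struct

sub read_record {
    my ($fh) = @_;
    my $got = read($fh, my $buf, $record_size);
    return undef if $got == 0;                        # clean EOF
    die "short read: $got of $record_size bytes" if $got != $record_size;
    my ($a, $b, $c) = unpack $template, $buf;
    # Reject values that cannot be right for this (assumed) format.
    die "count out of range: $b" if $b < 0 || $b > 1_000_000;
    return [$a, $b, $c];
}

# Self-contained demo: one good record, then a deliberately truncated one.
open my $out, '>:raw', 'check.bin' or die $!;
print $out pack($template, 'A', 7, 'B');
print $out "\x01\x02\x03";
close $out;

open my $in, '<:raw', 'check.bin' or die $!;
my $ok  = read_record($in);
my $err = eval { read_record($in); 1 } ? '' : $@;
close $in;
```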

Re: Opening certain files
by tod222 (Pilgrim) on Nov 09, 2010 at 05:01 UTC

    If you know the layout of the data types in bits, bytes, and words then it's a simple matter of unpacking the data once you've read it in.

    See the unpack function and the tutorial on pack and unpack.
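    A tiny illustration of bits, bytes, and words in one unpack template, with invented values: a 16-bit big-endian word ("n"), an unsigned byte ("C"), and eight bits as a string ("B8"):

```perl
use strict;
use warnings;

# Pack and unpack a word, a byte, and a bit string with one template.
my $data = pack 'n C B8', 0x1234, 200, '10100001';
my ($word, $byte, $bits) = unpack 'n C B8', $data;
print "$word $byte $bits\n";
```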