Dwood has asked for the wisdom of the Perl Monks concerning the following question:

This is a basic newbie question with Perl, but how do I open a file when I know the C++ structs that define its contents?

I have all of the data types that the file contains, but to my knowledge there is no way in Perl to define structs based on a value's size; that's just not how Perl works...

I need some perl wisdom!

Replies are listed 'Best First'.
Re: Opening certain files
by Corion (Patriarch) on Nov 08, 2010 at 21:45 UTC

    Basically, you read a number of bytes from the file, and then unpack the values from the data read. Knowing how many bytes to read involves manual computation, as Perl doesn't know what C compiler or struct alignment was used. If you want total overkill, there is Convert::Binary::C, which can talk to your C compiler and ask it.
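    As a sketch of the read-then-unpack approach, assuming a hypothetical record layout (not from the thread) of struct { int id; double value; char name[16]; } with a 4-byte int, 4 pad bytes before the 8-byte double, and a 16-byte char array, giving 32 bytes per record:

```perl
use strict;
use warnings;

# Assumed layout: int (4) + pad (4) + double (8) + char[16] = 32 bytes
my $record_size = 32;
my $template    = 'l x4 d A16';   # int, skip padding, double, space-padded string

# Write two sample records so the sketch is self-contained.
open my $out, '>:raw', 'data.bin' or die "Can't write data.bin: $!";
print $out pack($template, 1, 1.5, 'alpha');
print $out pack($template, 2, 2.5, 'beta');
close $out;

# Read the file back one fixed-size record at a time.
open my $in, '<:raw', 'data.bin' or die "Can't read data.bin: $!";
my @records;
while (read($in, my $buf, $record_size) == $record_size) {
    push @records, [ unpack $template, $buf ];
}
close $in;

printf "%d %s %s\n", @$_ for @records;
```

    The $record_size and template here are assumptions; they must be computed by hand (or asked of the compiler) for the real struct.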

Re: Opening certain files
by ikegami (Patriarch) on Nov 08, 2010 at 21:54 UTC

    It's odd that you talk of the file being defined in terms of C++ structs, since I think C++ doesn't define the memory layout of structs.

    There is Convert::Binary::C, but it's usually simpler just to use unpack. You can read blocks of the appropriate size using read or by setting $/ appropriately with readline (aka <>).

    (Too slow!)
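    The $/ trick ikegami mentions: setting $/ to a reference to an integer makes readline return fixed-size blocks instead of lines. A minimal sketch, assuming records of two native 32-bit ints (8 bytes each):

```perl
use strict;
use warnings;

my $record_size = 8;   # assumed: two native 32-bit ints per record

# Sample data so the sketch is self-contained.
open my $out, '>:raw', 'records.bin' or die "write: $!";
print $out pack('l l', $_, $_ * 10) for 1 .. 3;
close $out;

open my $in, '<:raw', 'records.bin' or die "read: $!";
local $/ = \$record_size;   # ref-to-integer: readline returns fixed-size blocks
my @pairs;
while (my $rec = <$in>) {
    push @pairs, [ unpack 'l l', $rec ];
}
close $in;

print "@$_\n" for @pairs;
```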

      Would the readline approach still work if it was a packed binary file, with fixed size records ("structs") with no end-of-record indicator?

      In this case, wouldn't it be necessary to binmode the filehandle and read the required number of bytes, before unpacking?

        Would the readline approach still work if it was a packed binary file, with fixed size records ("structs") with no end-of-record indicator?

        I said it would. Did you read the documentation to which I linked?

        wouldn't it be necessary to binmode the filehandle

        Yes, no matter which method you use.
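    To illustrate why binmode matters: on platforms that translate line endings, bytes such as CR/LF inside binary data could be corrupted or resized without it. A small sketch (binmode has the same effect as opening with the :raw layer):

```perl
use strict;
use warnings;

# Eight bytes that include CR and LF, which CRLF translation could mangle.
my $bytes = "\x01\x0d\x0a\x1a\xff\x00\x0d\x0a";

open my $out, '>', 'raw.bin' or die "write: $!";
binmode $out;               # same effect as the :raw layer on open
print $out $bytes;
close $out;

open my $in, '<', 'raw.bin' or die "read: $!";
binmode $in;
read($in, my $buf, 8) == 8 or die "short read";
close $in;
```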

Re: Opening certain files
by cdarke (Prior) on Nov 09, 2010 at 08:59 UTC
    To expand on ikegami's comment, you should be careful of packing. You might know this, and apologies if you do, but members of a struct might have pad bytes added by the compiler to force those members onto boundaries, depending on the type. For example:
    struct mystruct { char a; int b; char c; };
    would be 12 bytes on many 32-bit compilers, adding 3 pad bytes after each char to align the int on a word boundary and to make the struct a whole number of words. You have to be aware of the padding, which can vary between compilers, command-line options, and pragmas (many compilers support #pragma pack).
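    In an unpack template, the pad bytes of that struct are skipped with "x". A sketch, assuming a 32-bit int aligned on a 4-byte boundary:

```perl
use strict;
use warnings;

# struct mystruct { char a; int b; char c; };
# Assumed layout: a (1) + 3 pad + b (4) + c (1) + 3 pad = 12 bytes
my $template = 'a x3 l a x3';   # x skips pad bytes

my $packed = pack $template, 'X', 42, 'Y';
my ($a, $b, $c) = unpack $template, $packed;
print length($packed), ": $a $b $c\n";
```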

    If it was me then I would write a wrapper in XS to read the C++ output, but then I am a sucker for making work for myself.

      I certainly would consider that approach in, for example, a module that might be deployed on different computers. If you know that the file layout is (and will forever remain) stable, as will the source of that data file, then you might reasonably get away with using pack/unpack, which of course is designed expressly for this sort of thing.

      Any such program should aggressively check for errors in the input, particularly any which imply that the unpack template does not match the data. Realistically, only the input program can guard against “garbage in, garbage out” here: it is the entire system’s first line of defense, and so it should treat every record as “Fubar until proven otherwise.” (The “inefficiency” (sic) of doing this should be of no concern whatever.)
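    One way to sketch that defensive stance, reusing the hypothetical 12-byte layout from cdarke's struct above and treating every record as suspect (the range check on $b is an invented example of a sanity test, not part of any real format):

```perl
use strict;
use warnings;

my $record_size = 12;
my $template    = 'a x3 l a x3';   # assumed layout of the 12-byte struct

sub read_record {
    my ($fh) = @_;
    my $got = read($fh, my $buf, $record_size);
    return undef if $got == 0;                        # clean EOF
    die "short read: $got of $record_size bytes" if $got != $record_size;
    my ($a, $b, $c) = unpack $template, $buf;
    # Reject values that cannot be right for this (assumed) format.
    die "count out of range: $b" if $b < 0 || $b > 1_000_000;
    return [$a, $b, $c];
}

# Self-contained demo: one good record, then a deliberately truncated one.
open my $out, '>:raw', 'check.bin' or die $!;
print $out pack($template, 'A', 7, 'B');
print $out "\x01\x02\x03";
close $out;

open my $in, '<:raw', 'check.bin' or die $!;
my $ok  = read_record($in);
my $err = eval { read_record($in); 1 } ? '' : $@;
close $in;
```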

Re: Opening certain files
by tod222 (Pilgrim) on Nov 09, 2010 at 05:01 UTC

    If you know the layout of the data types in bits, bytes, and words then it's a simple matter of unpacking the data once you've read it in.

    See the unpack function and the tutorial on pack and unpack.
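    A tiny illustration of bits, bytes, and words in one unpack template, with invented values: a 16-bit big-endian word ("n"), an unsigned byte ("C"), and eight bits as a string ("B8"):

```perl
use strict;
use warnings;

# Pack and unpack a word, a byte, and a bit string with one template.
my $data = pack 'n C B8', 0x1234, 200, '10100001';
my ($word, $byte, $bits) = unpack 'n C B8', $data;
print "$word $byte $bits\n";
```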