jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

I tried to read a binary file (on a linux machine) like
#! /usr/bin/perl -l
use strict ;
open IN,"<my_bin_file" ;
binmode IN ;
my $binrec ;
read IN, $binrec, 52 ;
print "value is ".unpack("x20 I", $binrec) ;
close IN ;

It returned the wrong number. However, the same code produced the correct value on a Unix machine!

So, I would like to convert each value (most of them are doubles, floats and integers) to little endian, but from the documentation it is not really clear how to do this!
Any suggestions ?

Thanks
LuCa

Re: swap binary data from big to little Endian
by Corion (Patriarch) on Jul 30, 2007 at 11:37 UTC
Re: swap binary data from big to little Endian
by roboticus (Chancellor) on Jul 30, 2007 at 12:10 UTC

    jeanluca:

    Just to elaborate on the previous response a little bit: Some machines want the Most Significant Byte (MSB) of a word first in memory, while others want the Least Significant Byte (LSB) of a word to be first in memory. "Network Order" (big-endian, MSB first) is a portable format that was (I think) specified in the TCP/IP RFCs so that machines of either endianness could interoperate.

    If you use network order in your binary formats, then you can make your code interoperable with other platforms. (Amusing note: Network order just happens to be the opposite byte order of the natural x86 order. So there's an incredible amount of time spent on x86 machines swapping byte orders to chat over networks. As there are more x86 machines out there than anything else, I wonder how many cycles are spent on it. But the people who worked on the original networking stuff apparently used boxes based on other CPU architectures than x86 (such as 3B2, 68000, 32032, etc.) so they got to choose!)
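A minimal sketch of what network order means for pack and unpack. The value 2000 is just an illustrative number; 'N' (big-endian) and 'V' (little-endian) are standard templates for 32-bit unsigned integers.

```perl
use strict;
use warnings;

# 'N' always packs a 32-bit unsigned integer in network (big-endian) order,
# whatever the byte order of the machine running the script.
my $value  = 2000;
my $packed = pack 'N', $value;

# Show the bytes as hex: the most significant byte comes first.
my $hex = unpack 'H*', $packed;
print "bytes: $hex\n";            # bytes: 000007d0

# Reading the bytes back with 'N' gives the original value on any platform.
my $roundtrip = unpack 'N', $packed;
print "value: $roundtrip\n";      # value: 2000

# Reading the same bytes as little-endian ('V') gives a byte-swapped value.
my $swapped = unpack 'V', $packed;
print "swapped: $swapped\n";
```

If the original script's 'I' were replaced by 'N', the result would no longer depend on the machine, provided the file really is big-endian.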

    ...roboticus

Re: swap binary data from big to little Endian
by BrowserUk (Patriarch) on Jul 30, 2007 at 14:31 UTC

    You have four problems to solve.

    1. The endianness of the hardware you are currently running on.

      The endianness of your current hardware can be determined in a couple of ways.

      • use Config; print $Config{ byteorder };

        On my machine, little-endian Intel, that prints 1234 (a perl built with 64-bit integers prints 12345678), which indicates little-endianness because the smallest number is first.

        On a big-endian machine it prints 4321 (or 87654321).

      • print unpack 'I', "\x01\x02\x03\x04";

        That is, unpack a known packed value using the platform-native 32-bit integer format 'I'.

        On a little-endian machine that will print 67305985.

        On a big-endian machine it will print 16909060.
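Both checks can be combined in a short script. This is only a sketch; the variable names are illustrative, and it uses 'L' (exactly 32 bits, native order) rather than 'I' so the width is fixed.

```perl
use strict;
use warnings;
use Config;

# Method 1: ask Perl's build configuration.
# Typically "1234" or "12345678" on little-endian builds,
# "4321" or "87654321" on big-endian ones.
my $byteorder = $Config{byteorder};

# Method 2: pack the value 1 natively and look at the first byte.
# On a little-endian host the least significant byte (1) comes first.
my $first_byte = unpack 'C', pack 'L', 1;
my $is_little  = $first_byte == 1 ? 1 : 0;

# The known-bytes test from the text, using the fixed-width 'L'.
my $probe = unpack 'L', "\x01\x02\x03\x04";

print "byteorder: $byteorder\n";
print "host is ", ($is_little ? "little" : "big"), "-endian\n";
print "probe: $probe\n";
```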

    2. The endianness of the data you are reading.

      This is much harder unless you know, in advance, what to expect for at least one value. Either the exact value, or some indication of the range that one or more values you are unpacking is likely to fall into.

      For example, if you are unpacking an image format that might be encoded either way, amongst the first values that you are likely to unpack, is a pair of values in a known position that are the width and height of the image. If you unpack these using

      my( $width, $height ) = unpack 'I2', substr $data, $offset, 8;

      and get values like:

      print "$width : $height";
      327680 : 262144

      then the chances are that the data is stored in the opposite endianness to your current platform, as most people are unlikely to be dealing with images hundreds of thousands of pixels on a side! Unpacked 'the other way', those same values come out to be:

      printf "%u : %u\n", unpack 'V2', substr $data, $offset, 8;
      1280 : 1024

      which is much more reasonable and likely. Not a definitive test, but in this case there is no substitute for knowing your data.
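The plausibility check can be sketched as a small helper. The function name guess_template and the upper bound of 65_535 are assumptions made for illustration, not part of any image format.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 'V' (little-endian) or 'N' (big-endian) for
# the 8 bytes holding width and height, or undef if neither reading or both
# readings look plausible. $max is an assumed upper bound on a sane dimension.
sub guess_template {
    my ($bytes, $max) = @_;
    my @plausible = grep {
        my ($w, $h) = unpack "${_}2", $bytes;
        $w > 0 && $w <= $max && $h > 0 && $h <= $max;
    } qw(V N);
    return @plausible == 1 ? $plausible[0] : undef;
}

# Sample data: width 1280 and height 1024 stored little-endian.
my $header = pack 'V2', 1280, 1024;
my $tmpl   = guess_template($header, 65_535);
print "template: ", (defined $tmpl ? $tmpl : 'unknown'), "\n";   # template: V
```

Returning undef when both readings are plausible keeps the helper from guessing; small values whose bytes read sensibly either way still need outside knowledge of the data.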
    3. How to 'switch the formats' if necessary.
      • If you know, or have determined that your data is in big-endian format, you can just use 'N' or 'n' to read unsigned 32-bit or 16-bit values respectively.
      • For little-endian data, use 'V' or 'v' respectively.

      Note: Before Perl 5.10 there are no endian-specific formats for signed integer values (5.10 added the '<' and '>' modifiers, as in 'l>'). You can manage without them: once you've unpacked the values using the unsigned formats and they are in perl's internal format, you can treat them as signed or unsigned as appropriate, though a value with the top bit set needs an explicit conversion such as unpack 'l', pack 'L', $value.
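A sketch of reading a signed big-endian 32-bit value using only the unsigned 'N' template. The sample bytes are made up for illustration and encode -2 in two's complement.

```perl
use strict;
use warnings;

# Four bytes holding -2 as a big-endian two's-complement 32-bit integer.
my $bytes = "\xFF\xFF\xFF\xFE";

# Step 1: 'N' reads them as an unsigned big-endian value.
my $unsigned = unpack 'N', $bytes;

# Step 2: repack natively as unsigned ('L') and reread as signed ('l').
my $signed = unpack 'l', pack 'L', $unsigned;

# Equivalent arithmetic, without a second pack/unpack.
my $signed_arith = $unsigned >= 2**31 ? $unsigned - 2**32 : $unsigned;

print "$unsigned $signed $signed_arith\n";   # 4294967294 -2 -2
```

On Perl 5.10 or later, unpack 'l>', $bytes should give the same result in one step.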

    4. And that brings us to the biggy: floating point values.

      There is no 'network format' for floats and doubles. Not even IEEE FP format reals.

      If you are lucky, and the code that wrote them was another perl script, then just using 'd' or 'f' for 64-bit and 32-bit reals will just work and you're laughing.

      If it doesn't, then you have problems.

      • One possibility is that the source machine uses IEEE format reals, but they are stored in the opposite byte order to that of your machine.

        In this case, the 'fix' is fairly simple. If you use scalar reverse on the data before unpacking it, it will do the right thing:

        ## Assume the first 8 bytes are a double in the 'other' endianness,
        ## and the next 4 a float.
        my $data = ...;
        my $doublePackedOtherEndian = substr $data, 0, 8;
        my $floatPackedOtherEndian  = substr $data, 8, 4;

        ## Now reverse the bytes
        my $doublePackMyEndian  = reverse $doublePackedOtherEndian;
        my $floatPackedMyEndian = reverse $floatPackedOtherEndian;

        ## And unpack
        my $double = unpack 'd', $doublePackMyEndian;
        my $float  = unpack 'f', $floatPackedMyEndian;

        Note. This only works for hardware where the endianness involves swapping byte order only. There is (or has been) hardware where bit-order is also involved, but mostly you are unlikely to encounter such hardware these days. It also only works for platforms that use IEEE floating point representation. See below.

        Note also that this scalar reverse trick can also be used to switch the byte order of 32-bit and 16-bit integers, and it is sometimes convenient to use it for this in preference to pack 'V', unpack 'N', $packedU32; and similar.

      • Another possibility is that the hardware producing the data uses an entirely different floating point representation to your machine.

        There are myriad possibilities and variations here. Your machine might be non-IEEE, and the source machine IEEE.

        Or vice-versa.

        Or both use different non-IEEE.

        Or one might be outputting 80-bit reals. Or 128-bit reals.

        Or any combination of the above.

        They can all be dealt with if you know what these formats are and can get a good spec on them, but it's not easy.
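For the simpler case above, where both machines use IEEE reals and differ only in byte order, the reverse trick can be tried end to end. This sketch simulates the foreign data by reversing natively packed bytes, and assumes 'd' is 8 bytes and 'f' is 4 bytes on the host; the sample values 3.5 and 0.25 are arbitrary.

```perl
use strict;
use warnings;

# Simulate data from a machine with the opposite byte order by packing
# natively and reversing the bytes of each field.
my $foreign_double = scalar reverse pack('d', 3.5);
my $foreign_float  = scalar reverse pack('f', 0.25);
my $data = $foreign_double . $foreign_float;    # 8 + 4 = 12 bytes

# Reverse each field back into native order, then unpack it.
my $double = unpack 'd', scalar reverse substr($data, 0, 8);
my $float  = unpack 'f', scalar reverse substr($data, 8, 4);

print "$double $float\n";   # 3.5 0.25
```

The explicit scalar matters: in list context reverse would reverse the order of its arguments rather than the bytes of the string.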

    Hope that helps some.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks, BrowserUk. Two years later and that remains very helpful information.
Re: swap binary data from big to little Endian
by jeanluca (Deacon) on Jul 30, 2007 at 13:17 UTC
    I have to add that I've always worked with ASCII strings, so pack and unpack are still magic for me, but I'll keep trying!

    Now I've tried to unpack the following header:

    Position(bytes)  Datatype
    0 - 7            long
    8 - 15           long
    16 - 23          long
    24 - 31          long
    32 - 39          long
    40 - 43          int
    44 - 51          long

    I checked with 'od -l' (on a Unix system):

    0000000          0       2000          0        200
    0000020          0     455800          0          4
    0000040          0          8       2000        275
    0000060  281946600  167772160
    0000065
    I only know that the 6th value is 2000, which is now the 11th value, so I get the impression that it is expected that a long is only 4 bytes, does that make any sense ?

    Now, when I do (on Linux)
    my @vals = unpack("N*", $binrec ) ;
    print "@vals" ;
    Output
    0 2000 0 200 0 455800 0 4 0 8 2000 275 281946600
    I get the same values as od except for the last value!
    The last thing that confuses me is how to convert a signed/unsigned integer from big to little endian?

    LuCa
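One possible reading of the output above, offered as a sketch: if the file is big-endian and each long in the header really is 8 bytes, then every long shows up as two 32-bit words (a high word, usually 0, and a low word). That would explain why 2000 is the 11th 32-bit value yet still the 6th header field. The helper name join_words is hypothetical, and the record is rebuilt here from the words quoted in the post rather than read from the real file.

```perl
use strict;
use warnings;

# Rebuild the 52-byte record from the 32-bit words quoted in the post.
my $binrec = pack 'N*', 0, 2000, 0, 200, 0, 455800, 0, 4, 0, 8,
                        2000, 275, 281946600;

# Read all thirteen big-endian 32-bit words.
my @w = unpack 'N13', $binrec;

# Combine a high and a low 32-bit word into one 64-bit value.
# Done arithmetically so it also runs on a perl without 64-bit integers.
sub join_words { my ($hi, $lo) = @_; return $hi * 2**32 + $lo }

my @fields = (
    (map { join_words(@w[2 * $_, 2 * $_ + 1]) } 0 .. 4),  # five 8-byte longs
    $w[10],                                                # the 4-byte int
    join_words(@w[11, 12]),                                # final 8-byte long
);
print "@fields\n";   # 2000 200 455800 4 8 2000 1181397953000
```

The extra value 167772160 that od prints is probably not part of the header: od's final offset of 0000065 (octal) is 53 bytes, one more than the 52 bytes the script reads, so od appears to be showing a trailing byte padded out to a full word.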