Re: swap binary data from big to little Endian

You have four problems to solve.

The endianness of the hardware you are currently running on.
The endianness of your current hardware can be determined in a couple of ways.
- use Config; print $Config{ byteorder };
  On my machine, little-endian Intel, that prints 1234. The digits give each byte's significance, so the least significant byte coming first means little-endian. (A perl built with 64-bit integers prints 12345678.)
  On a big-endian machine it prints 4321 (or 87654321).
- print unpack 'I', "\x01\x02\x03\x04";
  That is, unpack a known packed value using the native unsigned integer format 'I' (32 bits on most platforms; 'L' is always exactly 32 bits).
  On a little-endian machine that will print 67305985.
  On a big-endian machine it will print 16909060.
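Both checks can be combined into a small script. This is a sketch that uses 'L' (always exactly 32 bits) rather than 'I', and it assumes, as for the 1234 and 12345678 values above, that $Config{byteorder} starts with '1' on a little-endian perl.

```perl
use strict;
use warnings;
use Config;

## byteorder: '1234' or '12345678' on little-endian perls, '4321' or '87654321' on big-endian ones
my $byteorder = $Config{byteorder};

## Unpack four known bytes with 'L', the native unsigned exactly-32-bit format
my $native    = unpack 'L', "\x01\x02\x03\x04";
my $is_little = $native == 0x04030201 ? 1 : 0;    ## 67305985
my $is_big    = $native == 0x01020304 ? 1 : 0;    ## 16909060

printf "byteorder %s, unpacked %u: %s-endian\n",
    $byteorder, $native, $is_little ? 'little' : $is_big ? 'big' : 'unknown';
```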
The endianness of the data you are reading.
This is much harder unless you know, in advance, what to expect for at least one value: either its exact value, or some indication of the range that one or more of the values you are unpacking are likely to fall into.
For example, if you are unpacking an image format that might be encoded either way, amongst the first values you are likely to unpack is a pair of values, at a known position, that give the width and height of the image. If you unpack these using
```
my( $width, $height ) = unpack 'I2', substr $data, $offset, 8;
```
and get values like:
```
print "$width : $height";
327680 : 262144
```
Then chances are that the data is stored in the opposite byte order to your current platform, as few people are dealing with images over 300,000 pixels wide! Unpacked 'the other way', those same values come out as:
```
## The data is big-endian, the opposite of my little-endian machine, so use 'N2'
printf "%u : %u\n", unpack 'N2', substr $data, $offset, 8;
1280 : 1024
```
which is much more reasonable and likely. Not a definitive test, but in this case there is no substitute for knowing your data.
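The width/height heuristic can be written down as a small helper. This is a sketch only: guess_dimensions and its $max_dim sanity limit are hypothetical, not part of any image format's spec, and it assumes the two dimensions are consecutive unsigned 32-bit values.

```perl
use strict;
use warnings;

## Hypothetical helper: read a width/height pair of unsigned 32-bit values at
## $offset, trying little-endian ('V2') then big-endian ('N2'), and return the
## first reading where both values fall in 1..$max_dim (an assumed sanity limit).
sub guess_dimensions {
    my ( $data, $offset, $max_dim ) = @_;
    my $pair = substr $data, $offset, 8;
    for my $fmt ( 'V2', 'N2' ) {
        my ( $w, $h ) = unpack $fmt, $pair;
        return ( $fmt, $w, $h )
            if $w >= 1 && $h >= 1 && $w <= $max_dim && $h <= $max_dim;
    }
    return;    ## neither byte order gave plausible values
}

## Example: 1280 x 1024 stored big-endian at offset 0
my $data = pack 'N2', 1280, 1024;
my ( $fmt, $w, $h ) = guess_dimensions( $data, 0, 65_536 );
print "$fmt: $w x $h\n";    ## N2: 1280 x 1024
```

As with the eyeball test, this can still be fooled by values that look plausible both ways round, so knowing your data remains the real answer.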
How to 'switch the formats' if necessary.
- If you know, or have determined, that your data is in big-endian format, you can just use 'N' or 'n' to read unsigned 32-bit or 16-bit values respectively.
- For little-endian data, use 'V' or 'v' respectively.
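A quick way to see what these four formats do is to unpack the same known bytes with each of them. Unlike 'I', the results do not depend on the machine the script runs on.

```perl
use strict;
use warnings;

## Six known bytes, read with the explicit byte-order formats
my $bytes = "\x01\x02\x03\x04\x05\x06";

my $be32 = unpack 'N', $bytes;                 ## big-endian 32-bit
my $le32 = unpack 'V', $bytes;                 ## little-endian 32-bit
my $be16 = unpack 'n', substr( $bytes, 4, 2 ); ## big-endian 16-bit
my $le16 = unpack 'v', substr( $bytes, 4, 2 ); ## little-endian 16-bit

printf "%08x %08x %04x %04x\n", $be32, $le32, $be16, $le16;
## 01020304 04030201 0506 0605
```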
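Note: Before perl 5.10 there are no endian-specific formats for signed integer values. Once you've unpacked the values using the unsigned formats they are in perl's internal format, and non-negative values can be used as either signed or unsigned. A negative value, though, comes back as a large unsigned number (a 32-bit -1 unpacks as 4294967295), so reinterpret it by re-packing natively: unpack 'l', pack 'L', $u32; (or 's' and 'S' for 16-bit values). Perl 5.10 added the '<' and '>' byte-order modifiers (e.g. 'l>', 's<'), which provide endian-specific signed formats directly.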
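Unsigned unpacking returns a negative value's bit pattern as a large positive number, so it needs reinterpreting as signed. Below is a sketch of two ways to do that on made-up bytes (a big-endian 32-bit -2 and a little-endian 16-bit -3): re-packing natively with 'L'/'S' and unpacking with 'l'/'s', which works on any perl, and the '<'/'>' byte-order modifiers, which need perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;    ## needed only for the 'l>' and 's<' forms

## A big-endian 32-bit -2 and a little-endian 16-bit -3, as raw bytes
my $be_bytes = "\xFF\xFF\xFF\xFE";
my $le_bytes = "\xFD\xFF";

## Unsigned unpack gives the raw bit pattern as a large positive number
my $u32 = unpack 'N', $be_bytes;
my $u16 = unpack 'v', $le_bytes;

## Re-pack natively as unsigned, then unpack as signed, to reinterpret the bits
my $s32 = unpack 'l', pack 'L', $u32;
my $s16 = unpack 's', pack 'S', $u16;

## Perl 5.10 and later: the byte-order modifiers give the signed values directly
my $s32_direct = unpack 'l>', $be_bytes;
my $s16_direct = unpack 's<', $le_bytes;

print "$u32 $s32 $s32_direct | $u16 $s16 $s16_direct\n";
## 4294967294 -2 -2 | 65533 -3 -3
```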
And that brings us to the biggy: floating point values.
pack has no 'network format' for floats and doubles, not even for IEEE format reals.
If you are lucky, and the code that wrote them was another perl script running on hardware with the same byte order and floating point format, then using 'd' or 'f' for 64-bit and 32-bit reals will just work and you're laughing.
If it doesn't, then you have problems.
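Perl 5.10 and later soften this a little: the '<' and '>' byte-order modifiers also work with 'd' and 'f', so when both ends use IEEE format reals and you control the code that writes the data, you can pin the byte order explicitly. A minimal sketch with arbitrary values; it does nothing for non-IEEE formats.

```perl
use strict;
use warnings;
use 5.010;    ## the '<' and '>' modifiers need perl 5.10 or later

## Writer side: a double and a float in explicit big-endian byte order
my $record = pack 'd> f>', 1.5, 0.25;

## Reader side: the same template recovers the values on either byte order
my ( $double, $float ) = unpack 'd> f>', $record;

## 1.5 as an IEEE double is 0x3FF8000000000000; 0.25 as an IEEE float is 0x3E800000
print unpack( 'H*', $record ), "\n";   ## 3ff80000000000003e800000
print "$double $float\n";              ## 1.5 0.25
```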
- One possibility is that the source machine uses IEEE format reals, but they are stored in the opposite byte order to that of your machine.
  In this case, the 'fix' is fairly simple. If you use scalar reverse on each value's bytes before unpacking it, it will do the right thing:
```
use strict;
use warnings;

## Assume the first 8 bytes are a double in the 'other' endianness,
## and the next 4 bytes a float.
binmode STDIN;
my $data = do { local $/; <STDIN> };   ## e.g. slurp the raw bytes from STDIN

my $doublePackedOtherEndian = substr $data, 0, 8;   ## bytes 0..7
my $floatPackedOtherEndian  = substr $data, 8, 4;   ## bytes 8..11

## Now reverse the bytes (assigning to a scalar gives scalar-context reverse)
my $doublePackedMyEndian = reverse $doublePackedOtherEndian;
my $floatPackedMyEndian  = reverse $floatPackedOtherEndian;

## And unpack with the native formats
my $double = unpack 'd', $doublePackedMyEndian;
my $float  = unpack 'f', $floatPackedMyEndian;
```
  Note: this only works for hardware where endianness involves swapping byte order only. There is (or has been) hardware where bit order is also involved, but you are unlikely to encounter it these days. It also only works between platforms that both use IEEE floating point representation. See below.
  Note also that this scalar reverse trick can be used to switch the byte order of 32-bit and 16-bit integers, and it is sometimes convenient to use it in preference to pack 'V', unpack 'N', $packedU32; and similar.
- Another possibility is that the hardware producing the data uses an entirely different floating point representation to your machine.
  There are myriad possibilities and variations here. Your machine might be non-IEEE, and the source machine IEEE.
  Or vice-versa.
  Or both use different non-IEEE formats.
  Or one might be outputting 80-bit reals. Or 128-bit reals.
  Or any combination of the above.
  They can all be dealt with if you know what these formats are and can get a good spec on them, but it's not easy.
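Going back to the first case, here is a self-contained check of the scalar reverse trick. It fakes 'other-endian' data by packing natively and reversing each field, undoes that the same way as the reading code above, and also compares reverse against pack 'V', unpack 'N' for a 32-bit integer. The values 1.5, 0.25 and 0x01020304 are arbitrary examples.

```perl
use strict;
use warnings;

## Fake data from a machine of the opposite byte order: pack natively, then
## byte-reverse each field (scalar assignment makes reverse work on the string)
my $double_other = reverse pack( 'd', 1.5 );
my $float_other  = reverse pack( 'f', 0.25 );
my $data = $double_other . $float_other;

## Reader: slice out each field (double in bytes 0..7, float in 8..11),
## reverse it back, then unpack with the native formats
my $double_mine = reverse substr( $data, 0, 8 );
my $float_mine  = reverse substr( $data, 8, 4 );
my $double = unpack 'd', $double_mine;
my $float  = unpack 'f', $float_mine;

## The same trick on a 32-bit integer: big-endian bytes to little-endian bytes
my $be             = pack 'N', 0x01020304;
my $le_via_reverse = reverse $be;
my $le_via_pack    = pack 'V', unpack 'N', $be;

print "$double $float ", ( $le_via_reverse eq $le_via_pack ? "same" : "different" ), "\n";
## 1.5 0.25 same
```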

Hope that helps some.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."


Re^2: swap binary data from big to little Endian by Anonymous Monk on Jun 24, 2009 at 20:22 UTC
Thanks, BrowserUk. Two years later, this is still very helpful information.