swap binary data from big to little Endian

jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Re: swap binary data from big to little Endian by Corion (Patriarch) on Jul 30, 2007 at 11:37 UTC
See the pack templates, especially `V`, `N`, `F` and `D`. Note that `F` and `D` are hardware specific and will not port between architectures. BrowserUk wrote "IEEE 754 80-bit Extended double (long double) to 64-bit double unpack" and "IEEE 754 64-bit double to 80-bit Extended double (long double) pack (Updated)", which show how to unpack and pack (IEEE) floating point values manually. Internally, Perl does not know about endianness, so you only have to worry about converting such values on input and output.
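As a minimal sketch of the two integer templates mentioned above (the sample bytes are made up for illustration): `N` reads four bytes as a big-endian unsigned 32-bit integer, `V` reads the same bytes as little-endian, and re-packing with the other template swaps the byte order.

```perl
use strict;
use warnings;

# Four bytes as they might arrive from a big-endian source (illustrative data).
my $bytes = "\x00\x00\x07\xD0";

my $as_big    = unpack 'N', $bytes;    # interpreted as big-endian
my $as_little = unpack 'V', $bytes;    # same bytes interpreted as little-endian

# Re-pack the big-endian value in little-endian order.
my $swapped = pack 'V', $as_big;

printf "N: %u  V: %u  swapped: %s\n", $as_big, $as_little, unpack 'H*', $swapped;
# prints "N: 2000  V: 3490119680  swapped: d0070000"
```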
Re: swap binary data from big to little Endian by roboticus (Chancellor) on Jul 30, 2007 at 12:10 UTC
jeanluca: Just to elaborate on the previous response a little bit: Some machines want the Most Significant Byte (MSB) of a word first in memory, while others want the Least Significant Byte (LSB) of a word to be first in memory. "Network Order" is a portable format that was (I think) specified in the TCP/IP RFCs so that machines of either endianness could interoperate. If you use network order in your binary formats, then you can make your code interoperable with other platforms.

(Amusing note: Network order just happens to be the opposite byte order of the natural x86 order. So there's an incredible amount of time spent on x86 machines swapping byte orders to chat over networks. As there are more x86 machines out there than anything else, I wonder how many cycles are spent on it. But the people who worked on the original networking stuff apparently used boxes based on CPU architectures other than x86 (such as 3B2, 68000, 32032, etc.), so they got to choose!)

...roboticus
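To illustrate the interoperability point, here is a small sketch that writes a record in network order. The record layout (a 16-bit version followed by a 32-bit length) is hypothetical and only serves as an example.

```perl
use strict;
use warnings;

# Hypothetical record: a 16-bit version and a 32-bit length, both unsigned.
my ( $version, $length ) = ( 3, 2000 );

# 'n' and 'N' always produce big-endian (network order) bytes,
# regardless of the byte order of the machine running the script.
my $record = pack 'n N', $version, $length;

# A reader on any platform recovers the same values with the same template.
my ( $v, $len ) = unpack 'n N', $record;

printf "%s -> version %d, length %d\n", unpack( 'H*', $record ), $v, $len;
# prints "0003000007d0 -> version 3, length 2000"
```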
Re: swap binary data from big to little Endian by BrowserUk (Patriarch) on Jul 30, 2007 at 14:31 UTC
You have four problems to solve.

The endianness of the hardware you are currently running on. This can be determined in a couple of ways:

`use Config; print $Config{ byteorder };`

On my machine, little-endian Intel, that prints `1234`, which (presumably) is used to indicate little-endianness because the smallest number is first. I assume that on a big-endian machine it would print `4321`.

`print unpack 'I', "\x01\x02\x03\x04";`

That is, unpack a known packed value using the platform-native unsigned integer format 'I' (32 bits on most platforms). On a little-endian machine that will print `67305985`. On a big-endian machine it will print `16909060`.

The endianness of the data you are reading. This is much harder unless you know, in advance, what to expect for at least one value: either the exact value, or some indication of the range that one or more of the values you are unpacking is likely to fall into. For example, if you are unpacking an image format that might be encoded either way, amongst the first values that you are likely to unpack is a pair of values in a known position that are the width and height of the image. If you unpack these using

`my( $width, $height ) = unpack 'I2', substr $data, $offset, 8;`

and `print "$width : $height";` gives values like `327680 : 262144`, then the chances are that the data is stored in the opposite endianness to your current platform, as most people are unlikely to be dealing with 2GB images! Unpacked 'the other way', with

`printf "%u : %u\n", unpack 'V2', substr $data, $offset, 8;`

those same values come out as `1280 : 1024`, which is much more reasonable and likely. Not a definitive test, but in this case there is no substitute for knowing your data.

How to 'switch the formats' if necessary. If you know, or have determined, that your data is in big-endian format, you can just use 'N' or 'n' to read unsigned 32-bit or 16-bit values respectively. For little-endian data, use 'V' or 'v' respectively.
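The two checks described above can be sketched as follows. The helper `guess_order` and its plausibility limit of 100,000 are assumptions made for this illustration, not part of any module, and the heuristic is no more definitive than the reasoning it is based on.

```perl
use strict;
use warnings;
use Config;

# Host byte order, determined two ways. On perls built with 64-bit
# integers, $Config{byteorder} is typically 12345678 or 87654321.
my $byteorder   = $Config{byteorder};
my $host_little = unpack( 'L', "\x01\x02\x03\x04" ) == 0x04030201 ? 1 : 0;

# Hypothetical helper: guess the byte order of a width/height pair by
# checking which interpretation yields plausible dimensions.
sub guess_order {
    my ( $eight_bytes, $max ) = @_;
    $max = 100_000 unless defined $max;    # assumed plausibility limit
    for my $template ( 'V2', 'N2' ) {
        my ( $w, $h ) = unpack $template, $eight_bytes;
        return $template if $w > 0 && $h > 0 && $w <= $max && $h <= $max;
    }
    return;    # neither interpretation looks plausible
}

my $data = pack 'V2', 1280, 1024;          # sample little-endian data
print "host little-endian: $host_little (byteorder $byteorder)\n";
print "data looks like: ", guess_order($data), "\n";
```

If both interpretations happen to look plausible, this sketch simply returns the first one it tries, so it is only useful when the two readings differ wildly, as they do for typical image dimensions.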
Note: There are no endian-specific formats for signed integer values. They are unnecessary, as once you've unpacked the values using the unsigned formats and they are in perl's internal format, you can use them as whichever (signed or unsigned) is appropriate and perl will (mostly) DWIM.

And that brings us to the biggy: floating point values. There is no 'network format' for floats and doubles. Not even IEEE FP format reals. If you are lucky, and the code that wrote them was another perl script, then just using 'd' or 'f' for 64-bit and 32-bit reals will just work and you're laughing. If it doesn't, then you have problems.

One possibility is that the source machine uses IEEE format reals, but they are stored in the opposite byte order to that of your machine. In this case, the 'fix' is fairly simple. If you use scalar reverse on the data before unpacking it, it will do the right thing:

```perl
## Assume the first 8 bytes are a double in the 'other' endianness.
## And the next 4 a float.
my $data = ...;
my $doublePackedOtherEndian = substr $data, 0, 8;
my $floatPackedOtherEndian  = substr $data, 8, 4;

## Now reverse the bytes (scalar assignment gives reverse scalar context)
my $doublePackedMyEndian = reverse $doublePackedOtherEndian;
my $floatPackedMyEndian  = reverse $floatPackedOtherEndian;

## And unpack
my $double = unpack 'd', $doublePackedMyEndian;
my $float  = unpack 'f', $floatPackedMyEndian;
```

Note: This only works for hardware where the endianness involves swapping byte order only. There are (or have been) machines where bit order is also involved, but you are unlikely to encounter such hardware these days. It also only works for platforms that use IEEE floating point representation. See below.

Note also that this scalar reverse trick can be used to switch the byte order of 32-bit and 16-bit integers, and it is sometimes convenient to use it for this in preference to `pack 'V', unpack 'N', $packedU32;` and similar.
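A self-contained sketch of the reverse trick, together with one way to get a signed result from the unsigned formats. The helper name `swap_bytes` and the sample values are made up for illustration. The `l>` form relies on the `<` and `>` template modifiers, which, as far as I know, are available from Perl 5.10 onwards.

```perl
use strict;
use warnings;

# Swap the byte order of a packed value; scalar() forces string reversal.
sub swap_bytes { return scalar reverse $_[0] }

# A double packed natively, then byte-swapped to simulate the 'other' order.
my $native  = pack 'd', 3.5;
my $foreign = swap_bytes($native);
my $double  = unpack 'd', swap_bytes($foreign);    # swapped back, so 3.5 again

# Signed 32-bit big-endian value: read it unsigned with 'N', then
# reinterpret the same 32 bits as signed via a native 'L' -> 'l' round trip.
my $be_bytes = "\xFF\xFF\xFF\xFE";                 # -2 in two's complement
my $signed   = unpack 'l', pack 'L', unpack 'N', $be_bytes;

# Perl 5.10 and later: the '>' modifier reads a big-endian signed value directly.
my $signed_modern = unpack 'l>', $be_bytes;

print "$double $signed $signed_modern\n";          # prints "3.5 -2 -2"
```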
Another possibility is that the hardware producing the data uses an entirely different floating point representation to your machine. There are myriad possibilities and variations here. Your machine might be non-IEEE, and the source machine IEEE. Or vice versa. Or both might use different non-IEEE formats. Or one might be outputting 80-bit reals. Or 128-bit reals. Or any combination of the above. They can all be dealt with if you know what these formats are and can get a good spec on them, but it's not easy.

Hope that helps some.
Re^2: swap binary data from big to little Endian by Anonymous Monk on Jun 24, 2009 at 20:22 UTC
Thanks, BrowserUk. Two years later, this is still very helpful information.
Re: swap binary data from big to little Endian by jeanluca (Deacon) on Jul 30, 2007 at 13:17 UTC
I have to add that I've always worked with ASCII strings, so pack and unpack are still magic for me, but I'll keep trying! Now I've tried to unpack the following header:

```
Position(bytes)   Datatype
 0 -  7           long
 8 - 15           long
16 - 23           long
24 - 31           long
32 - 39           long
40 - 43           int
44 - 51           long
```

I checked with 'od -l' (on a Unix system):

```
0000000          0       2000          0        200
0000020          0     455800          0          4
0000040          0          8       2000        275
0000060  281946600  167772160
0000065
```

I only know that the 6th value is 2000, which is now the 11th value, so I get the impression that a long is expected to be only 4 bytes here. Does that make any sense?

Now, when I do (on Linux)

`my @vals = unpack("N*", $binrec ) ; print "@vals" ;`

the output is

`0 2000 0 200 0 455800 0 4 0 8 2000 275 281946600`

I get the same values as od except for the last value!

The last thing that confuses me is how to convert a signed/unsigned integer from big to little endian?

LuCa
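One reading that fits these numbers, offered as an assumption rather than a certainty: the longs in the header really are 8 bytes, big-endian, and `od -l` on that system simply prints 4-byte words, so each long shows up as two values. The final od offset `0000065` is octal for 53, one byte more than the 52-byte header, and the last od value 167772160 is 0x0A000000, which is what a single trailing 0x0A byte (a newline, presumably) looks like when od pads it to a full word. `unpack 'N*'` ignores an incomplete trailing group, which would explain the missing last value. A sketch under those assumptions, with the helper `be64` invented for the example:

```perl
use strict;
use warnings;

# Rebuild the 53 bytes that od displayed: thirteen big-endian 32-bit words
# followed by one stray byte (assumed here to be a newline, 0x0A).
my $binrec = pack( 'N*', 0, 2000, 0, 200, 0, 455800, 0, 4, 0, 8, 2000, 275, 281946600 )
    . "\x0A";

# Hypothetical helper: combine the high and low 32-bit halves of a
# big-endian 64-bit value (exact for values below 2**53).
sub be64 {
    my ( $hi, $lo ) = unpack 'N2', shift;
    return $hi * 2**32 + $lo;
}

my @longs = map { be64( substr $binrec, 8 * $_, 8 ) } 0 .. 4;    # bytes 0-39
my $int   = unpack 'N', substr $binrec, 40, 4;                   # bytes 40-43
my $last  = be64( substr $binrec, 44, 8 );                       # bytes 44-51

print "@longs | $int | $last\n";
# prints "2000 200 455800 4 8 | 2000 | 1181397953000"
```

Read this way, 2000 is indeed the 6th value of the header. On a perl built with 64-bit integer support (5.10 or later), the template `Q>5 N Q>` should give the same result in a single unpack.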