in reply to Understanding endianness of a number

In many cases, endianness is transparent to the programmer, except when explicitly converting between numbers and bytes or when you need to look at the memory directly. In that way, it's kind of like a character encoding: in your program, you work mostly with strings and characters, usually not caring how they're stored internally, and only when converting to and from streams of bytes does it become important how those more abstract notions of characters are represented as bytes (e.g. UTF-8 vs. UTF-16 vs. many more). In the same way, in a Perl program, you can say my $x = 48879;, and you don't have to care how that number is represented in memory until you have to read or write it to a binary file, or send/receive it over a data link, as a series of bytes. In both cases there are two levels of thinking: the "more abstract" notion of numbers/characters versus the machine-level bytes, with explicit conversion needed between the two. The conceptual issues arise because this conversion is often implicit rather than explicit, so programmers rarely have to think about it.

(For the purpose of this explanation, assume byte addressable memory everywhere, and let's ignore that modern machines of course work with words of multiple bytes. Your question comes from I2C anyway, which works on the byte level.)

So assuming you want to store my $x = 48879;, or my $x = 0xBEEF; as a 16-bit unsigned integer in two bytes, there are two ways to do that: with the most significant byte 0xBE at the lower memory address, or at the higher one (or in the case of a protocol, 0xBE being transmitted first, or second).

    48879 = 0xBEEF
              ^ ^
    0xBE = MSB LSB = 0xEF

    Memory Address:    0   |   1   |   2   |  3
                           |       |       |
    Little Endian:    ...  |  LSB  |  MSB  | ...
                           | 0xEF  | 0xBE  |
                           |       |       |
    Big Endian:       ...  |  MSB  |  LSB  | ...
                           | 0xBE  | 0xEF  |

    # little endian
    $ perl -MData::Dump -e 'dd pack "S<", 0xBEEF'
    "\xEF\xBE"
    # big endian
    $ perl -MData::Dump -e 'dd pack "S>", 0xBEEF'
    "\xBE\xEF"

What can sometimes be confusing is that some diagrams of memory addresses or transmission protocols place the least significant bit on the right side of the diagram (because bytes are typically written with their most significant bit first, as in 170 == 0b10101010), but at the same time put the least significant byte, at the lowest memory address, on the left, and there are often other variations of this. In fact, if I recall correctly, sorting out this initial left-to-right/right-to-left confusion was probably one of the most important things to help make endianness "click" for me. Another thing to keep in mind is that when you write 0xBEEF in your source code, that's still a single 16-bit value, and not yet two bytes; you don't yet know how it'll be represented in memory.

To answer your two questions: Yes, you're correct. In 0x03FF, the MSB is 0x03 (the "big end") and the LSB is 0xFF (the "little end"), so the first is big endian order since you print the big end first, and the second is little endian order because you print the little end first. But just to be clear: on the other hand, $b1 and $b2 are two unconnected variables, so what you've really got there is two separate bytes, not a two-byte value stored in a certain order. (Update: I wouldn't have picked this nit if you had stored them in an array instead, since the array indices take the place of the memory addresses :-) )
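To illustrate the connection between the two separate bytes and a single 16-bit value, here's a minimal sketch (using the 0x03FF value from the question; the shift is what encodes the significance that two unconnected scalars don't have on their own):

```perl
use strict;
use warnings;

my $num = 0x03FF;
# split into bytes: shift first, then mask
my $b1 = ($num >> 8) & 0xFF;   # 0x03, the "big end" (MSB)
my $b2 =  $num       & 0xFF;   # 0xFF, the "little end" (LSB)
# recombine the two bytes into one 16-bit value
my $back = ($b1 << 8) | $b2;
printf "%#06x\n", $back;       # prints 0x03ff
```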

I've for now ignored additional topics like bigger numbers stored as four or more bytes, where at least in theory there are more than two possible orderings, but I hope that if the principle and the 16-bit version make sense, understanding the documentation for wider values will be easier.
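That said, the same pack letters scale up to wider values; a quick sketch with a 32-bit value (the "L<"/"L>" endianness modifiers require a reasonably modern Perl, 5.10 or later):

```perl
use strict;
use warnings;

my $x = 0xDEADBEEF;
# unpack "C4" just shows the four bytes in memory order
my @le = unpack "C4", pack "L<", $x;   # little endian
my @be = unpack "C4", pack "L>", $x;   # big endian
printf "LE: %02X %02X %02X %02X\n", @le;   # LE: EF BE AD DE
printf "BE: %02X %02X %02X %02X\n", @be;   # BE: DE AD BE EF
```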

(By the way, I like to shift first and then mask, i.e. $b1 = ($num >> 8) & 0xFF;, because I've been burned on a small microprocessor where the C compiler implemented the bit shift with a rotate instruction instead. I forget which processor and compiler it was though... plus I don't think Perl would run on such a uC, so it's really just a preference I've developed as a result.)

(As the AM post points out, endianness could also refer to the order in which bits of a byte are transmitted, but in my experience, I've pretty much always seen the term endianness referring only to byte order; most protocol descriptions I've read will instead explicitly state "the least/most significant bit is transmitted first/last".)

Minor updates for clarity.

Re^2: Understanding endianness of a number
by stevieb (Canon) on Jul 23, 2017 at 19:48 UTC

    Thanks a boatload, haukex!

    After I wrote my question here, I continued doing research. I was comparing endianness with the year-long work I've been doing with hardware registers, so I could not quite grasp things until it clicked (and you then re-affirmed) that endianness (mostly) refers to byte storage, not (typically) bit storage.

    By the time I wrote my post here, I thought I had it, but just wanted clarification. You covered it perfectly, even down to sometimes bits, and how some 32 or 64 bit ints may even have the middle bytes reversed. I think I'll leave that for another day until I run into it, if ever. :)

    I also agree with your shift-first-then-mask approach. I ran into that while playing around during this whole lesson. Not only can it catch edge-case problems, I feel it's quicker to digest/comprehend when glancing through code. Thanks for that tidbit too.

    The endianness issue cropped up because I was getting weird (read: backwards) results when trying to read two bytes as a single 16-bit int from an Arduino over I2C. This case is present in this post that I made, in the Arduino sketch portion, within the __read_analog() function.
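    A minimal sketch of the symptom (with hypothetical byte values; the real ones came from the Arduino): if the two bytes arrive most significant byte first, reading them with the wrong unpack order gives exactly those "backwards" results.

```perl
use strict;
use warnings;

# hypothetical: two bytes as they arrived over I2C, MSB first
my $raw = pack "C2", 0x03, 0xFF;
my $backwards = unpack "S<", $raw;  # read little endian: 0xFF03
my $correct   = unpack "S>", $raw;  # read big endian:    0x03FF
printf "backwards: %#06x  correct: %#06x\n", $backwards, $correct;
```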

    Thanks again,

    -stevieb

      Glad to help!

      endianness (mostly) refers to byte storage, not (typically) bit storage

      Yeah, I think there's some ambiguity there, e.g. whether LSB and MSB mean bytes or bits - Wikipedia makes a good point about context:

      LSB can also stand for least significant byte. ... If the abbreviation's meaning least significant byte isn't obvious from context, it should be stated explicitly to avoid confusion with least significant bit.

      Although the entry on Endianness does differentiate a bit more clearly:

      The order of bits within a byte or word can also have endianness (as discussed later); however, a byte is typically handled as a numerical value or character symbol and so bit sequence order is obviated.

      And then goes on to talk about "Bit endianness" in its own section.

        Yep, definitely gleaned that from the article. It just took a few iterations before I fully separated the two. For years, ever since I began coding around ~Y2K, I've heard and seen "endian" thrown around quite a bit, but never had the need to research what it actually was.

        All of it is quite fascinating, and I'm enjoying everything I've been working on dealing with how electronics store and use bits and bytes.

      how some 32 or 64 bit ints may even have the middle bytes reversed
      I don't know of a thing like that. How the data is stored in memory has a simple algorithm. See this link, which also gives a list of "endianness" for various file formats.

        Thanks Marshall for the reply.

        I phrased it incorrectly, as in this regard I definitely have a juvenile understanding of the terminology. Also, I haven't done much digging (and testing) beyond a 16-bit int at this point. From the mentioned Wikipedia article, what I meant was:

        Mixed forms also exist, for instance the ordering of bytes in a 16-bit word may differ from the ordering of 16-bit words within a 32-bit word. Such cases are sometimes referred to as mixed-endian or middle-endian. There are also some bi-endian processors that operate in either little-endian or big-endian mode.
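        A sketch of one such mixed ordering (PDP-11 style: each 16-bit half stored little endian, but the high half first; the byte values here are just for illustration):

```perl
use strict;
use warnings;

my $x = 0xDEADBEEF;
my $hi = ($x >> 16) & 0xFFFF;   # 0xDEAD
my $lo =  $x        & 0xFFFF;   # 0xBEEF
# each 16-bit word little endian, high word first
my @mid = unpack "C4", pack "S< S<", $hi, $lo;
printf "%02X %02X %02X %02X\n", @mid;   # AD DE EF BE
```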

        Cheers for the link. The more info I have to expand and hone my understanding, the better it is for me.