
You talk about "text string" and "binary string", but I really don't know what those phrases mean to you.

I'll guess that what you have is a string of octets - characters in the range 0x00 .. 0xff - and that you want to increment that string from the right-hand end as if it were a base 256 number. If that is the case, this will do it:

# \z rather than $: $ would also match just before a trailing "\n" (0x0a) octet
$value =~ s{(^|[^\xff])(\xff*)\z}{ chr(ord($1) + 1) . ("\x00" x length($2)) }e;
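
To make the behaviour concrete, here is a minimal sketch that wraps the substitution in a helper and runs it on a few inputs, including the carry cases. The names increment_octets and hex_octets are mine, purely for illustration. The sketch anchors with \z rather than $, on the assumption that a trailing newline octet (0x0a) should be treated like any other octet.

```perl
use strict;
use warnings;

# Hypothetical helper: treat $value as a big-endian base-256 number and add 1.
sub increment_octets {
    my ($value) = @_;
    # $1 is the last octet that is not 0xff (empty if there is none);
    # $2 is the trailing run of 0xff octets, each of which wraps to 0x00.
    $value =~ s{(^|[^\xff])(\xff*)\z}{ chr(ord($1) + 1) . ("\x00" x length($2)) }e;
    return $value;
}

# Hypothetical helper: render a string of octets as space-separated hex.
sub hex_octets {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $octets;
}

for my $in ("\x00", "ab", "a\xff", "\xff\xff", "\x00\x0a", "") {
    printf "%-8s -> %s\n", hex_octets($in), hex_octets(increment_octets($in));
}
```

With these inputs, "a\xff" carries into "b\x00", and "\xff\xff" grows by one octet to "\x01\x00\x00". Whether an all-0xff string should grow or wrap around is a design choice the original question would need to settle.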

Note that if the string has characters that are not octets, i.e. if it has Unicode characters with codepoints greater than 255, then this will not achieve the same thing. If that is possible, you will need to explain more precisely what the possible inputs are.
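
As a rough illustration of that caveat (my reading of it, with a hypothetical helper name): the regex, ord and chr all work on codepoints, so a character above 255 is simply bumped to the next codepoint and never carries. The result is no longer a base-256 increment.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the same substitution as above.
sub increment_string {
    my ($value) = @_;
    $value =~ s{(^|[^\xff])(\xff*)\z}{ chr(ord($1) + 1) . ("\x00" x length($2)) }e;
    return $value;
}

# Hypothetical helper: list the codepoints of a string as U+XXXX.
sub codepoints {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

# The last character is a codepoint above 255: it just becomes the next codepoint.
print codepoints(increment_string("a\x{1f60e}")), "\n";

# A wide character followed by 0xff: the 0xff wraps, the wide character absorbs the carry.
print codepoints(increment_string("\x{100}\x{ff}")), "\n";
```

Here "a\x{1f60e}" becomes "a\x{1f60f}", and "\x{100}\x{ff}" becomes "\x{101}\x{0}".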

Also, if the string consists of octets intended to represent the UTF-8 encoding of a character string, this can create strings that represent malformed UTF-8. If that would be a problem, you will need to explain more precisely what the possible inputs and valid outputs are.
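
A small sketch of how that can happen, assuming the input is the two-octet UTF-8 encoding of U+00FF. The helper names are hypothetical; utf8::decode is used only as a well-formedness check on a copy.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the same substitution as above.
sub increment_octets {
    my ($value) = @_;
    $value =~ s{(^|[^\xff])(\xff*)\z}{ chr(ord($1) + 1) . ("\x00" x length($2)) }e;
    return $value;
}

# Hypothetical helper: true if the octets are well-formed UTF-8.
# utf8::decode works in place, so it only modifies this local copy.
sub is_wellformed_utf8 {
    my ($octets) = @_;
    return utf8::decode($octets) ? 1 : 0;
}

my $before = "\xc3\xbf";                  # UTF-8 encoding of U+00FF
my $after  = increment_octets($before);   # last octet 0xbf becomes 0xc0

printf "before well-formed: %d\n", is_wellformed_utf8($before);
printf "after  well-formed: %d\n", is_wellformed_utf8($after);
```

The incremented octets are "\xc3\xc0": 0xc3 announces a two-octet sequence, but 0xc0 is not a continuation octet, so the result no longer decodes.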

Re^4: Add 1 to an arbitrary-length binary string
by karlgoethebier (Abbot) on Nov 16, 2023 at 17:08 UTC
    ”… characters that are not octets…”

    That confuses me a bit. What else would they be made of?

    say unpack( "U*", "a");
    printf("%04X\n", unpack('W*', decode_utf8("a")));
    say join " ", unpack( "U*", "😎");
    printf("%04X\n", unpack('W*', decode_utf8("😎")));
    __END__
    97
    0061
    240 159 152 142
    1F60E

    I read it like this: ”a” - code point 0061 - is one octet and 😎 - code point 1F60E - is four octets long.

    «The Crux of the Biscuit is the Apostrophe»

      Exactly as I said in the immediately following part of that sentence: Unicode characters with codepoints greater than 255.

      The string "\x{61}\x{1f60e}" has a length of two: it consists of two characters. Its internal representation happens to consist of 5 octets, but any time you have to care about the internal representation, that is an example of the Unicode bug.

      The string "\x{61}\x{ff}" may be stored internally as either 2 or 3 octets; however it also has a length of two, consists of two characters, and will be incremented by my example code to the string "\x{62}\x{0}", regardless of the internal representation.
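
      A minimal sketch of that last point, using utf8::downgrade and utf8::upgrade to force each internal representation. The helper name increment_string is hypothetical, and "use bytes" appears only to peek at the storage size.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the same substitution as above.
sub increment_string {
    my ($value) = @_;
    $value =~ s{(^|[^\xff])(\xff*)\z}{ chr(ord($1) + 1) . ("\x00" x length($2)) }e;
    return $value;
}

my $downgraded = "\x{61}\x{ff}";
utf8::downgrade($downgraded);    # stored internally as 2 octets

my $upgraded = "\x{61}\x{ff}";
utf8::upgrade($upgraded);        # stored internally as 3 octets

for my $s ($downgraded, $upgraded) {
    # "use bytes" makes length() report the internal storage size in this block only.
    my $internal = do { use bytes; length($s) };
    my $result   = increment_string($s);
    printf "length=%d internal=%d result=%s\n",
        length($s), $internal,
        join ' ', map { sprintf 'U+%04X', ord } split //, $result;
}
```

      Both strings have length 2 and both increment to "\x{62}\x{0}"; only the internal octet count differs.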