in reply to Add 1 to an arbitrary-length binary string

I'm not clear what your data looks like: the question title says "binary string" but the text talks about "text string". Some examples and/or a more complete definition would be helpful.

Incrementing a string representing an unsigned binary number like "001101" is easy: $value =~ s{(^|0)(1*)$}{1 . ("0") x length($2)}e.
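As a quick check of the binary case, here is a minimal sketch that wraps the substitution above in a helper. The name `increment_binary` is hypothetical and only used for illustration; the regex itself is the one from the post.

```perl
use strict;
use warnings;

# Hypothetical helper name; the substitution is the one given in the post.
sub increment_binary {
    my ($value) = @_;
    # Turn the last 0 (or the start of the string, if there is no 0 before
    # the trailing 1s) into 1, and turn every trailing 1 into 0.
    $value =~ s{(^|0)(1*)$}{"1" . ("0" x length($2))}e;
    return $value;
}

print increment_binary("001101"), "\n";   # 001110
print increment_binary("0111"),   "\n";   # 1000
print increment_binary("111"),    "\n";   # 1000 (string grows by one digit)
```

The string only grows when every digit is already 1, which is the overflow case handled by the `^` alternative.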

Incrementing a string representing an unsigned decimal number like "123456" is an extension of that: $value =~ s{(^|[0-8])(9*)$}{(($1 || 0) + 1) . ("0") x length($2)}e.
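The decimal version can be exercised the same way. This is a sketch only; `increment_decimal` is a hypothetical helper name, and the substitution is unchanged from the post apart from whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper name; the substitution is the one given in the post.
sub increment_decimal {
    my ($value) = @_;
    # Bump the last digit that is not 9 (treating a missing digit as 0)
    # and turn every trailing 9 into 0.
    $value =~ s{(^|[0-8])(9*)$}{ (($1 || 0) + 1) . ("0" x length($2)) }e;
    return $value;
}

print increment_decimal("123456"), "\n";   # 123457
print increment_decimal("1299"),   "\n";   # 1300
print increment_decimal("999"),    "\n";   # 1000
print increment_decimal("0099"),   "\n";   # 0100 (leading zeros are kept)
```

The `$1 || 0` covers both the overflow case, where `$1` is empty, and the case where the captured digit is "0", which is false in Perl.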

There are similar approaches possible for other cases.

Hugo

Re^2: Add 1 to an arbitrary-length binary string
by einhverfr (Friar) on Nov 16, 2023 at 02:25 UTC
    Data comes in as a text string (utf-8, though the spec doesn't care about encoding) and needs to be incremented as a binary string.

      You talk about "text string" and "binary string", but I really don't know what those phrases mean to you.

      I'll guess that what you have is a string of octets - characters in the range 0x00 .. 0xff - and that you want to increment that string from the right-hand end as if it were a base 256 number. If that is the case, this will do it:

      $value =~ s{(^|[^\xff])(\xff*)$}{ chr(ord($1) + 1) . ("\x00" x length($2)) }e;
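A minimal sketch of the base-256 increment, with hex dumps so the result is visible. The helper name `increment_octets` is hypothetical. One deliberate difference from the line above, which is an assumption about what is wanted: the pattern is anchored with `\z` instead of `$`, because `$` can also match just before a trailing newline, and in an octet string a final 0x0A is presumably data rather than a line ending.

```perl
use strict;
use warnings;

# Hypothetical helper name. Anchored with \z (not $) so that a trailing
# "\n" octet is incremented like any other octet.
sub increment_octets {
    my ($value) = @_;
    # Bump the last octet that is not 0xff (or prepend 0x01 on overflow,
    # since ord("") is 0) and reset every trailing 0xff to 0x00.
    $value =~ s{(^|[^\xff])(\xff*)\z}{ chr(ord($1) + 1) . ("\x00" x length($2)) }e;
    return $value;
}

print unpack("H*", increment_octets("\x00\xff")), "\n";   # 0100
print unpack("H*", increment_octets("\xff\xff")), "\n";   # 010000
print unpack("H*", increment_octets("a\n")),      "\n";   # 610b
```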

      Note that if the string has characters that are not octets, i.e. if it has Unicode characters with codepoints greater than 255, then this will not achieve the same thing. If that is possible, you will need to explain more precisely what the possible inputs are.

      Also, if the string consists of octets intended to represent the utf8-encoding of a character string, this can create strings that represent malformed utf8. If that would be a problem, you will need to explain more precisely what the possible inputs and valid outputs are.
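To illustrate the malformed-UTF-8 caveat, here is a small sketch. Both helper names are hypothetical, and the increment is anchored with `\z` rather than `$` (an assumption, so that a trailing newline octet counts as data). It takes the two octets that encode U+00FF and increments them as a base-256 number; the result no longer decodes as UTF-8 because 0xC0 is not a valid continuation byte.

```perl
use strict;
use warnings;

# Hypothetical helper: base-256 increment of an octet string.
sub increment_octets {
    my ($value) = @_;
    $value =~ s{(^|[^\xff])(\xff*)\z}{ chr(ord($1) + 1) . ("\x00" x length($2)) }e;
    return $value;
}

# Hypothetical helper: true if the octets are well-formed UTF-8.
# utf8::decode works in place, so it only touches the local copy.
sub is_well_formed_utf8 {
    my ($octets) = @_;
    return utf8::decode($octets) ? 1 : 0;
}

my $before = "\xc3\xbf";                  # UTF-8 encoding of U+00FF
my $after  = increment_octets($before);   # last octet 0xbf becomes 0xc0

printf "%s -> %s\n", unpack("H*", $before), unpack("H*", $after);   # c3bf -> c3c0
printf "well-formed before: %d, after: %d\n",
    is_well_formed_utf8($before), is_well_formed_utf8($after);
```

Whether such output is acceptable depends on whether downstream code ever decodes the incremented value as text.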

        "… characters that are not octets…"

        That confuses me a bit. What else would they be made of?

        say unpack( "U*", "a");
        printf("%04X\n", unpack('W*', decode_utf8("a")));
        say join " ", unpack( "U*", "😎");
        printf("%04X\n", unpack('W*', decode_utf8("😎")));
        __END__
        97
        0061
        240 159 152 142
        1F60E

        I read it like this: "a" (code point U+0061) is one octet long, and 😎 (code point U+1F60E) is four octets long.

        «The Crux of the Biscuit is the Apostrophe»