einhverfr has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks

I am trying to do some more complex etcd tasks and one of them is listing all keys by a prefix. The current API from Net::Etcd only allows for start and end ranges, and they suggest incrementing the range start by one bit to do a prefix search. However, since the keys are typically text strings, this is a bit of a problem.

I found advice suggesting casting strings to unsigned longs and incrementing and then casing back. Unfortunately these are arbitrary length strings and so I have no guarantee that this will be enough space.

Is there some way of incrementing a string by one bit that I haven't seen? Maybe a Perl module or some way of incrementing a base-64-encoded string or something?

  • Comment on Add 1 to an arbitrary-length binary string

Replies are listed 'Best First'.
Re: Add 1 to an arbitrary-length binary string
by syphilis (Archbishop) on Nov 15, 2023 at 10:27 UTC
    found advice suggesting casting strings to unsigned longs and incrementing and then casing back. Unfortunately these are arbitrary length strings and so I have no guarantee that this will be enough space

    Sounds like a good opportunity to make use of perl's Math::BigInt module:
    use strict; use warnings; use Math::BigInt; my $string = '12349' x 10; my $x = Math::BigInt->new($string); print $x, "\n"; # $x is a Math::BigInt object $x++; print "$x", "\n"; # "$x" is a string __END__ Outputs: 12349123491234912349123491234912349123491234912349 12349123491234912349123491234912349123491234912350
    Cheers,
    Rob

      Is there any way to turn a string of arbitrary length text into an arbitrary BigInt and back? A big part of the problem is that the prefix could be "/Some/Arbitrarily/long/string"

      Edit: I found Base::Math::Convert but it looks really poorly maintained so I am hoping there is something a little better.

      Worst case scenario, I could build the logic into the module I am building, but hoping to avoid that.

        Is there any way to turn a string of arbitrary length text into an arbitrary BigInt and back?

        Not that I know of - though I don't usually use Math::BigInt.
        Is it guaranteed that the "arbitrary length text" will contain only valid base 64 characters ?

        Math::GMPz (plug) will certainly handle such text, even if it contains invalid base 64 characters - though Math::GMPz isn't included in perl, and depends upon the gmp C library:
        use strict; use warnings; use Math::GMPz qw(:mpz); my ($order, $size, $endian, $nails) = (1, 1, 0, 0); my $s = 'Som3/Arbitr@ri!y/long/string'; my $z = Math::GMPz->new(); Rmpz_import($z, length($s), $order, $size, $endian, $nails, $s); print $z, "\n"; $z++; print "$z\n"; my $incremented_s = Rmpz_export( $order, $size, $endian, $nails, $z); print "$s\n"; print "$incremented_s\n"; # Next retrieve original string: $z--; my $orig_s = Rmpz_export( $order, $size, $endian, $nails, $z); print "$orig_s\n"; __END__ Outputs: 8786758437493627904416651010636024986357833879283932790893327248999 8786758437493627904416651010636024986357833879283932790893327249000 Som3/Arbitr@ri!y/long/string Som3/Arbitr@ri!y/long/strinh Som3/Arbitr@ri!y/long/string
        But perhaps someone else can provide a solution that better suits your needs.

        Cheers,
        Rob.

        I don't understand what is "one bit larger than the given key". Isn't it

        "/Some/Arbitrarily/long/string" # the key "0Some/Arbitrarily/long/string" # incremented

        ? i.e. next ASCII character, unless for "\xff", then put "\0" and carry over?

Re: Add 1 to an arbitrary-length binary string
by hv (Prior) on Nov 15, 2023 at 13:25 UTC

    I'm not clear what your data looks like: the question title says "binary string" but the text talks about "text string". Some examples and/or a more complete definition would be helpful.

    Incrementing a string representing an unsigned binary number like "001101" is easy: $value =~ s{(^|0)(1*)$}{1 . ("0") x length($2)}e.

    Incrementing a string representing an unsigned decimal number like "123456" is an extension of that: $value =~ s{(^|[0-8])(9*)$}{(($1 || 0) + 1) . ("0") x length($2)}e.

    There are similar approaches possible for other cases.

    Hugo

      Data comes in as a text string (utf-8, though the spec doesn't care about encoding) and needs to be incremented as a binary string.

        You talk about "text string" and "binary string", but I really don't know what those phrases mean to you.

        I'll guess that what you have is a string of octets - characters in the range 0x00 .. 0xff - and that you want to increment that string from the right-hand end as if it were a base 256 number. If that is the case, this will do it:

        $value =~ s{(^|[^\xff])(\xff*)$}{ chr(ord($1) + 1) . ("\x00" x lengt +h($2)) }e;

        Note that if the string has characters that are not octets - ie if it has Unicode characters with codepoints greater than 255 - then this will not achieve the same thing. If that is possible, you will need to explain more precisely what the possible inputs are.

        Also, if the string consists of octets intended to represent the utf8-encoding of a character string, this can create strings that represent malformed utf8. If that would be a problem, you will need to explain more precisely what the possible inputs and valid outputs are.

Re: Add 1 to an arbitrary-length binary string
by tybalt89 (Monsignor) on Nov 15, 2023 at 21:56 UTC

    Here's a guess ...

    #!/usr/bin/perl use strict; # https://www.perlmonks.org/?node_id=11155633 use warnings; use Data::Dump 'dd'; for ( 'aa', "a\xff", 'frequency/is/measured/in/hertz' ) { dd 'in', $_; my $onelarger = s/([^\xff])\xff*\z/ $1 =~ tr||\x01-\xff|cr /er; dd 'out', $onelarger; }

    Outputs:

    ("in", "aa") ("out", "ab") ("in", "a\xFF") ("out", "b") ("in", "frequency/is/measured/in/hertz") ("out", "frequency/is/measured/in/hert{")

      61FF + 1 = 6200, not 62. (Second test)

        That would be true if all strings were hex numbers, but "/Some/Arbitrarily/long/string" (see Re^2: Add 1 to an arbitrary-length binary string) doesn't look like a hex number to me...

        Shouldn't  "61FF" + 1 = "61FG" ?

        If a set of test cases were provided, that could clarify the matter.

        actually since this is a range bound, the truncation is fine
      Thank you so much. This certainly gets me started.
Re: Add 1 to an arbitrary-length binary string
by ForgotPasswordAgain (Vicar) on Nov 15, 2023 at 17:01 UTC

    Note: I haven't used Net::Etcd or etcd before, so just speculating.

    I have a feeling you won't be able to do that using the Perl module. From what I can see from the docs for range_end, plus this clarification of what it means (their examples of "aa"+1 == "ab", "a\xff"+1 == "b"), plus how it's implemented in Perl by calling encode_base64 on whatever you pass in; assuming it's that encoded data that's passed to the underlying go code, it seems like you won't ever be able to "increment by 1", since the encoded thing will be..."randomized".

    But maybe I'm misreading things..

    EDIT: thinking about it more, probably the encoding/decoding is just to pass it over JSON, so the encoding part was wrong. So maybe if you can figure out how to translate "a\xff"+1 == "b" to Perl. I'm not sure exactly what that means, but it seems to take the bit representation of the string and add 1 to it? (Am I just repeating your original question? Heh...)

      So maybe if you can figure out how to translate "a\xff"+1 == "b" to Perl.

      Well ... this is precisely what Math::GMPz's Rmpz_import() and Rmpz_export() do. In the second of the 2 scripts I provided earlier, the string "aa" gets incremented to "ab", and the string "a\xff" gets incremented to "b".
      This suggests that the underlying gmp functions (mpz_import and mpz_export), wrapped by Rmpz_import/Rmpz_export, might well be providing the desired behaviour.

      Cheers,
      Rob
Re: Add 1 to an arbitrary-length binary string
by BillKSmith (Monsignor) on Nov 16, 2023 at 20:43 UTC
    It seems that you are trying to make a minimal range by setting the range end to one bit higher than the range start. Incrementing the last character is easy.
    use strict; use warnings; my $str = 'arbitrary length string.xxx'; substr($str, -1, 1) = chr(ord(substr($str, -1, 1))+1); print $str;
    This increments the Unicode code-point rather than its utf8 encoding that your integer arithmetic would increment. In the cases where it makes a difference, I suspect that this is what the author of your document had in mind. This will fail if your strings are restricted to ASCII Characters and the last character of the string is an ASCII DEL (U+7F).
    Bill