hv has asked for the wisdom of the Perl Monks concerning the following question:

As part of my attempt to find new or improved values for a family of sequences such as https://oeis.org/A292580, I want to know whether a trial assignment of the p-adic order of some or all of a sequence of consecutive integers is consistent for a given prime p.

Eg given a sequence of 5 integers and the prime 2, they could have p-adic orders [0, 1, 0, 2, 0] if the sequence starts at 1, or [1, 0, 5, 0, 1] starting at 30; but they could never look like [2, ?, ?, ?, 2] whatever the starting point: if two multiples of 4 are separated by 4, one of the two must be divisible at least by 8. Similarly [1, 0, 0, 1, 0, 0, 1] with p=3 is not possible.

I've written the function below to answer this question - taking the prime p and an arrayref of known (assigned) p-adic orders with undef representing "unknown", and returning a TRUE value if the set is valid. However it feels very expensive and slow (I expect to call this billions of times), and I feel I must be missing a trick that would give the answer more directly, maybe even with a single pass over the values.

Note that I'm currently typically calling this only with sequences of up to 32 integers, but I'd like as general a solution as possible. However if there's a big win from special-casing p=2 I'd probably take it.

Suggestions for a better algorithm would be much appreciated.

Hugo

sub valid_set { my($p, $known) = @_; my(%seenzero, %seenmore); for my $index (0 .. $#$known) { # look only at the defined values my $power = $known->[$index] // next; if ($power > 0) { # we've seen a power > 0, record its value modulo p ++$seenmore{$index % $p}; } else { # we've seen a power = 0, record its value modulo p ++$seenzero{$index % $p}; } } # the values _not_ divisible by p can be distributed across at mos +t # p - 1 distinct values modulo p return 0 if keys(%seenzero) >= $p; # [update] bugfix: the values that _are_ divisible by p must not have # the same value modulo p as those that are not. Thanks LanX. :) return 0 if grep $seenzero[$_], keys %seenmore; # [end update] # the values that _are_ divisible by p must all have the same valu +e # modulo p return 0 if keys(%seenmore) > 1; my($mod, $seen) = each %seenmore; # if there were no values divisible by p, or exactly one such, we' +re done return 1 if ($seen // 0) <= 1; # else reduce the list to every p'th element, and recurse my @higher; for (my $i = $mod; $i < @$known; $i += $p) { push @higher, defined($known->[$i]) ? $known->[$i] - 1 : undef +; } return valid_set($p, \@higher); } # test the examples: expect [yes, yes, no, no] print valid_set(@$_) ? "yes\n" : "no\n" for ( [ 2, [0, 1, 0, 2, 0]], [ 2, [1, 0, 5, 0, 1]], [ 2, [2, undef, undef, undef, 2]], [ 3, [1, 0, 0, 1, 0, 0, 1]], );

Replies are listed 'Best First'.
Re: checking a set of numbers for consistency mod p
by LanX (Saint) on Apr 15, 2022 at 23:03 UTC
    I came up with a "new" idea, but just realized it's very similar to your original algorithm

    Maybe others will profit from the explanation:

    let's take p=3 and so any input which is a 32-range will be cut out from this circular array (with X >= 4)

    0010010020010010020010010030010010020010010020010010030010010020010010 +0200100100X (repeat from start)

    plus any entry can be undef="?"

    If we arrange the input in p columns we see, that no matter were our sub-range starts

    a) p-1 columns MUST only be 0 (or undef)

    b) exactly 1 column MUST only have values >0 (or undef)

    for instance

    Level=0

    ??? 01? 0?0 0?0 010 ??0 010 01? 050

    if we take the non-zero column ...

    ?1??1?115

    and decrement all defined entries ...

    ?0??0?004

    we get the input for the next recursion which must hold the same criteria

    Level=1

    ?0? ?0? 004

    and so on

    Level=2

    ??3

    Level=3

    2

    You already (almost) implemented it, and now I understand it too. The way you use those hashes irritated me a lot.

    If you wanna speed it up you could

    • avoid the recursion, just iterate over the same array starting at the right column with step plevel and calculate $power -= $level instead of decrementing a new input array
    • don't use hashes they are slow and the logic is glitchy. Arrays @seenzero and @seenmore just need to be complementary, like (1,0,1) and (0,1,0) in the first level demonstrated. Alternatively use bitstrings and xor them to equal "111".
    Talking about glitchy, consider this input

    01? ?0?

    obviously is the middle column wrong having both 0 and 1, but AFAIS will your implementation accept that and recurs to the next level. (Or am I misreading?)

    Anyway depending on the distribution of your input, it might be easier to search in a first pass for the first entry > 0 to calculate the "right" column°.

    In a second pass make sure that - for defined parts - this column has only non-zeros and all other columns are always zero. Like that you don't need any hashes or other flags and you can bail out immediately on fail.

    HTH! :)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    °) in the pathological case that there is no entry >0 you'll need to find the column consisting of only "?"

      I came up with a "new" idea, but just realized it's very similar to your original algorithm

      I think it's the same as my original algorithm. :)

      don't use hashes they are slow and the logic is glitchy

      I guess the main reason to use a hash was that the number of entries gets counted for me: I suspect iterating over an array to count the entries could easily cost all the time saved in assignment. Not sure what you mean by "the logic is glitchy", unless you're referring to the bug you describe later (which I contend is a simple logic error, and not related to the choice of container).

      I agree bitstrings could be faster for seenzero (though a lot more fiddly, especially if I want a fast bit count) - I'm sure that's what I'd have chosen if writing this in C or assembler. For %seenmore it would probably make sense just to record $mod and $seen (the count) directly.

      Similarly I agree that the recursion could be avoided. These are all micro-optimizations though, and the question was about a different algorithm - which I think you've given me, piece by piece, in your other posts, I just need to get around to putting together an implementation.

      Talking about glitchy, consider [the] input 3, [0, 1, undef, undef, 0, undef]

      Ooh yes, pretty sure that's a bug in the implementation of my OP: it should return false if any modular value shows up in both %seenzero and %seenmore. I've updated the OP with a simple but inefficient fix for that, thanks!

      Anyway depending on the distribution of your input, it might be easier to search in a first pass for the first entry > 0 to calculate the "right" column. [...] in the pathological case that there is no entry >0 you'll need to find the column consisting of only "?"

      My original algorithm copes well with that pathological case, by counting the number of distinct modular values seen for the zeroes. I don't see how you'd find the correct answer for, say, 3, [0, 0, 0] without counting, using anything like this algorithm.

        > I think it's the same as my original algorithm. :)

        Well your's is buggy! ;-)

        This count on hashes looks weird and is obviously error prone.

        If I were you I would have started with an test suite with automated cases. like [3, [0,0,1,0,0,1,0,0,X]] with X from 2..3 multiplied with all 2^9 combinations of ? multiplied with 9 rotations.

        These are just 2*512*9 <= 9216 positive cases (some will duplicate)

        IMHO all other arrays of length 9 and max entry 3, which can't be found in this set must be negative.

        You could use a random function to generate them.

        > I don't see how you'd find the correct answer for, say, 3, [0, 0, 0] without counting, using anything like this algorithm.

        I said first pass search for a power >0 to identify the main column.

        In the second pass do the validation.

        All defined entries in the main column must be >0 but in all other columns they must be 0.

        If there is no value >0 found in the first pass, you must find a column consisting of ? only in order to return true.

        This 2 pass solution could be more efficient, but this depends on the distribution of cases, which only you know well.

        Alternatively you could do a one pass check with just one array @seen initialized to undef to represent the columns.

        Once an entry is 0 it must stay 0, once it's 1 it must stay 1 (1 for > 0) otherwise you bail out with fail.

        You'll also need one scalar $main_col set to the index of the first >0 column. If there is ever another col-index with >0 you need to return a fail. (That's kind of a counter if you want)

        If this loop ends with $main_col set, continue with that column.

        If the loop ends with $main_col == undef', you need to a parse '@seen for at least one undef column. If it doesn't exist it's a fail otherwise a pass (a column which only consists of ? will always pass)

        This one pass solution has several advantages

        • bailing out as soon as possible
        • no expensive and complicated hashes
        • can be easily translated to C

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

      ahem, all your points were exactly what occurred to me on day 2 or so when I took closer look at hv's code/algorithm, though I understood nothing of maths involved. Indeed, looking at list "through modulo p" is looking at 2-d table with p columns. Then subsequent column extraction (and imagining a subset as table again, etc.) reduces amount of data, to inspect, exponentially. I don't know if it can be simpler or faster. I wrote some Perl, stared at it a little, and realized it's translatable to C almost line by line, my C skill would suffice. If data is not Perl array but packed bytes (e.g. 255 as undef), code is easily adjusted.

      Lines marked //*/ were added only today to fix latest glitch mentioned. As I said, the whole thing was just code refactoring exercise on my part without understanding what's going on. And it certainly can be further improved. (edit: forgot END_OF_C when pasting; + better names for benchmark tests. edit 2: Aargh, sorry, it was before coffee in the morning, when I added today's //*/ lines and also "optimized" i.e. screwed bit tests in z_mask. Fixed.)

      my @data = ( [ 2, [0, 1, 0, 2, 0]], [ 2, [1, 0, 5, 0, 1]], [ 2, [2, undef, undef, undef, 2]], [ 3, [1, 0, 0, 1, 0, 0, 1]], [ 2, [1, 0, 8, 0, 1]], [ 3, [undef, 0, 0, undef, 0, 0, 1]], [ 3, [undef, 0, 0, undef, 0, 0]], [ 3, [0, 1, undef, undef, 0, undef]], ); use Inline C => <<'END_OF_C'; int valid_set_c( int p, SV *a_ref ) { AV* array = (AV *)SvRV( a_ref ); int last = av_top_index( array ); int from = 0; int incr = 1; int initial_mask = ( 1 << p ) - 1; for ( int j = 0;; j ++ ) { int z_mask = initial_mask; int nz_cnt = 0; int nz_col; for( int i = from; i <= last; i += incr ) { SV* item = *av_fetch( array, i, 0 ); if ( SvOK( item )) { int col = i / incr % p; if ( SvIV( item ) == j ) { if ( nz_cnt && nz_col == col ) return 0; //*/ if ( z_mask & ( 1 << col )) { z_mask &= ~( 1 << col ); if ( !z_mask ) return 0; } } else { if ( nz_cnt ++ ) { if ( nz_col != col ) return 0; } else nz_col = col; if ( !( z_mask & ( 1 << nz_col ))) return 0; + //*/ } } } if ( nz_cnt <= 1 ) return 1; from += nz_col * incr; incr *= p; } } END_OF_C __END__ Rate valid_set valid_set_c valid_set 35669/s -- -89% valid_set_c 339083/s 851% --

        Nice stuff. :) It's a shame it will silently go wrong when p > 8 * sizeof(int), though. If you're lucky, the compiler may support declaring a variable-sized int[] array to save the bother of mallocing, but just indexing into it will already make the code a chunk more complex.

        And @LanX: there's no XS code here. The existence of Inline::C is the reason I never had to learn XS, and I'm very grateful for it. :)

        unfortunately I'm bad in reading C especially with interspersed XS. :(

        But probably performance would benefit from copying the Perl array to a C array first?

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re: checking a set of numbers for consistency mod p
by LanX (Saint) on Apr 09, 2022 at 19:55 UTC
    I'm too busy right now for code or a proper proof and to be honest I don't grasp your algorithm ...

    But my intuition says (examples for [ 2, [1, 0, 5, 0, 1]] )

    • it's symmetric around the biggest entry here 5
    • it's sufficent to investigate the neighbours of the first possible solution, here 2**5=32 . All other solutions 2**5*r will have the same pattern around them.
    • to speed it up you can exploit the repeated sieves (like in the sieve algo for prime search) (see footnote °)
    • the undefined holes make this more complicated if they disguise the biggest entry. But you can most probably just stick with the biggest known entry approach as long as you take the first one.

    HTH!

    PS: Hey, number theory is off-topic!!! ;-)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    °) regarding sieves, every partial sequence is the combination of all former repeated p-1 times, just the last entry is incremented

    for p=2

    <0> <1> <0 2> <0 1 0 3> <0 1 0 2 0 1 0 4> <0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 5>

    for p=3

    <0> <0 1> <0 0 1 0 0 2> <0 0 1 0 0 1 0 0 2 0 0 1 0 0 1 0 0 3>

    etc

    update

    You can easily precalculate the sieves if you have many sequences of a max length like m=32. You'll just calculate the sieve up to m.

    Any bigger max entry will just take the place of the precaclclated max and the sieve will be correct.

      Thanks LanX, but it's not clear to me what algorithm you're proposing here. Taking my last example [1,0,0,1,0,0,1], 3, for example: you find that the biggest known entry is 1 (which presumably requires a pass through the values), now what?

      Hey, number theory is off-topic!!!

      Eh? It's just an algorithm question, right?

        > Eh? It's just an algorithm question, right?

        I was joking.

        > Taking my last example [1,0,0,1,0,0,1], 3, for example: you find that the biggest known entry is 1

        take the pre-calculated sieve for 27 entries from my post and anchor the first 1 (marked ^) from both sequences for a one-on-one comparison

        [1,0,0,1,0,0,1] <0 0 1 0 0 1 0 0 2 0 0 1 0 0 1 0 0 2 0 0 1 0 0 1 0 0 3> ^ *

        and you'll see they diverge for the last entry (marked *) hence it's a fail.

        I hope it's clearer now. :)

        If you have billions of comparisons and max m=32 length sequences for a limited number of possible p, this should be very fast.

        (disclaimer: I'm not 100% sure I covered all edge cases caused by the holes)

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        update

        for the other examples given

        [0,1,0,2,0] <0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 5> ^ -> PASS

        [1,0,5,0,1] <0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 5 0 1 . +.. > ^ -> PASS

        [2,?,?,?,2] <0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 5> ^ * -> FAIL

        update

        and if you have a sequence [2,[1,0,9,0,1]] with a max (here 9) which isn't in your pre-calculated sieve, identify it with the max from your sieve (here 5)

        [1,0,9,0,1] <0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 5 0 1 . +.. > ^ -> PASS

        of course this would also work by identifying 9 with 4 or 3 or 2 because this sequence is so short. But with a possible m=32 you need that full length.

        maybe easiest if you provided more test data, like a generator producing 1e5 true and 1e5 false examples.

        I'd implement it then.

        This could be used for benchmarking, too.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery