checking a set of numbers for consistency mod p

hv has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: checking a set of numbers for consistency mod p by LanX (Saint) on Apr 15, 2022 at 23:03 UTC
I came up with a "new" idea, but just realized it's very similar to your original algorithm Maybe others will profit from the explanation: let's take p=3 and so any input which is a 32-range will be cut out from this circular array (with X >= 4) `0010010020010010020010010030010010020010010020010010030010010020010010 +0200100100X (repeat from start)` [download] plus any entry can be undef="?" If we arrange the input in p columns we see, that no matter were our sub-range starts a) p-1 columns MUST only be 0 (or undef) b) exactly 1 column MUST only have values >0 (or undef) for instance Level=0 `??? 01? 0?0 0?0 010 ??0 010 01? 050` [download] if we take the non-zero column ... `?1??1?115` and decrement all defined entries ... `?0??0?004` we get the input for the next recursion which must hold the same criteria Level=1 `?0? ?0? 004` [download] and so on Level=2 `??3` Level=3 `2` You already (almost) implemented it, and now I understand it too. The way you use those hashes irritated me a lot. If you wanna speed it up you could avoid the recursion, just iterate over the same array starting at the right column with step p^level and calculate `$power -= $level` instead of decrementing a new input array don't use hashes they are slow and the logic is glitchy. Arrays @seenzero and @seenmore just need to be complementary, like (1,0,1) and (0,1,0) in the first level demonstrated. Alternatively use bitstrings and xor them to equal "111". Talking about glitchy, consider this input `01? ?0?` [download] obviously is the middle column wrong having both 0 and 1, but AFAIS will your implementation accept that and recurs to the next level. (Or am I misreading?) Anyway depending on the distribution of your input, it might be easier to search in a first pass for the first entry > 0 to calculate the "right" column°. In a second pass make sure that - for defined parts - this column has only non-zeros and all other columns are always zero. Like that you don't need any hashes or other flags and you can bail out immediately on fail. HTH! :) Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery} °) in the pathological case that there is no entry >0 you'll need to find the column consisting of only "?"	[reply] [d/l] [select]
Re^2: checking a set of numbers for consistency mod p by hv (Prior) on Apr 16, 2022 at 01:39 UTC
I came up with a "new" idea, but just realized it's very similar to your original algorithm I think it's the same as my original algorithm. :) don't use hashes they are slow and the logic is glitchy I guess the main reason to use a hash was that the number of entries gets counted for me: I suspect iterating over an array to count the entries could easily cost all the time saved in assignment. Not sure what you mean by "the logic is glitchy", unless you're referring to the bug you describe later (which I contend is a simple logic error, and not related to the choice of container). I agree bitstrings could be faster for seenzero (though a lot more fiddly, especially if I want a fast bit count) - I'm sure that's what I'd have chosen if writing this in C or assembler. For %seenmore it would probably make sense just to record $mod and $seen (the count) directly. Similarly I agree that the recursion could be avoided. These are all micro-optimizations though, and the question was about a different algorithm - which I think you've given me, piece by piece, in your other posts, I just need to get around to putting together an implementation. Talking about glitchy, consider [the] input `3, [0, 1, undef, undef, 0, undef]` Ooh yes, pretty sure that's a bug in the implementation of my OP: it should return false if any modular value shows up in both %seenzero and %seenmore. I've updated the OP with a simple but inefficient fix for that, thanks! Anyway depending on the distribution of your input, it might be easier to search in a first pass for the first entry > 0 to calculate the "right" column. [...] in the pathological case that there is no entry >0 you'll need to find the column consisting of only "?" My original algorithm copes well with that pathological case, by counting the number of distinct modular values seen for the zeroes. I don't see how you'd find the correct answer for, say, `3, [0, 0, 0]` without counting, using anything like this algorithm.	[reply] [d/l] [select]
Re^3: checking a set of numbers for consistency mod p by LanX (Saint) on Apr 16, 2022 at 19:46 UTC
> I think it's the same as my original algorithm. :) Well your's is buggy! ;-) This count on hashes looks weird and is obviously error prone. If I were you I would have started with an test suite with automated cases. like `[3, [0,0,1,0,0,1,0,0,X]]` with X from `2..3` multiplied with all `2^9` combinations of `?` multiplied with 9 rotations. These are just 25129 <= 9216 positive cases (some will duplicate) IMHO all other arrays of length 9 and max entry 3, which can't be found in this set must be negative. You could use a random function to generate them. > I don't see how you'd find the correct answer for, say, `3, [0, 0, 0]` without counting, using anything like this algorithm. I said first pass search for a power >0 to identify the main column. In the second pass do the validation. All defined entries in the main column must be >0 but in all other columns they must be 0. If there is no value >0 found in the first pass, you must find a column consisting of `?` only in order to return true. This 2 pass solution could be more efficient, but this depends on the distribution of cases, which only you know well. Alternatively you could do a one pass check with just one array @seen initialized to undef to represent the columns. Once an entry is 0 it must stay 0, once it's 1 it must stay 1 (1 for > 0) otherwise you bail out with fail. You'll also need one scalar `$main_col` set to the index of the first >0 column. If there is ever another col-index with >0 you need to return a fail. (That's kind of a counter if you want) If this loop ends with $main_col set, continue with that column. If the loop ends with `$main_col == undef', you need to a parse '@seen` for at least one `undef` column. If it doesn't exist it's a fail otherwise a pass (a column which only consists of `?` will always pass) This one pass solution has several advantages bailing out as soon as possible no expensive and complicated hashes can be easily translated to C Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re^2: checking a set of numbers for consistency mod p by vr (Curate) on Apr 16, 2022 at 07:36 UTC
ahem, all your points were exactly what occurred to me on day 2 or so when I took closer look at hv's code/algorithm, though I understood nothing of maths involved. Indeed, looking at list "through modulo p" is looking at 2-d table with p columns. Then subsequent column extraction (and imagining a subset as table again, etc.) reduces amount of data, to inspect, exponentially. I don't know if it can be simpler or faster. I wrote some Perl, stared at it a little, and realized it's translatable to C almost line by line, my C skill would suffice. If data is not Perl array but packed bytes (e.g. 255 as `undef`), code is easily adjusted. Lines marked /// were added only today to fix latest glitch mentioned. As I said, the whole thing was just code refactoring exercise on my part without understanding what's going on. And it certainly can be further improved. (edit:* forgot `END_OF_C` when pasting; + better names for benchmark tests. edit 2: Aargh, sorry, it was before coffee in the morning, when I added today's `///` lines and also "optimized" i.e. screwed bit tests in `z_mask`. Fixed.) my @data = ( [ 2, [0, 1, 0, 2, 0]], [ 2, [1, 0, 5, 0, 1]], [ 2, [2, undef, undef, undef, 2]], [ 3, [1, 0, 0, 1, 0, 0, 1]], [ 2, [1, 0, 8, 0, 1]], [ 3, [undef, 0, 0, undef, 0, 0, 1]], [ 3, [undef, 0, 0, undef, 0, 0]], [ 3, [0, 1, undef, undef, 0, undef]], ); use Inline C => <<'END_OF_C'; int valid_set_c( int p, SV a_ref ) { AV* array = (AV )SvRV( a_ref ); int last = av_top_index( array ); int from = 0; int incr = 1; int initial_mask = ( 1 << p ) - 1; for ( int j = 0;; j ++ ) { int z_mask = initial_mask; int nz_cnt = 0; int nz_col; for( int i = from; i <= last; i += incr ) { SV item = av_fetch( array, i, 0 ); if ( SvOK( item )) { int col = i / incr % p; if ( SvIV( item ) == j ) { if ( nz_cnt && nz_col == col ) return 0; /// if ( z_mask & ( 1 << col )) { z_mask &= ~( 1 << col ); if ( !z_mask ) return 0; } } else { if ( nz_cnt ++ ) { if ( nz_col != col ) return 0; } else nz_col = col; if ( !( z_mask & ( 1 << nz_col ))) return 0; + /// } } } if ( nz_cnt <= 1 ) return 1; from += nz_col incr; incr *= p; } } END_OF_C __END__ Rate valid_set valid_set_c valid_set 35669/s -- -89% valid_set_c 339083/s 851% -- [download]	[reply] [d/l] [select]
Re^3: checking a set of numbers for consistency mod p by hv (Prior) on Apr 16, 2022 at 21:14 UTC
Nice stuff. :) It's a shame it will silently go wrong when p > 8 * sizeof(int), though. If you're lucky, the compiler may support declaring a variable-sized int[] array to save the bother of mallocing, but just indexing into it will already make the code a chunk more complex. And @LanX: there's no XS code here. The existence of Inline::C is the reason I never had to learn XS, and I'm very grateful for it. :)	[reply]
Re^4: checking a set of numbers for consistency mod p by LanX (Saint) on Apr 16, 2022 at 21:30 UTC
Re^5: checking a set of numbers for consistency mod p by hv (Prior) on Apr 17, 2022 at 03:03 UTC
Some notes below your chosen depth have not been shown here
Re^3: checking a set of numbers for consistency mod p by LanX (Saint) on Apr 16, 2022 at 19:52 UTC
unfortunately I'm bad in reading C especially with interspersed XS. :( But probably performance would benefit from copying the Perl array to a C array first? Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply]
Re: checking a set of numbers for consistency mod p by LanX (Saint) on Apr 09, 2022 at 19:55 UTC
I'm too busy right now for code or a proper proof and to be honest I don't grasp your algorithm ... But my intuition says (examples for `[ 2, [1, 0, 5, 0, 1]]` ) it's symmetric around the biggest entry here 5 it's sufficent to investigate the neighbours of the first possible solution, here `25=32` . All other solutions `25r` will have the same pattern around them. to speed it up you can exploit the repeated sieves (like in the sieve algo for prime search) (see footnote °) the undefined holes make this more complicated if they disguise the biggest entry. But you can most probably just stick with the biggest known entry approach as long as you take the first one. HTH! PS: Hey, number theory is off-topic!!! ;-)* Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery} °) regarding sieves, every partial sequence is the combination of all former repeated p-1 times, just the last entry is incremented for `p=2` `<0> <1> <0 2> <0 1 0 3> <0 1 0 2 0 1 0 4> <0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 5>` [download] for `p=3` `<0> <0 1> <0 0 1 0 0 2> <0 0 1 0 0 1 0 0 2 0 0 1 0 0 1 0 0 3>` [download] etc update You can easily precalculate the sieves if you have many sequences of a max length like m=32. You'll just calculate the sieve up to m. Any bigger max entry will just take the place of the precaclclated max and the sieve will be correct.	[reply] [d/l] [select]
Re^2: checking a set of numbers for consistency mod p by hv (Prior) on Apr 09, 2022 at 20:38 UTC
Thanks LanX, but it's not clear to me what algorithm you're proposing here. Taking my last example `[1,0,0,1,0,0,1], 3`, for example: you find that the biggest known entry is 1 (which presumably requires a pass through the values), now what? Hey, number theory is off-topic!!! Eh? It's just an algorithm question, right?	[reply] [d/l]
Re^3: checking a set of numbers for consistency mod p by LanX (Saint) on Apr 09, 2022 at 20:46 UTC
> Eh? It's just an algorithm question, right? I was joking. > Taking my last example `[1,0,0,1,0,0,1], 3,` for example: you find that the biggest known entry is 1 take the pre-calculated sieve for 27 entries from my post and anchor the first `1` (marked ^) from both sequences for a one-on-one comparison `[1,0,0,1,0,0,1] <0 0 1 0 0 1 0 0 2 0 0 1 0 0 1 0 0 2 0 0 1 0 0 1 0 0 3> ^ ` [download] and you'll see they diverge for the last entry (marked ) hence it's a fail. I hope it's clearer now. :) If you have billions of comparisons and max m=32 length sequences for a limited number of possible p, this should be very fast. (disclaimer: I'm not 100% sure I covered all edge cases caused by the holes) Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery} update for the other examples given `[0,1,0,2,0] <0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 5> ^ -> PASS` [download] `[1,0,5,0,1] <0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 5 0 1 . +.. > ^ -> PASS` [download] `[2,?,?,?,2] <0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 5> ^ * -> FAIL` [download] update and if you have a sequence `[2,[1,0,9,0,1]]` with a max (here 9) which isn't in your pre-calculated sieve, identify it with the max from your sieve (here 5) `[1,0,9,0,1] <0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 5 0 1 . +.. > ^ -> PASS` [download] of course this would also work by identifying 9 with 4 or 3 or 2 because this sequence is so short. But with a possible m=32 you need that full length.	[reply] [d/l] [select]
Re^3: checking a set of numbers for consistency mod p by LanX (Saint) on Apr 10, 2022 at 11:24 UTC
maybe easiest if you provided more test data, like a generator producing 1e5 true and 1e5 false examples. I'd implement it then. This could be used for benchmarking, too. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply]
Re^4: checking a set of numbers for consistency mod p by hv (Prior) on Apr 11, 2022 at 00:34 UTC
Re^5: checking a set of numbers for consistency mod p by LanX (Saint) on Apr 11, 2022 at 00:50 UTC
Some notes below your chosen depth have not been shown here
Re^5: checking a set of numbers for consistency mod p by LanX (Saint) on Apr 11, 2022 at 12:29 UTC
Some notes below your chosen depth have not been shown here

update

update

update