Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Re: checking a set of numbers for consistency mod p

by LanX (Saint)
on Apr 15, 2022 at 23:03 UTC ( [id://11143001]=note: print w/replies, xml ) Need Help??


in reply to checking a set of numbers for consistency mod p

I came up with a "new" idea, but just realized it's very similar to your original algorithm

Maybe others will profit from the explanation:

let's take p=3 and so any input which is a 32-range will be cut out from this circular array (with X >= 4)

0010010020010010020010010030010010020010010020010010030010010020010010 +0200100100X (repeat from start)

plus any entry can be undef="?"

If we arrange the input in p columns we see, that no matter were our sub-range starts

a) p-1 columns MUST only be 0 (or undef)

b) exactly 1 column MUST only have values >0 (or undef)

for instance

Level=0

??? 01? 0?0 0?0 010 ??0 010 01? 050

if we take the non-zero column ...

?1??1?115

and decrement all defined entries ...

?0??0?004

we get the input for the next recursion which must hold the same criteria

Level=1

?0? ?0? 004

and so on

Level=2

??3

Level=3

2

You already (almost) implemented it, and now I understand it too. The way you use those hashes irritated me a lot.

If you wanna speed it up you could

  • avoid the recursion, just iterate over the same array starting at the right column with step plevel and calculate $power -= $level instead of decrementing a new input array
  • don't use hashes they are slow and the logic is glitchy. Arrays @seenzero and @seenmore just need to be complementary, like (1,0,1) and (0,1,0) in the first level demonstrated. Alternatively use bitstrings and xor them to equal "111".
Talking about glitchy, consider this input

01? ?0?

obviously is the middle column wrong having both 0 and 1, but AFAIS will your implementation accept that and recurs to the next level. (Or am I misreading?)

Anyway depending on the distribution of your input, it might be easier to search in a first pass for the first entry > 0 to calculate the "right" column°.

In a second pass make sure that - for defined parts - this column has only non-zeros and all other columns are always zero. Like that you don't need any hashes or other flags and you can bail out immediately on fail.

HTH! :)

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

°) in the pathological case that there is no entry >0 you'll need to find the column consisting of only "?"

Replies are listed 'Best First'.
Re^2: checking a set of numbers for consistency mod p
by hv (Prior) on Apr 16, 2022 at 01:39 UTC

    I came up with a "new" idea, but just realized it's very similar to your original algorithm

    I think it's the same as my original algorithm. :)

    don't use hashes they are slow and the logic is glitchy

    I guess the main reason to use a hash was that the number of entries gets counted for me: I suspect iterating over an array to count the entries could easily cost all the time saved in assignment. Not sure what you mean by "the logic is glitchy", unless you're referring to the bug you describe later (which I contend is a simple logic error, and not related to the choice of container).

    I agree bitstrings could be faster for seenzero (though a lot more fiddly, especially if I want a fast bit count) - I'm sure that's what I'd have chosen if writing this in C or assembler. For %seenmore it would probably make sense just to record $mod and $seen (the count) directly.

    Similarly I agree that the recursion could be avoided. These are all micro-optimizations though, and the question was about a different algorithm - which I think you've given me, piece by piece, in your other posts, I just need to get around to putting together an implementation.

    Talking about glitchy, consider [the] input 3, [0, 1, undef, undef, 0, undef]

    Ooh yes, pretty sure that's a bug in the implementation of my OP: it should return false if any modular value shows up in both %seenzero and %seenmore. I've updated the OP with a simple but inefficient fix for that, thanks!

    Anyway depending on the distribution of your input, it might be easier to search in a first pass for the first entry > 0 to calculate the "right" column. [...] in the pathological case that there is no entry >0 you'll need to find the column consisting of only "?"

    My original algorithm copes well with that pathological case, by counting the number of distinct modular values seen for the zeroes. I don't see how you'd find the correct answer for, say, 3, [0, 0, 0] without counting, using anything like this algorithm.

      > I think it's the same as my original algorithm. :)

      Well your's is buggy! ;-)

      This count on hashes looks weird and is obviously error prone.

      If I were you I would have started with an test suite with automated cases. like [3, [0,0,1,0,0,1,0,0,X]] with X from 2..3 multiplied with all 2^9 combinations of ? multiplied with 9 rotations.

      These are just 2*512*9 <= 9216 positive cases (some will duplicate)

      IMHO all other arrays of length 9 and max entry 3, which can't be found in this set must be negative.

      You could use a random function to generate them.

      > I don't see how you'd find the correct answer for, say, 3, [0, 0, 0] without counting, using anything like this algorithm.

      I said first pass search for a power >0 to identify the main column.

      In the second pass do the validation.

      All defined entries in the main column must be >0 but in all other columns they must be 0.

      If there is no value >0 found in the first pass, you must find a column consisting of ? only in order to return true.

      This 2 pass solution could be more efficient, but this depends on the distribution of cases, which only you know well.

      Alternatively you could do a one pass check with just one array @seen initialized to undef to represent the columns.

      Once an entry is 0 it must stay 0, once it's 1 it must stay 1 (1 for > 0) otherwise you bail out with fail.

      You'll also need one scalar $main_col set to the index of the first >0 column. If there is ever another col-index with >0 you need to return a fail. (That's kind of a counter if you want)

      If this loop ends with $main_col set, continue with that column.

      If the loop ends with $main_col == undef', you need to a parse '@seen for at least one undef column. If it doesn't exist it's a fail otherwise a pass (a column which only consists of ? will always pass)

      This one pass solution has several advantages

      • bailing out as soon as possible
      • no expensive and complicated hashes
      • can be easily translated to C

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re^2: checking a set of numbers for consistency mod p
by vr (Curate) on Apr 16, 2022 at 07:36 UTC

    ahem, all your points were exactly what occurred to me on day 2 or so when I took closer look at hv's code/algorithm, though I understood nothing of maths involved. Indeed, looking at list "through modulo p" is looking at 2-d table with p columns. Then subsequent column extraction (and imagining a subset as table again, etc.) reduces amount of data, to inspect, exponentially. I don't know if it can be simpler or faster. I wrote some Perl, stared at it a little, and realized it's translatable to C almost line by line, my C skill would suffice. If data is not Perl array but packed bytes (e.g. 255 as undef), code is easily adjusted.

    Lines marked //*/ were added only today to fix latest glitch mentioned. As I said, the whole thing was just code refactoring exercise on my part without understanding what's going on. And it certainly can be further improved. (edit: forgot END_OF_C when pasting; + better names for benchmark tests. edit 2: Aargh, sorry, it was before coffee in the morning, when I added today's //*/ lines and also "optimized" i.e. screwed bit tests in z_mask. Fixed.)

    my @data = ( [ 2, [0, 1, 0, 2, 0]], [ 2, [1, 0, 5, 0, 1]], [ 2, [2, undef, undef, undef, 2]], [ 3, [1, 0, 0, 1, 0, 0, 1]], [ 2, [1, 0, 8, 0, 1]], [ 3, [undef, 0, 0, undef, 0, 0, 1]], [ 3, [undef, 0, 0, undef, 0, 0]], [ 3, [0, 1, undef, undef, 0, undef]], ); use Inline C => <<'END_OF_C'; int valid_set_c( int p, SV *a_ref ) { AV* array = (AV *)SvRV( a_ref ); int last = av_top_index( array ); int from = 0; int incr = 1; int initial_mask = ( 1 << p ) - 1; for ( int j = 0;; j ++ ) { int z_mask = initial_mask; int nz_cnt = 0; int nz_col; for( int i = from; i <= last; i += incr ) { SV* item = *av_fetch( array, i, 0 ); if ( SvOK( item )) { int col = i / incr % p; if ( SvIV( item ) == j ) { if ( nz_cnt && nz_col == col ) return 0; //*/ if ( z_mask & ( 1 << col )) { z_mask &= ~( 1 << col ); if ( !z_mask ) return 0; } } else { if ( nz_cnt ++ ) { if ( nz_col != col ) return 0; } else nz_col = col; if ( !( z_mask & ( 1 << nz_col ))) return 0; + //*/ } } } if ( nz_cnt <= 1 ) return 1; from += nz_col * incr; incr *= p; } } END_OF_C __END__ Rate valid_set valid_set_c valid_set 35669/s -- -89% valid_set_c 339083/s 851% --

      Nice stuff. :) It's a shame it will silently go wrong when p > 8 * sizeof(int), though. If you're lucky, the compiler may support declaring a variable-sized int[] array to save the bother of mallocing, but just indexing into it will already make the code a chunk more complex.

      And @LanX: there's no XS code here. The existence of Inline::C is the reason I never had to learn XS, and I'm very grateful for it. :)

        > And @LanX: there's no XS code here.

        This doesn't look like vanilla C

        > > SV* item = *av_fetch( array, i, 0 );

        Probably differing definitions of "XS" ?

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

      unfortunately I'm bad in reading C especially with interspersed XS. :(

      But probably performance would benefit from copying the Perl array to a C array first?

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11143001]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having an uproarious good time at the Monastery: (2)
As of 2024-04-24 15:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found