in reply to Re: checking a set of numbers for consistency mod p
in thread checking a set of numbers for consistency mod p

I came up with a "new" idea, but just realized it's very similar to your original algorithm

I think it's the same as my original algorithm. :)

don't use hashes they are slow and the logic is glitchy

I guess the main reason to use a hash was that the number of entries gets counted for me: I suspect iterating over an array to count the entries could easily cost all the time saved in assignment. Not sure what you mean by "the logic is glitchy", unless you're referring to the bug you describe later (which I contend is a simple logic error, and not related to the choice of container).

I agree bitstrings could be faster for seenzero (though a lot more fiddly, especially if I want a fast bit count) - I'm sure that's what I'd have chosen if writing this in C or assembler. For %seenmore it would probably make sense just to record $mod and $seen (the count) directly.

Similarly I agree that the recursion could be avoided. These are all micro-optimizations though, and the question was about a different algorithm - which I think you've given me, piece by piece, in your other posts, I just need to get around to putting together an implementation.

Talking about glitchy, consider [the] input 3, [0, 1, undef, undef, 0, undef]

Ooh yes, pretty sure that's a bug in the implementation of my OP: it should return false if any modular value shows up in both %seenzero and %seenmore. I've updated the OP with a simple but inefficient fix for that, thanks!

Anyway depending on the distribution of your input, it might be easier to search in a first pass for the first entry > 0 to calculate the "right" column. [...] in the pathological case that there is no entry >0 you'll need to find the column consisting of only "?"

My original algorithm copes well with that pathological case, by counting the number of distinct modular values seen for the zeroes. I don't see how you'd find the correct answer for, say, 3, [0, 0, 0] without counting, using anything like this algorithm.

Replies are listed 'Best First'.
Re^3: checking a set of numbers for consistency mod p
by LanX (Saint) on Apr 16, 2022 at 19:46 UTC
    > I think it's the same as my original algorithm. :)

    Well your's is buggy! ;-)

    This count on hashes looks weird and is obviously error prone.

    If I were you I would have started with an test suite with automated cases. like [3, [0,0,1,0,0,1,0,0,X]] with X from 2..3 multiplied with all 2^9 combinations of ? multiplied with 9 rotations.

    These are just 2*512*9 <= 9216 positive cases (some will duplicate)

    IMHO all other arrays of length 9 and max entry 3, which can't be found in this set must be negative.

    You could use a random function to generate them.

    > I don't see how you'd find the correct answer for, say, 3, [0, 0, 0] without counting, using anything like this algorithm.

    I said first pass search for a power >0 to identify the main column.

    In the second pass do the validation.

    All defined entries in the main column must be >0 but in all other columns they must be 0.

    If there is no value >0 found in the first pass, you must find a column consisting of ? only in order to return true.

    This 2 pass solution could be more efficient, but this depends on the distribution of cases, which only you know well.

    Alternatively you could do a one pass check with just one array @seen initialized to undef to represent the columns.

    Once an entry is 0 it must stay 0, once it's 1 it must stay 1 (1 for > 0) otherwise you bail out with fail.

    You'll also need one scalar $main_col set to the index of the first >0 column. If there is ever another col-index with >0 you need to return a fail. (That's kind of a counter if you want)

    If this loop ends with $main_col set, continue with that column.

    If the loop ends with $main_col == undef', you need to a parse '@seen for at least one undef column. If it doesn't exist it's a fail otherwise a pass (a column which only consists of ? will always pass)

    This one pass solution has several advantages

    • bailing out as soon as possible
    • no expensive and complicated hashes
    • can be easily translated to C

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery