Throughout the message, there are 10 markers - I am extracting them and am only interested in 9 of them.

Well, if the markers are encoded as digits, the code I posted elsewhere will still work.

It would reduce memory a lot (and make the random generation for testing a bit easier and quicker) if they were encoded as 0..8 rather than 1..9.
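To illustrate the memory point, here is a minimal Python sketch. It assumes, without confirmation from the code referred to above, that the lookup structure is indexed directly by the 9-marker key. The names `pack_base9`, `pack_decimal`, `BASE9_SPACE` and `DECIMAL_SPACE` are hypothetical, chosen only for this example.

```python
import random


def pack_base9(markers):
    """Pack nine markers, each in 0..8, into one integer in 0..9**9 - 1."""
    if len(markers) != 9:
        raise ValueError("expected exactly 9 markers")
    index = 0
    for m in markers:
        if not 0 <= m <= 8:
            raise ValueError("marker out of range 0..8")
        index = index * 9 + m
    return index


def pack_decimal(markers):
    """Treat nine markers, each in 1..9, as the digits of a decimal number."""
    if len(markers) != 9:
        raise ValueError("expected exactly 9 markers")
    index = 0
    for m in markers:
        if not 1 <= m <= 9:
            raise ValueError("marker out of range 1..9")
        index = index * 10 + m
    return index


# Number of slots a directly indexed table would need under each encoding.
BASE9_SPACE = 9 ** 9        # 387,420,489
DECIMAL_SPACE = 10 ** 9     # keys run up to 999,999,999

# With the 0..8 encoding, any integer below 9**9 is a valid key,
# so a random test key needs no per-digit filtering.
random_key = random.randrange(BASE9_SPACE)
```

At one bit per slot, that is about 48 MB for the 0..8 encoding against 125 MB for a table indexed by the decimal value, under the assumptions stated above.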

At any one time, I think I would have around 500K of the nearly 900 million possible.

With the method I've demonstrated, that becomes a moot point, as it will continue to work -- with the same memory requirement -- right up to the point of having all 900M in play.
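The method itself is not shown in this post, so the following is only an assumed illustration of one structure with that property: a bit vector with one bit per possible key, whose size is fixed when it is created and does not grow as keys are added. The class name `BitVector` and its methods are hypothetical, and the sketch is in Python rather than the language of the original code.

```python
class BitVector:
    """Fixed-size bit set; memory depends only on `size`, not on how many bits are set."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)   # one bit per possible key

    def _check(self, index):
        if not 0 <= index < self.size:
            raise IndexError("key out of range")

    def add(self, index):
        self._check(index)
        self.bits[index >> 3] |= 1 << (index & 7)

    def __contains__(self, index):
        self._check(index)
        return bool(self.bits[index >> 3] & (1 << (index & 7)))

    def nbytes(self):
        return len(self.bits)


# Small demonstration: the footprint is the same before and after insertion.
seen = BitVector(1000)
before = seen.nbytes()
for key in (3, 42, 999):
    seen.add(key)
after = seen.nbytes()
```

Scaled to 900 million possible keys, such a vector would need roughly 112.5 MB whether it holds 500K keys or all of them.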

The real problem comes with the compounding effect of the edits. If you have 500k 9-digit knowns, then there are 67,108,864,000,000 possible 1-digit substitutions for them. Of course, there are still only 900M possible 9-digit sets, so each of those can be a substitute for ~75,000 actuals. Your problem will be to determine which of the actuals it is a match for.
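The figures in that paragraph can be reproduced with a few lines of arithmetic. This Python sketch assumes the 67,108,864,000,000 total is 500,000 × 8**9; that factorisation is inferred from the number itself and is not stated in the post. The variable names are hypothetical.

```python
KNOWNS = 500_000            # 9-digit knowns held at any one time
POSSIBLE = 900_000_000      # possible 9-digit values

# Assumption: the quoted total corresponds to 8**9 variants per known.
variants_per_known = 8 ** 9                  # 134,217,728
total_substitutions = KNOWNS * variants_per_known

# Average number of knowns that each possible value could stand in for.
knowns_per_candidate = total_substitutions / POSSIBLE

print(f"{total_substitutions:,}")
print(f"{knowns_per_candidate:,.0f}")
```

The second figure comes out just under 75,000, which matches the "~75,000 actuals" estimate.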

The code I posted elsewhere is flawed. I completely forgot to test for the substitutions in that version. I've corrected that and am currently running a test to check I didn't screw something else up. When it's done, and assuming I didn't, I'll post the new version as a reply to the existing one. (This is just a heads up for you not to waste any of your time on that version.)


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^5: Finding Nearly Identical Sets by BrowserUk
in thread Finding Nearly Identical Sets by Limbic~Region
