A few questions:
Do you have a known good set of the numbers?
If so, how are they currently stored?
If not, will you just take the numbers from the first batch as 'good'?
Or are you hoping to cross-check the first batch against the second and detect errors that might be in the first set?
If you have a known good set, then it is simply a case of checking if the number found already exists in that known set, and that can be done very quickly using a 106MB bitstring.
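To illustrate the bitstring lookup, here is a minimal Python sketch: one bit per possible number, set if that number is in the known set. The `BitSet` class and the 6-digit demo range are my own assumptions for the example, not anything from the thread; the real size depends on how many digits your numbers have (bytes needed = number of possible values / 8).

```python
class BitSet:
    """One bit per possible value in range(size); hypothetical helper for this sketch."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # all bits start at 0

    def add(self, n):
        if not 0 <= n < self.size:
            raise ValueError(f"{n} is outside 0..{self.size - 1}")
        self.bits[n >> 3] |= 1 << (n & 7)  # byte index n // 8, bit index n % 8

    def __contains__(self, n):
        return 0 <= n < self.size and bool(self.bits[n >> 3] & (1 << (n & 7)))


# Demo with a 6-digit number space (assumed, to keep the example small).
known = BitSet(10**6)
for n in (123456, 654321, 999999):
    known.add(n)

print(123456 in known)  # a number from the known good set
print(123465 in known)  # the same number with two digits swapped
```

Each lookup is a single index plus a mask, so checking a second batch against the known set costs constant time per number regardless of how many numbers are known.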
I.e. the number's location is fixed, or clearly prefixed, or it will be the only large (say, more than 7 digits) number in the message.
What I'm getting at here is that insertions or deletions are easily detected, because the numbers come out the wrong length. That only really leaves transpositions to detect?
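A rough Python sketch of that triage, under some assumptions of mine: numbers are handled as digit strings of a fixed expected length, the known good set fits in an ordinary `set`, and only swaps of two adjacent digits count as transpositions. The names `classify` and `adjacent_transpositions` are made up for the example.

```python
def adjacent_transpositions(digits):
    """Yield every string obtained by swapping one pair of adjacent, differing digits."""
    for i in range(len(digits) - 1):
        if digits[i] != digits[i + 1]:
            yield digits[:i] + digits[i + 1] + digits[i] + digits[i + 2:]


def classify(candidate, known, expected_len):
    """Return (verdict, possible_corrections) for one candidate digit string."""
    if len(candidate) != expected_len:
        # An insertion or deletion changes the length, so it shows up here.
        return ("wrong_length", [])
    if candidate in known:
        return ("ok", [])
    # Right length but not known: see whether undoing one swap gives a known number.
    fixes = sorted(t for t in set(adjacent_transpositions(candidate)) if t in known)
    return ("transposition", fixes) if fixes else ("unknown", [])


known_ids = {"1234567", "7654321"}  # assumed 7-digit known good set

print(classify("1234567", known_ids, 7))  # exact match
print(classify("123456", known_ids, 7))   # one digit dropped
print(classify("1234576", known_ids, 7))  # last two digits swapped
print(classify("1111111", known_ids, 7))  # right length, no match at all
```

A candidate of length n has at most n - 1 adjacent swaps to try, so even the fallback path is only a handful of lookups per number.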
In reply to Re: Finding Nearly Identical Sets
by BrowserUk
in thread Finding Nearly Identical Sets
by Limbic~Region