Re: Maintaining uniqueness across millions of items
by dragonchild (Archbishop) on Aug 13, 2004 at 12:11 UTC
it seems as though ...
Back-of-a-napkin benchmarks are, well, worth about the same as a used napkin. I would suggest grabbing MySQL or PostgreSQL and trying it out. An RDBMS would certainly be the solution I would choose, given what you have described of the problem. (Not to mention that, at best, you're looking at needing 16 x 600M = 9.6B bytes, or 9.6 gigabytes, even without any overhead.)
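A back-of-a-napkin sketch of that approach (untested, with a made-up table name and connection details), assuming DBI plus the MySQL driver; the PRIMARY KEY constraint does the duplicate detection so nothing has to live in RAM:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical table/credentials; the point is that the database,
    # not Perl, enforces uniqueness via the PRIMARY KEY.
    my $dbh = DBI->connect('dbi:mysql:database=codes', 'user', 'pass',
                           { RaiseError => 0, PrintError => 0 })
        or die DBI->errstr;
    $dbh->do('CREATE TABLE IF NOT EXISTS code (id CHAR(16) PRIMARY KEY)');

    my $sth = $dbh->prepare('INSERT INTO code (id) VALUES (?)');

    sub remember {
        my ($code) = @_;
        # The insert fails on a duplicate key, which is our "seen it" signal.
        return $sth->execute($code) ? 1 : 0;
    }

    print remember('ABCD1234EFGH5678') ? "new\n" : "already issued\n";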
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
Re: Maintaining uniqueness across millions of items
by davorg (Chancellor) on Aug 13, 2004 at 12:19 UTC
I'm probably missing something here, but is there any reason why you can't just allocate sequential numbers? That way you'd only need to store the next available value.
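If it helps, an untested sketch of what that looks like on disk, assuming a one-line counter file (name made up) that starts out containing 0:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl qw(:flock);

    # Persist only the next free number; lock around the read/bump/write
    # so concurrent generators can't hand out the same id twice.
    sub next_id {
        open my $fh, '+<', 'next_id.txt' or die "next_id.txt: $!";
        flock $fh, LOCK_EX               or die "flock: $!";
        chomp( my $n = <$fh> );
        seek $fh, 0, 0;
        truncate $fh, 0;
        print {$fh} $n + 1, "\n";
        close $fh or die "close: $!";
        return $n;
    }

    print next_id(), "\n";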
--
<http://www.dave.org.uk>
"The first rule of Perl club is you do not talk about
Perl club." -- Chip Salzenberg
Re: Maintaining uniqueness across millions of items
by adrianh (Chancellor) on Aug 13, 2004 at 12:25 UTC
What are your other constraints? I presume there must be a reason that you're not starting at 1 and counting upwards :-)
Personally I use Data::UUID when I need unique IDs these days.
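For example (straight from the module's documented interface):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::UUID;

    # Each call returns a fresh, statistically unique id,
    # with no bookkeeping of previously issued values.
    my $ug = Data::UUID->new;
    print $ug->create_str(), "\n" for 1 .. 3;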
Re: Maintaining uniqueness across millions of items
by simon.proctor (Vicar) on Aug 13, 2004 at 12:26 UTC
As an alternative to the suggestions above, you could also consider a DBM file solution, specifically Berkeley DB from Sleepycat. It operates on the key/value model you already use and scales well.
There is a CPAN module here, so YMMV (look at the tests).
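Something along these lines (untested, file name made up), so the "have I issued this one already?" lookup lives on disk instead of in a 600-million-key Perl hash:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DB_File;
    use Fcntl qw(O_CREAT O_RDWR);

    # Tie a hash to an on-disk Berkeley DB btree; exists() then checks
    # the file rather than process memory.
    tie my %seen, 'DB_File', 'codes.db', O_CREAT | O_RDWR, 0644, $DB_BTREE
        or die "Cannot tie codes.db: $!";

    sub is_new {
        my ($code) = @_;
        return 0 if exists $seen{$code};
        $seen{$code} = 1;
        return 1;
    }

    print is_new('ABCD1234EFGH5678') ? "new\n" : "duplicate\n";
    untie %seen;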
Re: Maintaining uniqueness across millions of items
by water (Deacon) on Aug 13, 2004 at 13:19 UTC
If you can get away with "probably unique", you can rely on statistics: when you randomly select, with replacement, a small number of elements from a very large set, you can work out the likelihood that all of them are unique (the birthday-problem figure n!/((n-k)! * n^k), roughly exp(-k(k-1)/2n), for k pulls from a set of n).
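A quick back-of-the-envelope check of that figure, assuming a 16-character upper-alphanumeric space and 600M random draws, using the usual exp(-k(k-1)/2n) approximation:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $n = 36 ** 16;     # size of the 16-char upper-alphanumeric space
    my $k = 600_000_000;  # number of codes drawn at random

    # P(all distinct) ~ exp(-k*(k-1)/(2*n)) for k much smaller than n
    my $p_distinct = exp( -$k * ($k - 1) / (2 * $n) );
    printf "P(at least one collision) ~ %.3g\n", 1 - $p_distinct;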
If you can't get away with "probably", you need to use something sequential (1, 2, 3, ...) or, if you don't want the numbers to look sequential, some invertible function of something sequential (f(1), f(2), f(3), ...) (security-thru-obscurity, beware).
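An untested toy of that "invertible function" idea, with purely illustrative constants: multiply the counter by something coprime to the modulus, so consecutive inputs come out scattered but can never collide, then spell the result in the 0-9A-Z alphabet:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $m = 36 ** 10;   # code space being mapped into (illustrative)
    my $a = 48_271;     # any multiplier coprime to $m gives a bijection

    sub scramble { my ($i) = @_; return ($i * $a) % $m }

    my @digit = ( 0 .. 9, 'A' .. 'Z' );
    sub to_base36 {
        my ($n) = @_;
        my $s = '';
        for (1 .. 10) { $s = $digit[ $n % 36 ] . $s; $n = int( $n / 36 ) }
        return $s;
    }

    print to_base36( scramble($_) ), "\n" for 1 .. 3;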
Otherwise, you're stuck with remembering what you've already said: you need a database. An aside -- there are special approaches for storing "archival" data, where one only needs insert and select, not delete or update. Here's one
random link to this idea.
Re: Maintaining uniqueness across millions of items
by Random_Walk (Prior) on Aug 13, 2004 at 12:20 UTC
How about splitting them into groups based on first character?
If you have 26*2+10 = 62 possible first chars, just generate 600M/62 (roughly 10 million) for each initial character and write them out before moving on to the next one. I guess randomness is important here, as you are not just generating the sequence 1...600M, so when you hand them out, first randomly select which block to take one from, then select a random one from that block (and mark it as used).
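An untested toy of that scheme, with a handful of fake codes per bucket instead of ~10 million and everything held in memory instead of written out:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw(shuffle);

    # One bucket per possible first character, holding a shuffled list
    # of the codes that start with it.
    my @first = ( 'A' .. 'Z', 'a' .. 'z', 0 .. 9 );
    my %bucket;
    for my $c (@first) {
        $bucket{$c} = [ shuffle map { $c . sprintf '%015d', $_ } 1 .. 5 ];
    }

    # Hand one out: random non-empty bucket, then pop a code from it;
    # popping doubles as marking it used.
    sub next_code {
        my @live = grep { @{ $bucket{$_} } } keys %bucket;
        return unless @live;
        return pop @{ $bucket{ $live[ int rand @live ] } };
    }

    print next_code(), "\n" for 1 .. 3;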
Cheers.
Re: Maintaining uniqueness across millions of items
by deibyz (Hermit) on Aug 13, 2004 at 12:54 UTC
I have to generate a couple hundred million unique codes, that are 16 digits in length, upper-alphanumeric, (insert lots of criteria here).
Depending on how complex the criteria are, I would recommend finding an algorithm that generates all of them without repetition. If you post the criteria here, I'm sure somebody will be able to help you.
deibyz
Re: Maintaining uniqueness across millions of items
by Gilimanjaro (Hermit) on Aug 13, 2004 at 13:41 UTC
First of all, you're talking about 600 million pieces of information, which amounts to 'large amounts of data', and databases are intended to maintain 'large amounts of data', so I'd recommend a database backend.
Second of all, how do you generate the ids? If they can be sequential, 6 characters (0-9 + A-Z) give you 36^6, about 2.18 billion ids, over three times the number you'd need.
So I guess it depends on what other criteria the ids need to adhere to. Let us know; we're dying to find out... :)
Re: Maintaining uniqueness across millions of items
by ChuckularOne (Prior) on Aug 13, 2004 at 13:03 UTC
There is always the possibility that you could generate the numbers and create a checksum that is significantly smaller than 16 digits.
Then you would only have to compare the checksums (and hold them in memory) for uniqueness.
The only caveat is that in a list of 600,000,000 keys you will, most likely, have some codes/numbers which are unique but have the same checksum.
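An untested sketch of that, assuming Digest::MD5 and keeping only the first 6 bytes of each digest in memory; a miss proves the code is new, while a hit has to be double-checked against the full list because of exactly that collision caveat:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Digest::MD5 qw(md5);

    my %seen;   # short fingerprint => 1

    sub maybe_seen {
        my ($code) = @_;
        my $fp = substr md5($code), 0, 6;   # 6 of MD5's 16 bytes
        return 1 if exists $seen{$fp};
        $seen{$fp} = 1;
        return 0;   # definitely new
    }

    print maybe_seen('ABCD1234EFGH5678') ? "maybe a duplicate\n" : "definitely new\n";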