in reply to Question: methods to transfer a long hexadicimal into shorter string
I need to setup an ID (not sequential numbers, need randomness) for each of these unique records.
Why not have the DB give you the randomness you need? Give each record a sequential (autoincrement) record number, which guarantees uniqueness, and then have an auxiliary table that maps that recno to a random 'tag' of, say, 6 characters: A-Z.
That would give you about 300 million (26^6 = 308,915,776) unique 6-char tags, which should be sufficient to be going on with. And with a properly indexed table, the lookup should be very quick with any DB worth its disk space. If you ever need to expand further, you can add a-z, which doubles the alphabet and takes you to nearly 20 billion (52^6); or move to seven chars, which would cater for 8 billion.
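The capacity figures are just powers of the alphabet size, so they are easy to check. A minimal sketch (the helper name `tag_capacity` is my own, purely for illustration); note that adding a-z doubles the alphabet, not the number of tags:

```perl
use strict;
use warnings;

# Number of distinct tags of a given length over an alphabet of a given size.
sub tag_capacity {
    my( $alphabetSize, $length ) = @_;
    return $alphabetSize ** $length;
}

# %.0f rather than %d so the larger values also print correctly on 32-bit perls.
printf "%-16s %.0f\n", 'A-Z, 6 chars:',    tag_capacity( 26, 6 );
printf "%-16s %.0f\n", 'A-Za-z, 6 chars:', tag_capacity( 52, 6 );
printf "%-16s %.0f\n", 'A-Z, 7 chars:',    tag_capacity( 26, 7 );
```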
The toughest part of this suggestion is generating the recno/tag mapping. Or rather, not generating the tags, which Perl makes easy, but shuffling them once you've generated them. Shuffling a 300e6-element array is a memory-intensive process even if you use an in-place shuffle. A possible solution to this is to break the problem into 2 parts.
Shuffling two small sets of 3-char stems & suffixes is trivial, and combining them at output time means you require minimal memory (~16MB), and it is very fast. It takes about a minute to produce 20e6 or roughly an hour if you wanted the full 300e6:
    #! perl -slw
    use strict;
    use List::Util qw[ shuffle ];

    # Number of records to generate; override on the command line with -N=nnn
    our $N ||= 20e6;

    # Shuffle the 17,576 three-letter stems once
    my @tagStems = shuffle 'AAA' .. 'ZZZ';

    my $serial = 0;
    while( my $stem = pop @tagStems ) {
        # Re-shuffle the suffixes for every stem
        for my $suffix ( shuffle 'AAA' .. 'ZZZ' ) {
            printf "%07d\t%s\n", ++$serial, $stem . $suffix;
            exit if $serial >= $N;    # stop once exactly $N records are written
        };
    }
    __END__
    C:\test>786833
    0000001 HBDKCE
    0000002 HBDBWB
    0000003 HBDRRX
    0000004 HBDUJF
    0000005 HBDFRH
    0000006 HBDMFO
    0000007 HBDAEO
    0000008 HBDEYC
    0000009 HBDCCZ
    0000010 HBDXPK
    ...
    0017572 HBDIWS
    0017573 HBDQFL
    0017574 HBDHYU
    0017575 HBDUUL
    0017576 HBDCGM
    0017577 ZPSQVM
    0017578 ZPSOUH
    0017579 ZPSPJV
    0017580 ZPSGAS
    ...
Whilst this is not truly random, since some possible sequences can never be generated (each stem's 17,576 tags always land in one consecutive block of record numbers), it is sufficiently random, i.e. unguessable, for many purposes.
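To make that limitation concrete: with a 3-3 split, record numbers 1 to 17,576 all share one stem, the next 17,576 share another, and so on, which is what the sample output shows at the HBD/ZPS boundary. A small sketch of that arithmetic (the names `SUFFIXES_PER_STEM` and `stem_block` are hypothetical, introduced only for this illustration):

```perl
use strict;
use warnings;

use constant SUFFIXES_PER_STEM => 26 ** 3;    # 17,576 suffixes per 3-char stem

# Zero-based index of the stem block that a 1-based serial number falls in.
sub stem_block {
    my( $serial ) = @_;
    return int( ( $serial - 1 ) / SUFFIXES_PER_STEM );
}

for my $pair ( [ 1, 17576 ], [ 17576, 17577 ] ) {
    my( $a, $b ) = @$pair;
    printf "%d and %d share a stem: %s\n",
        $a, $b, stem_block( $a ) == stem_block( $b ) ? 'yes' : 'no';
}
```

So anyone who sees one tag learns the first three characters of up to 17,575 neighbouring records' tags, though not their suffixes.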
If you needed to move to 8 billion/7-char tags, using a 4-3 or 5-2 split works equally well.
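A rough sketch of how the same stem/suffix idea might be parameterised for other splits. The function name `make_tag_iterator` and the iterator interface are my own assumptions, not part of the script above; only `List::Util::shuffle` is a real library call.

```perl
use strict;
use warnings;
use List::Util qw[ shuffle ];

# Returns a closure that yields every A-Z tag of length $stemLen + $suffixLen
# exactly once, then undef. Only the stem list and one suffix list are held
# in memory at any time.
sub make_tag_iterator {
    my( $stemLen, $suffixLen ) = @_;
    my @stems = shuffle( 'A' x $stemLen .. 'Z' x $stemLen );
    my @suffixes;
    my $stem;
    return sub {
        unless( @suffixes ) {
            return unless @stems;             # every stem has been used up
            $stem     = pop @stems;
            # Fresh shuffle of the suffixes for each new stem
            @suffixes = shuffle( 'A' x $suffixLen .. 'Z' x $suffixLen );
        }
        return $stem . pop @suffixes;
    };
}

# 7-char tags using a 4-3 split: 456,976 stems x 17,576 suffixes
my $nextTag = make_tag_iterator( 4, 3 );
printf "%010d\t%s\n", $_, $nextTag->() for 1 .. 5;
```

With a 5-2 split the stem list grows to 26^5 (about 11.9 million) entries, so the up-front shuffle costs more memory, while each stem's block of consecutive record numbers shrinks to 676.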