I need to setup an ID (not sequential numbers, need randomness) for each of these unique records.

Why not have the DB give you the randomness you need? Give each record a sequential (autoincrement) record number--which guarantees uniqueness--and then have an auxiliary table which maps that recno to a random 'tag' of, say, 6 characters: A-Z.

That would give you 300 million unique 6-char tags (26^6 = 308,915,776), which should be sufficient to be going on with. And with a properly indexed table, the lookup should be very quick with any DB worth its disk space. If you ever need to expand further you can add a-z, which doubles the alphabet and gives 52^6, nearly 20 billion tags; or move to seven chars, which would cater for 8 billion.
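Tag-space sizes are simply the alphabet size raised to the tag length, so they are easy to check. A quick sketch (the tag_space helper name is mine, purely for illustration):

```perl
use strict;
use warnings;

# Number of distinct tags of $length chars drawn from an alphabet of $size symbols.
# (tag_space is a hypothetical helper name, not part of any module.)
sub tag_space {
    my( $size, $length ) = @_;
    return $size ** $length;
}

printf "%-16s %14.0f\n", $_->[0], tag_space( $_->[1], $_->[2] ) for
    [ 'A-Z, 6 chars',    26, 6 ],   # 308,915,776    (~300 million)
    [ 'A-Za-z, 6 chars', 52, 6 ],   # 19,770,609,664 (~19.8 billion)
    [ 'A-Z, 7 chars',    26, 7 ];   # 8,031,810,176  (~8 billion)
```

Note that widening the alphabet to A-Za-z multiplies the 6-char space by 2^6 = 64, which is more than the extra character buys you.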

The toughest part of this suggestion is generating the recno/tag mapping. Or rather, shuffling the tags once you've generated them; the generation itself Perl makes easy. Shuffling a 300e6 element array is a memory-intensive process even if you use an in-place shuffle. A possible solution to this is to break the problem into two parts.

Shuffling two small sets of 3-char stems & suffixes is trivial, and combining them at output time means you require minimal memory (~16MB), and it is very fast. It takes about a minute to produce 20e6 or roughly an hour if you wanted the full 300e6:

#! perl -slw
use strict;
use List::Util qw[ shuffle ];

## Number of tags to produce; override on the command line with -N=nnn
our $N ||= 20e6;

## Shuffle the 17,576 three-char stems once
my @tagStems = shuffle 'AAA' .. 'ZZZ';

my $serial = 0;
while( my $stem = pop @tagStems ) {
    ## Re-shuffle the suffixes for every stem
    for my $suffix ( shuffle 'AAA' .. 'ZZZ' ) {
        printf "%07d\t%s\n", ++$serial, $stem . $suffix;
        ## Stop once exactly $N tags have been printed
        exit if $serial >= $N;
    }
}
__END__
C:\test>786833
0000001 HBDKCE
0000002 HBDBWB
0000003 HBDRRX
0000004 HBDUJF
0000005 HBDFRH
0000006 HBDMFO
0000007 HBDAEO
0000008 HBDEYC
0000009 HBDCCZ
0000010 HBDXPK
...
0017572 HBDIWS
0017573 HBDQFL
0017574 HBDHYU
0017575 HBDUUL
0017576 HBDCGM
0017577 ZPSQVM
0017578 ZPSOUH
0017579 ZPSPJV
0017580 ZPSGAS
...

Whilst not truly random--there are some possible sequences that cannot be generated, since each run of 17,576 consecutive recnos shares the same 3-char stem--it is sufficiently random (i.e. unguessable) for many purposes.
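To see where the non-randomness comes from: the script pairs each stem with all 26^3 = 17,576 suffixes before moving on to the next stem, so the serial number alone tells you which records share their first three characters. A small sketch of that arithmetic (stem_block is a name I made up for illustration):

```perl
use strict;
use warnings;

use constant SUFFIXES_PER_STEM => 26 ** 3;   # 'AAA' .. 'ZZZ' = 17_576 suffixes

# Zero-based index of the stem block a 1-based serial number falls into.
# (stem_block is a hypothetical helper, not part of the original script.)
sub stem_block {
    my( $serial ) = @_;
    return int( ( $serial - 1 ) / SUFFIXES_PER_STEM );
}

# Serials 1 .. 17_576 share one stem; 17_577 starts the next one.
print "$_ => block ", stem_block( $_ ), "\n" for 1, 17_576, 17_577;
```

This matches the sample output, where 0017576 still carries the HBD stem and 0017577 is the first ZPS tag. Whether that matters depends on how much an outsider can learn about the serial numbers.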

If you needed to move to 8 billion/7-char tags, using a 4-3 or 5-2 split works equally well.
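As a rough sketch of how the same idea might be parameterised for a different split, here is an iterator version with a 4-3 split. The make_tagger name and its four range arguments are my own assumptions for illustration, not part of the script above:

```perl
use strict;
use warnings;
use List::Util qw[ shuffle ];

# Return an iterator that yields every stem.suffix tag exactly once, using a
# shuffled list of stems and a freshly shuffled list of suffixes per stem.
# (make_tagger and its arguments are hypothetical names for this sketch.)
sub make_tagger {
    my( $stemFirst, $stemLast, $sufFirst, $sufLast ) = @_;
    my @stems = shuffle $stemFirst .. $stemLast;
    my @suffixes;
    my $stem;
    return sub {
        unless( @suffixes ) {
            return unless @stems;            # all tags handed out
            $stem     = pop @stems;
            @suffixes = shuffle $sufFirst .. $sufLast;
        }
        return $stem . pop @suffixes;
    };
}

# 4-3 split for 7-char tags: 26**4 = 456_976 stems held in memory,
# each combined with 26**3 = 17_576 suffixes.
my $next = make_tagger( 'AAAA', 'ZZZZ', 'AAA', 'ZZZ' );
printf "%010d\t%s\n", $_, $next->() for 1 .. 5;
```

The serial field is widened to 10 digits because 8 billion no longer fits in 7. The only thing held in memory is the stem list plus one suffix list, so a 4-3 split costs more RAM than 3-3 but nothing like shuffling the full tag set.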


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP PCW

In reply to Re: Question: methods to transfer a long hexadicimal into shorter string by BrowserUk
in thread Question: methods to transfer a long hexadicimal into shorter string by lihao
