Gorby has asked for the wisdom of the Perl Monks concerning the following question:

Dear Wise Monks, Would you have any idea of a fast way to generate 1.5 Million alphanumeric codes of 9 characters each? I tried to use an array to temporarily store the codes before writing the file but my computer was taking all day. Thanks. Gorby

Replies are listed 'Best First'.
Re: Generating alphanumeric codes (with a closure)
by grinder (Bishop) on Sep 29, 2003 at 10:28 UTC

    Rather than generating them all up front (which would be pretty silly if you don't wind up using them all), why don't you generate them one by one as you need them? A closure-as-iterator would work pretty well:

    #! /usr/bin/perl -w
    use strict;

    sub make_gen {
        my $num = shift;
        sub { sprintf '%09s', $num++ }
    };

    my $g1 = make_gen( 1 );
    print $g1->(), "\n", $g1->(), "\n\n";

    my $g2 = make_gen( 'zzx' );
    print $g2->(), "\n" for 1..4;

    __END__
    output:
    000000001
    000000002

    000000zzx
    000000zzy
    000000zzz
    00000aaaa

    You can play around with what the internal sub returns, in order to generate the appropriate codes for your application.
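As one way of playing with the inner sub, here is a minimal sketch that renders a counter as a 9-character code. It assumes the codes may use digits and upper-case letters (base 36); the name `make_code_gen` and the alphabet are illustrative choices, not part of the post above.

```perl
use strict;
use warnings;

# Hypothetical variant of make_gen: counts upward and renders the counter
# as a 9-character base-36 string (the alphabet is an assumption).
sub make_code_gen {
    my $n = shift;
    $n = 0 unless defined $n;
    my @alphabet = ( 0 .. 9, 'A' .. 'Z' );
    return sub {
        my $v    = $n++;
        my $code = '';
        # Peel off nine base-36 digits, least significant first.
        for ( 1 .. 9 ) {
            $code = $alphabet[ $v % 36 ] . $code;
            $v    = int( $v / 36 );
        }
        return $code;
    };
}

my $gen = make_code_gen(35);
print $gen->(), "\n" for 1 .. 3;
```

Starting at 35, the first three codes are 00000000Z, 000000010 and 000000011. Because the sequence is just a counter in another base, the codes are unique by construction and nothing has to be stored.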

    update: fixed length of codes: was 10 instead of 9. I could never read a spec :o)

Re: Generating alphanumeric codes
by Abigail-II (Bishop) on Sep 29, 2003 at 07:16 UTC
    foreach my $code ('000000001' .. '001500000') {
        # Do something with $code.
    }
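Since the original complaint was about collecting the codes in an array first, a sequence like this can also be streamed straight to the output file. The following is only a sketch; `write_codes` and the file name `codes.txt` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: stream $count sequential codes to a file,
# one per line, without holding them in memory.
sub write_codes {
    my ( $path, $count ) = @_;
    open my $out, '>', $path or die "Cannot open $path: $!";
    my $code = '000000001';
    for ( 1 .. $count ) {
        # Perl's magical string increment keeps the leading zeros.
        print {$out} $code++, "\n";
    }
    close $out or die "Cannot close $path: $!";
    return $count;
}

write_codes( 'codes.txt', 1_500_000 );
```

Memory use stays flat because only the current code exists at any moment, and the string increment preserves the zero padding in the same way the range above does.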

    Abigail

Re: Generating alphanumeric codes
by Roger (Parson) on Sep 29, 2003 at 07:04 UTC
    You have to give more details. You need to be more specific about what kind of codes you are after. Are they in random order? How many digits minimum? Can they start with a digit? etc. etc. I can think of lots of ways to generate codes, but what sort of codes are you after?
Re: Generating alphanumeric codes
by mattr (Curate) on Sep 29, 2003 at 12:51 UTC
    Not that there aren't a bajillion ways to do this and the other posts are right but this looked fun..

    mkpasswd by default (on rh9 linux) gives you 9 alphanumeric characters. But a simple perl -e test, even without storing in a hash to eliminate dupes, would still take 40 hours on my (not that old) laptop. So I won't even look at md5sum's output..

    I like Abigail-II's suggestion, since there is no need to check for dupes as it is already a sequence. You could devise a large number of mathematical algorithms which will give you a good sequence without zeroes. Actually pi might be kind of fun here, but you will I think have to check for duplicate codes then.

    By the way, in the following I'm using capital hex digits. But if you used the 10 numbers and 26 letters of the alphabet you probably would have plenty of values to algorithmically detect whether the code is legal or not without actually checking your database of 1.5 million codes, if you create codes with an algorithm like the credit card companies. You can google for "credit card validation" algorithms if you want to. Well here's a link.
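To make the check-digit idea concrete, here is a sketch of a Luhn-style check character for alphanumeric codes (the "Luhn mod N" generalization of the credit card checksum). It assumes a base-36 alphabet and a layout of 8 payload characters plus 1 check character; both choices, and the names `check_char` and `is_valid`, are assumptions for illustration.

```perl
use strict;
use warnings;

my @ALPHABET = ( 0 .. 9, 'A' .. 'Z' );    # assumed alphabet
my %VALUE;
@VALUE{@ALPHABET} = 0 .. $#ALPHABET;
my $N = @ALPHABET;                        # 36

# Weighted digit sum shared by both subs; $factor is the weight applied
# to the rightmost character (2 when generating, 1 when validating).
sub _luhn_sum {
    my ( $str, $factor ) = @_;
    my $sum = 0;
    for my $ch ( reverse split //, $str ) {
        my $addend = $factor * $VALUE{$ch};
        $factor = $factor == 2 ? 1 : 2;
        $sum += int( $addend / $N ) + $addend % $N;
    }
    return $sum;
}

# Check character to append to an 8-character payload.
sub check_char {
    my $body = shift;
    return $ALPHABET[ ( $N - _luhn_sum( $body, 2 ) % $N ) % $N ];
}

# True if a 9-character code carries a consistent check character.
sub is_valid {
    my $code = shift;
    return 0 unless $code =~ /\A[0-9A-Z]{9}\z/;
    return _luhn_sum( $code, 1 ) % $N == 0 ? 1 : 0;
}

my $code = '0000ABCD' . check_char('0000ABCD');
print "$code is valid\n" if is_valid($code);
```

With a scheme like this, a mistyped character can be rejected without consulting the list of issued codes, at the cost of one character of payload.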

    Okay, back to reality. Another way might be to open /dev/random and read 9 digits at a time. Possibly you could create the file without perl as YAML (to quote a recent thread) and deserialize? That would be neat (read "masochistic").
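Reading a random device directly from Perl could look roughly like the sketch below. It swaps in /dev/urandom, on the assumption that a non-blocking source is wanted, and maps bytes onto a digits-plus-upper-case alphabet; `random_codes` is a hypothetical helper, and the hash is there to drop duplicates.

```perl
use strict;
use warnings;

my @ALPHABET = ( 0 .. 9, 'A' .. 'Z' );    # assumed alphabet

# Hypothetical helper: return $count distinct 9-character codes built from
# bytes read off $device (assumed to be a non-blocking random source).
sub random_codes {
    my ( $count, $device ) = @_;
    $device = '/dev/urandom' unless defined $device;
    open my $in, '<:raw', $device or die "Cannot open $device: $!";
    my %seen;
    my $code = '';
    while ( keys(%seen) < $count ) {
        read( $in, my $buf, 4096 ) or die "Read from $device failed";
        for my $byte ( unpack 'C*', $buf ) {
            next if $byte >= 252;    # 252 = 7 * 36; skip to avoid modulo bias
            $code .= $ALPHABET[ $byte % 36 ];
            next if length($code) < 9;
            $seen{$code} = 1;        # duplicates collapse into one key
            $code = '';
            last if keys(%seen) >= $count;
        }
    }
    close $in;
    return keys %seen;
}

print "$_\n" for random_codes(5);
```

The codes come back in hash order, which is fine for random codes; reading in 4096-byte chunks keeps the number of system calls low.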

    Well I tried it with /dev/random on the command line and guess what? My laptop runs out of random numbers pretty quickly! (The listing continues when you move the mouse around..) So I tried /dev/audio though maybe you will prefer to just read a track off an audio CD.

    The xxd command prints 5 hex two-digit numbers, and cut just takes the first 9 digits as you requested. I didn't get many duplicates. This is what I used..

    [mattr@taygeta mattr]$ time perl -e '
    $max=100000; $c=0;
    open(OUT,">codes");
    open (IN,"cat /dev/audio | xxd -u -ps -c 5 |cut -b 1-9 |");
    while (<IN>) {
        $l++;
        if(!exists($h{$_})) { $h{$_}=1; $c++; chomp; print "$c\n" unless $c%100; }
        if ($c>$max) {
            print "$l lines $c uniques found\n";
            close(IN);
            foreach (keys %h) {print OUT "$_";}
            close(OUT);
            exit;
        }
    }'
    ...
    99800
    99900
    100000
    107849 lines 100001 uniques found

    real 0m18.977s
    user 0m3.720s
    sys 0m0.190s
    So at this rate it will take under 5 minutes to get 1.5 million codes. With this or other solutions you can make some character substitutions if you want to make it look nicer.
    File "codes" contains stuff like (actual data):

    C5FF33FF3
    FF2F0082F
    FFE2FE5AF
    6800EF004
    99FF35006
    20003101A
    F3FF7DFFC
    FF14FF46F
    Now that looks very repetitive but my only explanation is that you only need 5 hex digits to count to a million and we are getting 9, so there is plenty of room.

    I'm surprised it has so few dupes actually. I'd be interested if anyone could tell how random this is with one of those hash visualization packages used to test the crypto strength of a hash algorithm, if there is one around.
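For a rough baseline on the dupe rate, the birthday approximation gives the number of duplicates to expect if the 9 hex digits were uniformly random. This is a sketch under that assumption; `expected_dupes` is a hypothetical name.

```perl
use strict;
use warnings;

# Approximate expected number of duplicate draws among $n uniform samples
# from $space possible values (birthday approximation, for n << space).
sub expected_dupes {
    my ( $n, $space ) = @_;
    return $n * ( $n - 1 ) / ( 2 * $space );
}

my $space = 16**9;    # number of distinct 9-digit hex codes
printf "%.3f expected duplicates in %d draws\n",
    expected_dupes( 107_849, $space ), 107_849;
```

Under the uniform assumption, well under one duplicate is expected in 107849 draws, whereas the run above threw away 7848 lines as dupes. That gap suggests the audio data is far from uniformly random.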

    Note: I found a quiet office gave 7% dupes but banging on the laptop case brought it down to 4%. So maybe you should sing to your perl code, so that it, you know, runs better.

    I have no sig so why does this div show up..
Re: Generating alphanumeric codes
by tachyon (Chancellor) on Sep 29, 2003 at 13:03 UTC

    What for? Why? Usage? Rationale? Random, sequential, unique, un-guessable?????? This takes only a couple of seconds....

    # usage perl $0 > out.txt
    $a = 'blah';
    printf "%09s\n", $a++ for 1..1_500_000;

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print