in reply to Shorter ID Codes

Another approach might be to minimize the number of syllables rather than the number of characters. For instance, "VX" is two syllables, while "vex" is only one. So if you're looking for easy ways to produce readable IDs, you could look at something like Digest::BubbleBabble (which isn't meant to be reversible, but might be a starting point).

Re^2: Shorter ID Codes
by Eimi Metamorphoumai (Deacon) on Oct 28, 2004 at 20:01 UTC
    As an example, I've written a bit of code here. The result is almost always longer in actual characters, but considerably shorter in syllables. I've restricted the consonants to try to stay with pronounceable monosyllables. Basically, each word is a combination of a consonant cluster from @consonants1, a vowel from @vowels, and a final cluster from @consonants2. You can add anything you want to those, as long as they stay aurally distinct. Or if you find some of them hard to pronounce or keep distinct, you could remove any that you have trouble saying. As given, it'll convert "4345317546" to "stut-prip-bluv" and back.
    #!/usr/bin/perl -l
    use strict;
    use warnings;

    my @consonants1 = qw/
        b bl br ch d dr f fr g gr h j k kl kr l m n p pl pr r
        s sk skl skr sl sn sp spl spr st str t tr v vr z
    /;
    my @vowels = qw/ a e i o u /;
    my @consonants2 = qw/
        b bs d dge ds f g gs ck ll lf m n nd nt p r s st t v x z
    /;

    my $nwords = 0;
    my @num2word;
    my %word2num;
    for my $c1 (@consonants1){
        for my $v (@vowels){
            for my $c2 (@consonants2){
                $num2word[$nwords] = "$c1$v$c2";
                $word2num{"$c1$v$c2"} = $nwords;
                $nwords++;
            }
        }
    }

    while(<>){
        chomp;
        if (/^\d+$/){
            my @words;
            #encode
            while ($_ != 0){
                push @words, $num2word[$_ % $nwords];
                $_ = int ($_ / $nwords);
            }
            print join "-", @words;
        } else {
            #decode
            my $num = 0;
            for my $word (reverse split /-/){
                $num *= $nwords;
                $num += $word2num{$word};
            }
            print $num;
        }
    }
Re^2: Shorter ID Codes
by Eimi Metamorphoumai (Deacon) on Oct 29, 2004 at 14:53 UTC
    And just one more thought, now that I've had a bit longer to think about it. Instead of generating "words" like that, you might just get a dictionary of, say, the 10000 most frequently used English words and map each one to a four-digit number. That way you'd be dealing with real words instead of made-up words, which might be easier to understand (although you'd probably want to remove any homophones from the list, as well as anything that sounds substantially similar). Then break your number into four-digit substrings (or whatever grouping is natural; a phone number could be broken into 3-3-4, for instance) and translate each one. Again, this makes the result longer, but easier to remember and communicate.
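    A rough sketch of that dictionary idea, in case it helps. Everything named here (make_codec, @toy_words, the ten placeholder words) is made up for illustration; a real version would load a vetted, homophone-free list of 10000 words and use a width of 4. It also assumes the IDs are non-negative integers, so the zeros used to pad the first group can be dropped again when decoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build encode/decode closures for a word list (hypothetical helper).
#   $words - array ref holding at least 10**$width distinct words
#   $width - digits per group (4 for a 10000-word list)
sub make_codec {
    my ($words, $width) = @_;
    my $needed = 10 ** $width;
    die "need at least $needed words\n" if @$words < $needed;

    # Reverse lookup: word -> group value.
    my %index_of;
    @index_of{ @{$words}[0 .. $needed - 1] } = (0 .. $needed - 1);
    die "word list contains duplicates\n" if keys(%index_of) < $needed;

    my $encode = sub {
        my ($digits) = @_;
        die "not a digit string: $digits\n" unless $digits =~ /^\d+$/;
        # Left-pad with zeros so the length is a multiple of $width.
        my $pad = ($width - length($digits) % $width) % $width;
        $digits = ('0' x $pad) . $digits;
        # Each fixed-width group of digits indexes one word.
        return join '-', map { $words->[$_] } $digits =~ /(\d{$width})/g;
    };

    my $decode = sub {
        my ($phrase) = @_;
        my $digits = '';
        for my $word (split /-/, $phrase) {
            die "unknown word: $word\n" unless exists $index_of{$word};
            # Zero-pad so every word expands back to exactly $width digits.
            $digits .= sprintf "%0${width}d", $index_of{$word};
        }
        $digits =~ s/^0+(?=\d)//;    # drop the padding zeros
        return $digits;
    };

    return ($encode, $decode);
}

# Toy list for illustration only: ten placeholder words, one digit per word.
my @toy_words = qw(apple bread chair dog eagle fish grape house ink jar);
my ($encode, $decode) = make_codec(\@toy_words, 1);

print $encode->("305"), "\n";               # dog-apple-fish
print $decode->("dog-apple-fish"), "\n";    # 305
```

    Unlike the generated-syllable version, this one treats the ID as a string of digits and never does arithmetic on it, so the length of the ID isn't limited by integer precision.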