Shorter ID Codes

Micz has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.

Re: Shorter ID Codes
by kvale (Monsignor) on Oct 28, 2004 at 18:05 UTC

So the procedure is to convert base 10 number to base 36, and then assign number and letters. Let's take 0-9 to be just 0-9 and 10-35 to be A-Z. Then the encoding step would be

use Math::BaseCalc;

$calc36 = new Math::BaseCalc(digits=>[0..9,'A'..'Z'];
$calc10  = new Math::BaseCalc(digits=>[0..9]);

$in_base_36 = $calc36->to_base($calc10->from_base('4345317546') );

$in_base_10 = $calc10->to_base( $calc36->from_base('DA5BG') );
[download]

-Mark

[reply]
[d/l]

Re^2: Shorter ID Codes

by diotalevi (Canon) on Oct 28, 2004 at 18:17 UTC

The sparcity could come from asserting that no id numbers lower than 4345317546 will be produced anymore so you can substract that base number and end up with really short things like '1' -> 'ZZZ....'.

[reply]

Re: Shorter ID Codes
by PodMaster (Abbot) on Oct 28, 2004 at 17:54 UTC

perldoc -f pack

warn pack 'H*', 4345317546 ;
die unpack 'H*', CE1uF;
[download]

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.

[reply]
[d/l]

Re^2: Shorter ID Codes

by DrHyde (Prior) on Oct 29, 2004 at 08:17 UTC

Does anyone know of a good tutorial?

[reply]

Re^3: Shorter ID Codes

by PodMaster (Abbot) on Oct 29, 2004 at 08:27 UTC

Confession of a Perl Hacker (guy also didn't know how to use pack/unpack)
Pack/Unpack Tutorial (aka How the System Stores Data) (tutorial in Tutorials )

[reply]

Re: Shorter ID Codes
by tilly (Archbishop) on Oct 28, 2004 at 17:57 UTC

convert numbers to a different (arbitrary) base

Note that you won't save that much. If you use all of the digits, lower case letters and upper case letters, then 4345317546 becomes 4K4vAS. If you just use digits and upper case letters it becomes 1ZV38H6. If you avoid vowels (to avoid various 4-letter words) then you save even less space.

[reply]

Re: Shorter ID Codes
by grinder (Bishop) on Oct 28, 2004 at 18:00 UTC

Is there a pre-written function to do this in Perl?

Yes there is. Someone asked a similar question a week ago and I suggested that they use Math::BaseCalc.

You just have to choose the characters you want to use, for instance 0..9 and A..Z give you 36 digit values, and so you could convert your base-10 ids into base-36. You could go to base-64 but then you'd have to say "uppercase g, lowercase r" which would be a net loss.

The process is reversable, just convert from base 36 back to base 10.

- another intruder with the mooring of the heat of the Perl

[reply]

Re^2: Shorter ID Codes

by Fletch (Bishop) on Oct 28, 2004 at 19:02 UTC

Of course if you use A..Z then you may introduce confusion between 0 => O and I => 1.

Update: Good point below, although you could mitigate that by reading it back using a phonetic alphabet (papa delta bravo tango), but again that'd take longer.

[reply]

Re^3: Shorter ID Codes

by Your Mother (Archbishop) on Oct 28, 2004 at 19:11 UTC

I'm going to echo this and add that purely numbered IDs are much easier to hear. There is tremendous aural confusion around B & D & P & T etc, even among speakers with the same accent. You might find that while 4242 4242 4242 4242 takes longer to read it's a false economy to switch it out with something like PDBT.

[reply]

Re: Shorter ID Codes
by ikegami (Patriarch) on Oct 28, 2004 at 19:08 UTC

btw, here's the savings you'd get:

1 base10 digit = 1 base64 digit = 0% saving
2 base10 digits = 2 base64 digits = 0% saving
3 base10 digits = 2 base64 digits = 25% saving
4 base10 digits = 3 base64 digits = 25% saving
5 base10 digits = 3 base64 digits = 40% saving
6 base10 digits = 4 base64 digits = 33% saving
7 base10 digits = 4 base64 digits = 43% saving
8 base10 digits = 5 base64 digits = 38% saving
9 base10 digits = 5 base64 digits = 44% saving
10 base10 digits = 6 base64 digits = 40% saving
11 base10 digits = 7 base64 digits = 36% saving
12 base10 digits = 7 base64 digits = 42% saving

And since you mentioned MD5... Hashing algorithms (e.g. MD5) are lossful. I suspect you need to be able to retrieve the unencoded digits, so hashing algorithms won't help you here.

[reply]

Re: Shorter ID Codes
by BrowserUk (Patriarch) on Oct 28, 2004 at 18:07 UTC

Converting to base 64 will give you short, readable, reversible values.

use Math::BaseCnv;

print cnv( 4345317546, 10, 64 );
4304Yg

print cnv( '4304Yg', 64, 10 );
4345317546
[download]

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

[reply]
[d/l]

Re^2: Shorter ID Codes

by davido (Cardinal) on Oct 28, 2004 at 19:39 UTC

4304Yg isn't vocally readable. Here's how the conversation would go:

4 3 oh... no, zero. Yes, 4 3 zero 4 why gee. ...oh, that's upper case Y, and lower case g.

Dave

[reply]

Re: Shorter ID Codes
by Eimi Metamorphoumai (Deacon) on Oct 28, 2004 at 18:19 UTC

Digest::BubbleBabble

[reply]

Re^2: Shorter ID Codes

by Eimi Metamorphoumai (Deacon) on Oct 28, 2004 at 20:01 UTC

$consonants1

$vowels

$consonants2

#!/usr/bin/perl -l
use strict;
use warnings;
 
my @consonants1 = qw/ b bl br ch d dr f fr g gr h j k kl kr l m n
                      p pl pr r s sk skl skr sl sn sp spl spr st
                      str t tr v vr z /;
my @vowels = qw/ a e i o u /;
my @consonants2 = qw/ b bs d dge ds f g gs ck ll lf m n nd nt p r
                      s st t v x z /;
 
my $nwords = 0;
my @num2word;
my %word2num;
for my $c1 (@consonants1){
    for my $v (@vowels){
        for my $c2 (@consonants2){
            $num2word[$nwords] = "$c1$v$c2";
            $word2num{"$c1$v$c2"} = $nwords;
            $nwords++;
        }
    }
}
 
while(<>){
    chomp;
    if (/^\d+$/){
        my @words;
        #encode
        while ($_ != 0){
            push @words, $num2word[$_ % $nwords];
            $_ = int ($_ / $nwords);
        }
        print join "-", @words;
    } else {
        #decode
        my $num = 0;
        for my $word (reverse split /-/){
            $num *= $nwords;
            $num += $word2num{$word};
        }
        print $num;
    }
}
[download]

[reply]
[d/l]
[select]

Re^2: Shorter ID Codes

by Eimi Metamorphoumai (Deacon) on Oct 29, 2004 at 14:53 UTC

And just one more thought, now that I've had a bit longer to think about it. Instead of generating "words" like that, you might just get a dictionary of, say, the 10000 most frequently used English words and map each one to a 4 digit number. That way you'd be dealing with real words instead of made up words, which might be easier to understand (although you'd probably want to remove any homophones from the list, as well as anything that sounds substantially similar). Then break your number into four-digit substrings (or however is natural, if it's a phone number break it into 3-3-4, for instance) and translate each one. Again, makes the result longer, but easier to remember and communicate.

[reply]