PetaMem has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I'd like to encode an arbitrary string (may be a perl source in UTF-8 encoding) to a well defined set of characters.

So - something similar to uuencode (pack "u"), but with a defined character set (e.g. only [A-Za-z0-9]) for the encoding string.

Of course, back and forth encoding/decoding should be possible to prevent creation of a write-only device. ;-)

Basically, I have only a reduced character set available in my "storage": about 6 bits per character (64 characters). Currently, my best stab at it would be to uuencode the arbitrary string and then do some tr/// on the uuencoded string. Any better ideas?

Update:

I've actually implemented this now. Encoding:

    my $uuenc = pack "u", $string;
    $uuenc =~ tr/:;"$%&\/()'*#[]<>@`=,.+-/abcdefghijklmnopqrstuvw/;

and decoding:

    my $uudec = $uuenc;
    $uudec =~ tr/abcdefghijklmnopqrstuvw/:;"$%&\/()'*#[]<>@`=,.+-/;
    $uudec = unpack "u", $uudec;

Works for me. My primary objective was to *avoid* specific characters that are already used for other purposes: ( ) < > | , etc.
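
The two snippets above can be wrapped into a pair of subs and checked for reversibility. This is only a sketch: the names uu_encode and uu_decode are made up for illustration, and the input is assumed to be a string of octets (for example, UTF-8 encoded source read without decoding). It relies on the fact that pack "u" emits only characters in the range "!" through "`" plus newlines, never lowercase letters, so mapping punctuation onto a-w cannot collide with existing output.

```perl
use strict;
use warnings;

# Hypothetical helper names; the tr/// lists are the ones from the post.
sub uu_encode {
    my ($octets) = @_;
    my $uuenc = pack "u", $octets;
    # Swap 23 unwanted punctuation characters for lowercase letters.
    $uuenc =~ tr/:;"$%&\/()'*#[]<>@`=,.+-/abcdefghijklmnopqrstuvw/;
    return $uuenc;
}

sub uu_decode {
    my ($uuenc) = @_;
    # Undo the swap, then let unpack "u" reverse the uuencoding.
    $uuenc =~ tr/abcdefghijklmnopqrstuvw/:;"$%&\/()'*#[]<>@`=,.+-/;
    my $octets = unpack "u", $uuenc;
    return $octets;
}

# Round trip over every possible byte value.
my $input   = join '', map { chr } 0 .. 255;
my $encoded = uu_encode($input);
print uu_decode($encoded) eq $input ? "round trip ok\n" : "round trip FAILED\n";
```

Note that the result is not restricted to [A-Za-z0-9]: the characters ! ? \ ^ _ and the newlines that pack "u" inserts are left untouched, which is fine as long as the storage accepts them.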

Bye
 PetaMem
    All Perl:   MT, NLP, NLU

Re: Map octets to set of characters
by nobull (Friar) on Jan 30, 2005 at 18:20 UTC
    Your approach using pack('u') and tr/// is likely to be difficult to beat for speed or brevity.

    However, if you want to conform, why not just do what everyone else does and use MIME::Base64?
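
A minimal sketch of the MIME::Base64 route, assuming the input is already a string of octets (the sample bytes are invented for illustration). Passing an empty string as the second argument to encode_base64 suppresses the line breaks it would otherwise insert.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Example input only: some UTF-8 octets plus a few awkward bytes.
my $octets  = "caf\xc3\xa9 \x00\xff\n";
my $encoded = encode_base64($octets, "");   # "" means: no line breaks
my $decoded = decode_base64($encoded);

print "$encoded\n";
print $decoded eq $octets ? "round trip ok\n" : "round trip FAILED\n";
```

The Base64 alphabet is A-Z, a-z, 0-9 plus "+", "/" and the padding character "=". If the storage cannot hold those three, they would still need a tr/// onto characters it does accept, much as in the original post.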

Re: Map octets to set of characters
by ambrus (Abbot) on Jan 30, 2005 at 19:47 UTC

    Encode to UTF-7. The UTF-7 encoding uses only nice printable ASCII chars. Of course, it is not compact: an isolated non-ASCII character expands to several ASCII characters.

    Update: here's an example:

    perl -we 'use Encode; print encode("utf7", "Kr\x{151}zus is cs\x{f3}r\x{f3} volt hozz\x{e1} k\x{e9}pest.\n");'
    outputs:
    Kr+AVE-zus is cs+APM-r+APM- volt hozz+AOE- k+AOk-pest.

    Update: this is for Unicode strings, of course; it's not ideal for binary data.
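
Since the original question requires decoding as well, here is a small round-trip sketch for the UTF-7 idea. It assumes the input is a Perl character string (not raw octets) and uses the canonical encoding name "UTF-7" from the Encode module; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Same sample sentence as in the one-liner above, as a character string.
my $text  = "Kr\x{151}zus is cs\x{f3}r\x{f3} volt hozz\x{e1} k\x{e9}pest.\n";
my $ascii = encode("UTF-7", $text);    # ASCII-only representation
my $back  = decode("UTF-7", $ascii);   # back to the original characters

print $ascii;
print $back eq $text ? "round trip ok\n" : "round trip FAILED\n";
```

UTF-7 passes many ASCII characters through unchanged, punctuation included, so on its own it does not keep specific characters out of the output.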

    Update: For binary data, a very good way is to use MIME::Base64, which is in the core in newer Perl versions.