The cool part of this is, that you can mix 8, 6 and 5 bits characters and stir them into one character string, and fetch them out of there as well.
Moreover, you may use fewer bits for a character as well. If you have a string that consist only of a's and b's, you can decode each character using only one bit.
I think this is also a cool way to make up passwords as well. Take a lowercase word of some length, encode it, and there you have string of 0..9A..Za..z. Add a bit here and there for more difficulty. Of course, don't use this for really important passwords...
Well, I hope you enjoy it.
Jeroen
package CGIpack; use strict; =head1 NAME CGIpack - Transforms parameters into a packed list of URL-compatible c +haracters and vice versa. =head1 SYNOPSIS use CGIpack; # Transform a list of numbers and back. @data=(15,13,2**23,0,2**31); @bitsizes = (4,14,24,1,55 ); $encoded=encode({ bitsizes=>[@bitsizes], data=>[@data]}); $decoded=join ', ', decode( {str=>$encoded, bitsizes=>[@bitsizes]} ); #Results in: ## only converted 32 bits instead of 55 #Encoded: V30000W000004000 #Decoded array: 15, 13, 8388608, 0, 2147483648 #Second example, uses the ..._tr interfaces. This allows strings to b +e parsed @data=('b','daDer9', 12,'C^2','low','UP'); @bitsizes=(1,6*6,4,3*8,5*3,5*2); @types=('l','a','','A','l','u'); $encoded=encode_tr({ bitsizes=>[@bitsizes], data=>[@data], types=>[@t +ypes]}); $decoded=join ', ', decode_tr( {str=>$encoded, bitsizes=>[@bitsizes], + types=>[@types] }); #Encoded: F9RGhJuXyIcbpIV #Third example, uses string lengths rather than the number of bits. N +otice # the increase in string length for the same variables. Also note the + 4 in # $lengths[2]. This decodes for bits in the number '12', the other me +mbers # decode for length. @lengths=(1,6,4,3,3,2); $encoded=encode_tr({ data=>[@data], types=>[@types], lengths=>[@leng +ths]}); $decoded=join ', ', decode_tr( {str=>$encoded, types=>[@types], lengt +hs=>[@lengths] }); #Encoded: XJo6qw4U8laPviq7 =head1 DESCRIPTION CGIpack is a module I wrote in response to a question. Question went +like: "I want my users to save a URL containing a CGI with parameters. To a +void problems with email-readers, the list should be as short as possible, +and to avoid users messing around with the values I would like to see them + encoded in a certain way." I came up with this solution. The encoder takes a list of data, toget +her with a list with the desired number of bits to be saved. The bits are + divided into chunks of 6 bits, that are encoded with the alphanumeric characters *and* % and - (64 characters in total). The resulting strin +g is short, contains as little bits as possible and/or desired by the us +er. =over 4 =item encode( $hashref ) Takes a hashref, which should contain a member 'data', referring to a +list, and a member 'bitsizes', referring to a list containing the number of desired bits for every value in the list. =item decode( $hashref ) Takes a hashref, which should contain a member 'str', containing a str +ing of characters produced by encode (preferably, you may wanna roll your +own ;-) and a member 'bitsizes', referring to a list containing the number + of desired bits for every value encoded in the string. A warning is given + when you try to decode more than 32 bits. =item encode_tr( $hashref )/ decode_tr( $hashref ) Idem encode()/ decode(). Hash should contain member 'types' as well. + See below for a description of accepted types. This allows to include tex +t in the input as well, saving the trouble of encoding the characters into numbers. If the 'bitsizes' member is left out, the 'lengths' member i +s used to figure out how much bits to use. Note that the 'bitsizes' mem +ber allows better finetuning of bits. =back =head2 Types =over 4 =item type 'A' A: 8 bits. This excepts all characters. =item type 'a' a: 6 bits. This codes for alphanumeric characters. =item type 'u' u: 5 bits. Use this for UPPERCASE, A..Z. =item type 'l' l: 5 bits. This is for lowercase, a..z. =item type '' '': number. The number of bits depends on the user, just like the encode() interface. The number of bits should be included in the 'bit +sizes' or 'length' member. =back =head1 CAVEAT If you want to encode characters, you will have to convert them manual +ly using unpack/pack 'c'. I just may add a wrapper for that in the futur +e, though. It does work on 32 bit integers, but not yet on 64 bit. You need a 64 + bit CPU for that, perl 5.6 AND 64 bit support should be compiled in. I ha +ve a PPC, 64 bits, however, there is no 5.6 for MacOS. Patience.
=head1 AUTHOR Jeroen Elassaiss-Schaap =head1 LICENSE Perl/ artisitic license =head1 STATUS Alpha =cut use Exporter; use vars qw( @EXPORT @ISA @VERSION %bits_char); @VERSION = 0.03; @ISA = qw( Exporter ); @EXPORT = qw( &encode &decode &encode_tr &decode_tr); %bits_char = ( a => 6, A => 8, '' => undef, u =>5, l => 5); sub encode{ my $hash = shift; my @data = @{$hash->{'data'}}; my @bitsizes = @{$hash->{'bitsizes'}}; my ($str, $bitstr); for my $bits (@bitsizes) { $bitstr .= unpack("b$bits", pack('VV', shift( @data ) )); } $bitstr .= '0' x ( length($bitstr) % 6 ); for my $item (0..( length($bitstr) / 6 - 1 )){ my $val=pack('b6', substr($bitstr, $item*6, 6) ); for ($val) { tr [\100\077\000-\010\011-\043\044-\075] [\045\055\060-\071\101-\132\141-\172]; $str.=$_; } } $str; } sub decode{ my $hash = shift; my $str = $hash->{'str'}; my @bitsizes= @{$hash->{'bitsizes'}}; my $bitstr = _bits( $str ); my $val; my @data; my $pointer = 0; for my $bits (@bitsizes) { my $val; for ($bits) { $val = unpack('c',pack("b$_",substr( $bitstr, $pointer, $_ + ))), last if $bits < 9; $val = unpack('v',pack("b$_",substr( $bitstr, $pointer, $_ + ))), last if $bits < 17; $val = unpack('V',pack("b32",substr( $bitstr, $pointer, $_ + ))), last if $bits < 33; if (! eval('{$val = unpack("Q",pack("b$_",substr( '. '$bitstr, $pointer, $_ )));1;}') ) { warn "only converted 32 bits instead of $_\n"; $val = unpack('V',pack("b$_", substr( $bitstr, $pointer, $_ ))); } } push( @data, $val); $pointer += $bits; } @data; } sub _bits{ my $str = shift; my $bitstr; for (split //, $str){ tr [\045\055\060-\071\101-\132\141-\172] [\100\077\000-\010\011-\043\044-\075]; $bitstr .= unpack("b6",$_); } $bitstr; } sub encode_tr{ my @array = _hash2array( shift); my @tr = map{ my $type = $_->[2]; last unless _check( $type ); _tr( $_, $type ); } @array; @tr = _trlatedjoin( @tr ); encode( { data => $tr[0], bitsizes => $tr[1] } ); } sub _tr{ my $array = shift; my $type = shift; my $bc = $bits_char{ $type } or return ( [$array->[0]], [$array->[ +1]] ); my @bitsizes = _bitsizes( $array, $type ); my @input = split //, $array->[0]; my @data = map{ my $input = shift( @input ); $input = _trchar( $type, $input); unpack( 'c', $input ); } @bitsizes; ( \@data, \@bitsizes ); } sub _trchar{ my $type = shift; my $char = shift; for ( $char ) { $char =~ tr [\060-\071\101-\132\141-\172] [\000-\010\011-\043\044-\075], last if ( $type eq 'a'); $char =~ tr [\141-\172] [\000-\031], last if ( $type eq 'l'); $char =~ tr [\101-\132] [\000-\031], last if ( $type eq 'u'); } $char; } sub _trlatedjoin{ my @tr0; my @tr1; my $ref; while( $ref = shift) { push( @tr0, @$ref ); push( @tr1, @{ shift(@_) } ); } ( \@tr0, \@tr1 ); } sub decode_tr{ my $hash = shift; my @array = _hash2array( $hash ); my @bitsizes = map{ my $type = $_->[2]; last unless _check( $type ); _bitsizes( $_, $type ); } @array; my @items = decode( { str => $hash->{str}, bitsizes => \@bitsizes +} ); map{ my $type = $_->[2]; last unless _check( $type ); _dejoin( $_, $type, \@items ); } @array; } sub _dejoin{ my $array = shift; my $type = shift; my $itemref = shift; my $bc = $type? $bits_char{ $type } : $array->[1]; my $number = int( $array->[1] / $bc ); $number += (( $array->[1] / $bc ) - $number ) > 0; my @untr = splice( @$itemref, 0, $number ); join '', map{ _detr( $type, pack( 'c', $_ )); } @untr; } sub _detr{ my $type = shift; my $char = shift;; for ( $char ) { $char =~ tr [\000-\010\011-\043\044-\075] [\060-\071\101-\132\141-\172], last if( $type eq 'a' ); $char =~ tr [\000-\031] [\141-\172], last if ( $type eq 'l'); $char =~ tr [\000-\031] [\101-\132], last if ( $type eq 'u'); last if( $type eq 'A' ); $char = unpack( 'c', $char); } $char; } sub _hash2array{ my $hash = shift; _calc_bitsizes( $hash ) unless defined($hash->{bitsizes}); map { [ shift( @{ $hash->{data} }), $_, shift( @{ $hash->{types} } +) ]; } @{ $hash->{bitsizes} } ; } sub _calc_bitsizes{ my $hash = shift; @{ $hash->{bitsizes} } = map{ my $bc = $bits_char{ $_ }; my $length = shift( @{ $hash->{lengths} }); defined( $bc )? $length * $bc : $length; }@{ $hash->{types} }; } sub _bitsizes{ my $array = shift; my $type = shift; my $bc = $bits_char{ $type } or return $array->[1]; my @bitsizes; my $n = int( $array->[1] / $bc ); if( $n ){ for ( 1..$n ) { push( @bitsizes, $bc); } } my $rem; push( @bitsizes, $rem ) if ( $rem = $array->[1] % $bc) ; @bitsizes; } sub _check{ my $type = shift; return 1 unless( defined $type ); my $found; for ( 'A','a','','l','u') { $found += $type eq $_; } $found; } 1;
In reply to Encoding strings and numbers into characters using as little bits as possible. by jeroenes
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |