| Category: | Text Processing |
| Author/Contact Info | monoxide0184@gmail.com |
| Description: | These two functions will serialise (bEncode) or unserialise (bDecode) data. It is based on 'Bencoding' which was designed as part of the bittorrent protocol but seems to be a well designed way of doing things. This is a perl implementation of this encoding. Below is a list of how this encoding works, and covers everthing from scalars to hashs, very nicely. I know this is _VERY_ poorly commented, but if someone has some spare time and wants to comment these 100 odd lines, it would be appreciated :). * Strings are length-prefixed base ten followed by a colon and the string. For example 4:spam corresponds to 'spam'. * Integers are represented by an 'i' followed by the number in base 10 followed by an 'e'. For example i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero, such as i03e , are invalid, other than i0e , which of course corresponds to 0. * Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For example l4:spam4:eggse corresponds to ['spam', 'eggs']. * Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow' => 'moo', 'spam' => 'eggs'} and d4:spaml1:a1:bee corresponds to {'spam' => ['a', 'b']} . Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics). |
#!/usr/bin/perl -w
#
#####
# Provides fuctions for bencoding/decoding
###
# Decodes a bencoded string to a hash/array reference
sub bDecode {
my $dictref = $_[0];
my $retref = undef;
if (substr(${$dictref},0,1) eq "d") {
# print Dumper($dictref);
$retref = {};
${$dictref} = substr(${$dictref},1);
PROCDICTHASH: while (substr(${$dictref},0,1) ne 'e') {
my ($key, $value);
${$dictref} =~ /^(-?[\d]+)/;
$key = substr(${$dictref}, length($1)+1, $1);
${$dictref} = substr(${$dictref}, length($1)+$1+1);
if (${$dictref} =~ /^[ldi\d]/) {
$value = bDecode(\${$dictref});
if (!defined($value)) {
return undef;
}
} else {
return undef;
}
$retref->{$key} = $value;
}
${$dictref} = substr(${$dictref},1);
} elsif (substr(${$dictref},0,1) eq "l") {
# print Dumper($dictref);
$retref = [];
${$dictref} = substr(${$dictref},1);
PROCDICTARR: while (substr(${$dictref},0,1) ne 'e') {
if (${$dictref} =~ /^[ldi\d]/) {
my $value = bDecode(\${$dictref});
if (!defined($value)) {
return undef;
}
push (@$retref, $value);
} else {
return undef;
}
}
${$dictref} = substr(${$dictref},1);
} elsif (${$dictref} =~ /^i(-?[\d]+)e/) {
${$dictref} = substr(${$dictref}, length($1)+2);
return $1;
} elsif (${$dictref} =~ /^([\d]+):/) {
$retref = substr(${$dictref}, length($1)+1, $1);
${$dictref} = substr(${$dictref}, length($1)+$1+1);
return $retref;
} else {
return undef;
}
return $retref;
}
#Encodes a hash/array ref to a bencoded string
sub bEncode {
my $dictref = $_[0];
my $retval = '';
if (ref($dictref) eq 'HASH') {
$retval = 'd';
for my $key (sort keys %$dictref) {
$retval .= length($key).':'.$key;
$retval .= bEncode($dictref->{$key});
}
$retval .= 'e';
} elsif (ref($dictref) eq 'ARRAY') {
$retval = 'l';
for (my $i = 0; $i <= $#{$dictref}; $i++) {
$retval .= bEncode($dictref->[$i]);
}
$retval .= 'e';
} elsif ($dictref =~ /^-?[\d]+$/) {
$retval = 'i'.$dictref.'e';
} else {
$retval = length($dictref).':'.$dictref;
}
return $retval;
};
|
|
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: serialise/unserialising data structures
by castaway (Parson) on May 30, 2005 at 11:06 UTC | |
by monoxide (Beadle) on May 30, 2005 at 11:30 UTC | |
|
Re: serialise/unserialising data structures
by zentara (Cardinal) on May 30, 2005 at 21:48 UTC | |
by monoxide (Beadle) on May 31, 2005 at 06:49 UTC | |
by zentara (Cardinal) on May 31, 2005 at 09:56 UTC |