I think my approach follows your description quite closely, assuming that you already have the data as an array of integers:

#!perl -w
use strict;
use Carp qw(croak);
use Data::Dumper;

sub get_value {
    my ($values) = @_;
    my $result = shift @$values;
    if (($result & 0b10000000) == 0) {
        # maybe $result < 128 is faster?
        return $result
    } elsif (($result & 0b11000000) == 0b10000000) {
        # maybe $result < 0b11000000 is faster?
        return (($result & 0b00111111) << 8) + shift @$values;
    } elsif (($result & 0b11000000) == 0b11000000) {
        # Assuming that Perl evaluates even "commutative" expressions
        # from left to right
        $result = (($result & 0b00111111) << 16)
                + ((shift @$values) << 8)
                + ((shift @$values));
        return $result;
    } else {
        # This should not happen anyway:
        croak sprintf "Invalid value found in input: %08b", $result;
    };
};

my @values = (0b01010101, # 85
              0b10101010, 0b10101010, # 10922
              0b11001100, 0b11001100, 0b11001100, # 838860
              0b00000000, # 0 as another simplistic test case
              );
while (@values) {
    print "$values[0] => ", get_value( \@values ),"\n";
};

Since you have far more experience in benchmarking than I do, I have left my idea of replacing the bit-mask equality tests with plain numeric comparisons unimplemented.
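For reference, a minimal sketch of that comparison-based variant might look like this. The name get_value_cmp is hypothetical, the code assumes that every element is an integer in the range 0..255, and it has not been benchmarked against the masking version:

```perl
#!perl -w
use strict;

# Hypothetical variant of get_value: the lead byte is classified by
# numeric comparison instead of by masking and testing for equality.
# Assumes every element of @$values is an integer in 0..255.
sub get_value_cmp {
    my ($values) = @_;
    my $result = shift @$values;
    if ($result < 0b10000000) {
        # 0xxxxxxx: a single byte that is the value itself
        return $result;
    } elsif ($result < 0b11000000) {
        # 10xxxxxx: six payload bits followed by one more byte
        my $low = shift @$values;
        return (($result & 0b00111111) << 8) + $low;
    } else {
        # 11xxxxxx: six payload bits followed by two more bytes.
        # Separate statements make the order of the shifts explicit.
        my $mid = shift @$values;
        my $low = shift @$values;
        return (($result & 0b00111111) << 16) + ($mid << 8) + $low;
    }
}

my @values = (0b01010101,                         # 85
              0b10101010, 0b10101010,             # 10922
              0b11001100, 0b11001100, 0b11001100, # 838860
              0b00000000,                         # 0
             );
while (@values) {
    print "$values[0] => ", get_value_cmp( \@values ), "\n";
}
```

Pulling the continuation bytes into named variables also sidesteps the question of whether the operands of + are evaluated left to right.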

If you have the data not as a stream of integers but as a string, I would use regular expressions to extract the information:

#!perl -w
use strict;
use Carp qw(croak);
use Data::Dumper;

sub get_value {
    my ($values) = @_;
    if ($$values =~ s/^([\x00-\x7F])//) {
        return ord $1
    } elsif ($$values =~ s/^([\x80-\xBF])(.)//s) {   # /s: "." must also match a "\n" byte
        return (((ord($1) & 0b00111111)<< 8)
                + ord($2));
    } elsif ($$values =~ s/^([\xC0-\xFF])(.)(.)//s) {
        return ( ((ord($1) & 0b00111111) << 16)
                +(ord($2) <<  8)
                +(ord($3))
               );
    } else {
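        # croak does not format its arguments, so sprintf is needed;
        # %08b expects a number, hence ord
        croak sprintf "Invalid wide character in input: %08b",
            ord substr $$values, 0, 1;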
    };
};

my @values = (0b01010101, # 85
              0b10101010, 0b10101010, # 10922
              0b11001100, 0b11001100, 0b11001100, # 838860
              0b00000000, # 0 as another simplistic test case
              );
my $values = join "", map { chr } @values;
while (length $values) {
    print get_value( \$values ),"\n";
};

A single invocation of the regex engine with an alternation is most likely faster, but it makes decoding the match a bit uglier. I left-pad the matched bytes with NUL bytes to 32 bits, unpack them as a big-endian number, and then mask off the tag bits:

#!perl -w
use strict;
use Carp qw(croak);
use Data::Dumper;

sub get_value {
    my ($values) = @_;
    # /s so that "." also matches a "\n" byte
    if ($$values =~ s/^([\x00-\x7F]|[\x80-\xBF].|[\xC0-\xFF]..)//s) {
        my $tmp = substr "\0\0\0$1", -4; # maybe sprintf or pack would be faster...
        my $result = unpack 'N', $tmp;
        if (length $1 == 3) {
            $result &= 0b00111111_11111111_11111111;
        } elsif (length $1 == 2) {
            $result &= 0b00111111_11111111;
        };
        return $result;
    } else {
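        # croak does not format its arguments, so sprintf is needed;
        # %08b expects a number, hence ord
        croak sprintf "Invalid wide character in input: %08b",
            ord substr $$values, 0, 1;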
    };
};

my @values = (0b01010101, # 85
              0b10101010, 0b10101010, # 10922
              0b11001100, 0b11001100, 0b11001100, # 838860
              0b00000000, # 0 as another simplistic test case
              );
my $values = join "", map { chr } @values;
while (length $values) {
    print get_value( \$values ),"\n";
};
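To make the padding and masking step easier to follow, here is a small sketch that applies it to groups of bytes that have already been extracted. The helper name decode_group is made up for this illustration, and it assumes it is handed exactly one, two or three bytes:

```perl
#!perl -w
use strict;

# Hypothetical helper: decode one already-extracted group of 1..3 bytes
# by left-padding to four bytes, unpacking, and masking.
sub decode_group {
    my ($bytes) = @_;
    my $padded = substr "\0\0\0$bytes", -4;   # always exactly four bytes
    my $number = unpack 'N', $padded;         # 32-bit big-endian unsigned
    # Clear the two tag bits of the lead byte for 2- and 3-byte groups
    $number &= 0x3FFF   if length($bytes) == 2;
    $number &= 0x3FFFFF if length($bytes) == 3;
    return $number;
}

for my $group ("\x55", "\xAA\xAA", "\xCC\xCC\xCC") {
    # The raw value still contains the tag bits of the lead byte
    my $raw = unpack 'N', substr("\0\0\0$group", -4);
    printf "%-6s raw=%8d decoded=%7d\n",
        unpack('H*', $group), $raw, decode_group($group);
}
```

For the two-byte group, the raw value is 0xAAAA (43690), and clearing the two tag bits leaves 0x2AAA (10922), which matches the test data above.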

In reply to Re: [NOT] How would you decode this? by Corion
in thread [NOT] How would you decode this? by BrowserUk
