in reply to [NOT] How would you decode this?

I think my approach would reflect your description quite closely, assuming that you have the data as an array of scalars already:

    #!perl -w
    use strict;
    use Carp qw(croak);

    sub get_value {
        my ($values) = @_;
        my $result = shift @$values;
        if (($result & 0b10000000) == 0) {
            # maybe $result < 128 is faster?
            return $result;
        } elsif (($result & 0b11000000) == 0b10000000) {
            # maybe $result < 0b11000000 is faster?
            return (($result & 0b00111111) << 8) + shift @$values;
        } elsif (($result & 0b11000000) == 0b11000000) {
            # Assuming that Perl evaluates even "commutative" expressions
            # from left to right
            $result = (($result & 0b00111111) << 16)
                    + ((shift @$values) << 8)
                    + ((shift @$values));
            return $result;
        } else {
            # This should not happen anyway:
            croak sprintf "Invalid value found in input: %08b", $result;
        }
    }

    my @values = (
        0b01010101,                          # 85
        0b10101010, 0b10101010,              # 10922
        0b11001100, 0b11001100, 0b11001100,  # 838860
        0b00000000,                          # 0 as another simplistic test case
    );

    while (@values) {
        print "$values[0] => ", get_value( \@values ), "\n";
    }

Since you have far more experience with benchmarking than I do, I have left my idea of replacing the bit-mask equality tests with plain numeric comparisons unimplemented.
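For reference, here is a minimal sketch of what the comparison-based variant could look like. The name get_value_cmp is made up for this illustration, and the code assumes that every element is in the range 0..255 and that the input is not truncated. I have not benchmarked it, so whether it is actually faster remains an open question.

```perl
#!perl -w
use strict;

# Hypothetical variant of get_value(): classify the lead byte with numeric
# comparisons instead of bit-mask equality tests.
# Assumes all elements are 0..255 and the input is not truncated.
sub get_value_cmp {
    my ($values) = @_;
    my $lead = shift @$values;

    return $lead if $lead < 0x80;          # 0xxxxxxx: single byte

    my $result = $lead & 0x3F;             # drop the two tag bits
    my $extra  = $lead < 0xC0 ? 1 : 2;     # 10xxxxxx: one more byte, 11xxxxxx: two
    for (1 .. $extra) {
        my $next = shift @$values;         # explicit temporary, no reliance on
        $result = ($result << 8) + $next;  # operand evaluation order
    }
    return $result;
}

my @values = (
    0b01010101,
    0b10101010, 0b10101010,
    0b11001100, 0b11001100, 0b11001100,
    0b00000000,
);

while (@values) {
    print get_value_cmp( \@values ), "\n";
}
```

The loop also sidesteps the evaluation-order assumption, because each byte is shifted off in its own statement.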

If you have the data not as a stream of integers but as a string, I would use regular expressions to extract the information:

    #!perl -w
    use strict;
    use Carp qw(croak);

    sub get_value {
        my ($values) = @_;
        # /s lets "." match any byte, including "\n" (0x0A)
        if ($$values =~ s/^([\x00-\x7F])//) {
            return ord $1;
        } elsif ($$values =~ s/^([\x80-\xBF])(.)//s) {
            return ((ord($1) & 0b00111111) << 8) + ord($2);
        } elsif ($$values =~ s/^([\xC0-\xFF])(.)(.)//s) {
            return ((ord($1) & 0b00111111) << 16)
                 + (ord($2) << 8)
                 + ord($3);
        } else {
            # Only reached when the input ends in the middle of a value
            croak sprintf "Invalid wide character in input: %08b",
                ord(substr($$values, 0, 1));
        }
    }

    my @values = (
        0b01010101,                          # 85
        0b10101010, 0b10101010,              # 10922
        0b11001100, 0b11001100, 0b11001100,  # 838860
        0b00000000,                          # 0 as another simplistic test case
    );

    my $values = join "", map { chr } @values;
    while (length $values) {
        print get_value( \$values ), "\n";
    }

A single invocation of the regex engine with alternation is most likely faster, but it makes decoding the match a bit uglier. I left-pad the matched string with NUL bytes to 32 bits, unpack it as a number, and mask off the two tag bits afterwards:

    #!perl -w
    use strict;
    use Carp qw(croak);

    sub get_value {
        my ($values) = @_;
        # /s lets "." match any byte, including "\n" (0x0A)
        if ($$values =~ s/^([\x00-\x7F]|[\x80-\xBF].|[\xC0-\xFF]..)//s) {
            my $tmp = substr "\0\0\0$1", -4;  # maybe sprintf or pack would be faster...
            my $result = unpack 'N', $tmp;
            if (length($1) == 3) {
                $result &= 0b00111111_11111111_11111111;
            } elsif (length($1) == 2) {
                $result &= 0b00111111_11111111;
            }
            return $result;
        } else {
            # Only reached when the input ends in the middle of a value
            croak sprintf "Invalid wide character in input: %08b",
                ord(substr($$values, 0, 1));
        }
    }

    my @values = (
        0b01010101,                          # 85
        0b10101010, 0b10101010,              # 10922
        0b11001100, 0b11001100, 0b11001100,  # 838860
        0b00000000,                          # 0 as another simplistic test case
    );

    my $values = join "", map { chr } @values;
    while (length $values) {
        print get_value( \$values ), "\n";
    }
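To make the pad-and-unpack step easier to follow, here it is in isolation, applied to the three sample byte strings. The helper name decode_padded is invented for this illustration. It computes the mask from the length instead of spelling out the constants, which is just one possible way to write it.

```perl
#!perl -w
use strict;

# Hypothetical helper: decode one already-isolated value of 1 to 3 bytes
# by padding it to four bytes and unpacking it as a big-endian integer.
sub decode_padded {
    my ($bytes) = @_;
    my $len    = length $bytes;
    my $padded = substr "\0\0\0$bytes", -4;   # always exactly four bytes
    my $raw    = unpack 'N', $padded;         # unsigned 32-bit, big-endian
    # Single bytes carry no tag bits that need clearing (top bit is 0);
    # longer values lose the two tag bits of the lead byte.
    my $mask   = $len == 1 ? 0xFF : (1 << (8 * $len - 2)) - 1;
    return $raw & $mask;
}

for my $bytes ("\x55", "\xAA\xAA", "\xCC\xCC\xCC") {
    printf "%-6s => %d\n", unpack('H*', $bytes), decode_padded($bytes);
}
```

For two bytes the mask comes out as 0x3FFF and for three bytes as 0x3FFFFF, the same constants as in the script above.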