fluffy has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a real-world problem encoding and decoding 7- & 8-bit values. The context is a MIDI SysEx file. In MIDI, a set high bit marks a status byte (the end-of-exclusive marker F7, for example), so all the data bytes must be 7 bits. When sending 8-bit data, my synth (a Korg Karma, if you care) encodes it as follows:
+---------------------------------------------------------
| Internal 7byte data <--convert--> MIDI 8 byte data
|
| example: Internal data(bit image)   MIDI data(bit image)
|
|          Aaaaaaaa                   0GFEDCBA
|          Bbbbbbbb                   0aaaaaaa
|          Cccccccc                   0bbbbbbb
|          Dddddddd                   0ccccccc
|          Eeeeeeee                   0ddddddd
|          Ffffffff                   0eeeeeee
|          Gggggggg                   0fffffff
|          Hhhhhhhh                   0ggggggg
|          Iiiiiiii                   0NMLKJIH
|             :                       0hhhhhhh
|             :                          :
|          Vvvvvvvv                   000000WV
|          Wwwwwwww                   0vvvvvvv
|                                     0wwwwwww
|                                     11110111 (EOX=F7H)
|
+---------------------------------------------------------
Now, I wish to decode the 7-bit data to 8-bit data, and back again. The question is, what's the most efficient approach? I'm sure something using unpack & vec should be in order, but I'm unsure which is the most (time-)efficient approach. I'm gonna be doing this a fair bit, so I'm willing to trade readability for speed. Ideas gratefully received, Thanks.
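To make the table concrete, here is a minimal sketch of one full 7-byte group worked through in Perl. It assumes the table reads as "bit 7 of internal byte n lands in bit n of the leading MIDI byte"; the sample values and variable names are made up for illustration and do not come from the Korg documentation.

```perl
use strict;
use warnings;

# Hypothetical sample data: one group of seven 8-bit internal bytes.
my @internal = ( 0xFF, 0x00, 0x80, 0x7F, 0x01, 0x81, 0x40 );

# Collect bit 7 of internal byte n into bit n of the header (0GFEDCBA).
my $header = 0;
$header |= ( ( $internal[$_] >> 7 ) & 1 ) << $_ for 0 .. 6;

# The header goes first, followed by the seven bytes with bit 7 cleared.
my @midi = ( $header, map { $_ & 0x7F } @internal );

print join( ' ', map { sprintf '%02x', $_ } @midi ), "\n";
# prints: 25 7f 00 00 7f 01 01 40
```

Bytes 0, 2 and 5 of the sample have their high bit set, so the header comes out as binary 0100101, i.e. 0x25.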

Re: Converting 7 & 8-bit values
by BrowserUk (Patriarch) on Dec 03, 2006 at 16:45 UTC

    Here's a pure perl way to do it (Check it. I may have misunderstood you).

    There are several changes that would speed this up, but if you want the absolute fastest way, you'd be better off looking at Inline::C for this.

    sub toMidi {
        my $out = '';
        for my $batch ( unpack '(a7)*', $_[ 0 ] ) {
            $batch .= chr(0) x ( 7 - length( $batch ));
            my @bytes = unpack 'C7', $batch;
            my $h = 0;
            $h = ( $h >> 1 ) | ( 0x80 & $bytes[ $_ ] ) for 0 .. 6;
            $bytes[ $_ ] &= 0x7f for 0 .. 6;
            $out .= pack 'C8', $h >> 1, @bytes;
        }
        return $out . chr( 0xf7 );
    }

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Ah yes, you're quite right: Inline C is what I really want. Well remembered, thanks.
Re: Converting 7 & 8-bit values
by GrandFather (Saint) on Dec 03, 2006 at 21:24 UTC

    The following pure Perl code runs at least 2000 times faster than required to just keep up with one 1200 baud midi channel in one direction. Possibly fast enough?

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @data28 = map {int (rand () * 256)} 1..28;
    printf "%02x ", $_ for @data28;
    print "\n";

    my @xlate = thereAndBack (@data28);
    printf "%02x ", $_ for @xlate;
    print "\n\n";

    my @data280  = map {int (rand () * 256)} 1..280;
    my @data2800 = map {int (rand () * 256)} 1..2800;

    cmpthese (-1,
        {
        data28   => sub {thereAndBack (@data28)},
        data280  => sub {thereAndBack (@data280)},
        data2800 => sub {thereAndBack (@data2800)},
        }
    );

    sub thereAndBack {
        return fromKarma (toKarma (@_));
    }

    sub toKarma {
        die "Blocks must be multiples of 7 bytes" if @_ % 7;

        my @raw = @_;
        my @karma;

        while (@raw) {
            my @block = splice @raw, 0, 7;
            my $extra = 0;

            # $byte aliases the elements of @block, so the high bits are
            # stripped in place while they are collected in $extra
            for my $byte (@block) {
                $extra |= $byte & 0x80;
                $byte &= 0x7F;
                $extra >>= 1;
            }

            push @karma, ($extra, @block);
        }

        return @karma;
    }

    sub fromKarma {
        die "Blocks must be multiples of 8 bytes" if @_ % 8;

        my @karma = @_;
        my @raw;

        while (@karma) {
            my @block = splice @karma, 0, 8;
            my $extra = shift @block;

            for my $byte (reverse @block) {
                $extra <<= 1;
                $byte |= $extra & 0x80;
            }

            push @raw, @block;
        }

        return @raw;
    }

    Prints:

    37 8b a0 23 ef 46 68 f8 a1 75 71 de aa eb 0e ee 08 54 c7 77 9f 0b ee d7 d3 f4 34 69
    37 8b a0 23 ef 46 68 f8 a1 75 71 de aa eb 0e ee 08 54 c7 77 9f 0b ee d7 d3 f4 34 69

               Rate data2800  data280   data28
    data2800  124/s       --     -90%     -98%
    data280  1228/s     888%       --     -85%
    data28   8200/s    6496%     568%       --

    Note that there is some per packet overhead so there is some advantage in translating larger blocks.


    DWIM is Perl's answer to Gödel
      ... to just keep up with one 1200 baud midi channel ...

      Um. It's been several years since I played with a midi device, but back then the baud rate was 31250 bps. The number sticks in my head because a 500kHz clock with a divide-by-16 circuit produced exactly the required clock rate.



        You may well be right. I was just going on Corion's post Re: Converting 7 & 8-bit values. That makes the code only about 100 times faster than it needs to be just to keep up and if nothing else were happening. Probably ok, but depends what else is happening and what sort of response time is required.
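The two ratios can be checked with a little arithmetic. The sketch below assumes 10 bits on the wire per byte (one start bit, 8 data bits, one stop bit) and takes the data2800 rate from the benchmark output earlier in this thread. Note that the benchmark times a round trip (encode plus decode), so a one-way figure would look better still.

```perl
use strict;
use warnings;

my $bits_per_byte  = 10;      # assumption: 1 start + 8 data + 1 stop bit
my $bytes_per_call = 2800;    # size of the data2800 test block
my $calls_per_sec  = 124;     # data2800 rate from the benchmark output

my $perl_rate = $bytes_per_call * $calls_per_sec;    # bytes/s, round trip

for my $baud ( 1200, 31250 ) {
    my $wire_rate = $baud / $bits_per_byte;          # bytes/s on the wire
    printf "%5d baud: %4d bytes/s on the wire, Perl is ~%d times faster\n",
        $baud, $wire_rate, $perl_rate / $wire_rate;
}
```

Under those assumptions the margin works out to roughly 2900 times at 1200 baud and roughly 110 times at 31250 baud, which agrees with the "at least 2000" and "about 100" estimates.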


Re: Converting 7 & 8-bit values
by Corion (Patriarch) on Dec 03, 2006 at 15:40 UTC

    If I remember correctly, MIDI is basically a 1200 baud serial interface, so any relatively current computer nowadays should have no problems with supplying the correct data in a timely fashion. Have you looked at the MIDI modules on CPAN already? I'm confident that they provide encoding and decoding of the MIDI event numbers and other stuff.

      Been there, tried them. It's conceivable that what I want is there, but if so, I can't see it. To clarify, what I'm looking for is not general SysEx decoding, but a means of decoding a very specific Korg encoding mechanism (if someone else is doing the same, I don't know who; but I'm not familiar with the formats in general). So I don't see any MIDI modules on CPAN that cover this. But thanks!
Re: Converting 7 & 8-bit values
by aufflick (Deacon) on Dec 04, 2006 at 03:42 UTC
    I'm using the same observation as BrowserUk: the zero-padded nature of the regular rows lets us simply OR them together with the appropriate most significant bit.

    i.e. we just need to strip out the A and convert it into A0000000; then we can OR it with 0aaaaaaa to get Aaaaaaaa. I tried to write a Perl implementation that makes it clear how that property is being used.

    sub good_karma {
        my $raw_bytes = shift;
        my @raw_bytes = unpack 'C*', $raw_bytes; # unsigned char / 8 bits
        my @temp_bytes;
        my @resultant_bytes;
        my $i = 7;
        for my $byte (@raw_bytes) {
            $i++;
            if ($i == 8) {
                # this is one of those 'fill-in' rows, so dump
                # out the temp_bytes and re-init them
                push @resultant_bytes, @temp_bytes[1...7];
                # my unpack is a little rusty - this should result
                # in 8 array elements - each 1 or 0 (with the first always 0)
                my @seven_bits = unpack 'bbbbbbbb', $byte;
                # shift the bit to the left. 8 <<'s might be faster
                $temp_bytes[$_] = $seven_bits[$_] * 128 for 1...7;
                # @temp_bytes is now (0, A0000000, B0000000, ...)
                # restart counter each time
                $i = 1;
            }
            # the 7 bit bytes are conveniently left 0 padded, so we
            # can just or the two parts together
            $temp_bytes[$i] |= $byte;
        }
        # handle final loop end-case
        push @resultant_bytes, @temp_bytes[1...$i];
        return join( '', @resultant_bytes );
    }
    As I said, my pack/unpack usage is a little rusty, so anyone should feel free to point out if I made a mistake.

    The inner loop, which runs for 7 of every 8 input bytes, has only four real operations (++, ==, |, =), so it should be pretty fast, though it wouldn't be hard to make a C version of the sub. It would probably be easiest to wrap the C sub inside a Perl sub that limits the size of the incoming byte string, so you could use statically allocated buffers.

    Update: realised I made a mistake with my pack, which forced a change in the if condition.

    Update2: tested the unpack lines and found them wanting ;) It seems to me that the above should do what the comments suggest, but they don't actually seem to. Unless my testing is getting confused with automatic type conversion...

    Update3: updated to handle the final non-8bit case.

    Update4: I don't know what crack I was on yesterday - of course I meant OR, not AND...
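For comparison, here is a compact sketch of the same "OR the high bit back in" idea, using shifts on the numeric byte values instead of unpack 'b'. Several things here are assumptions rather than anything confirmed in this thread: the name from_karma is made up, a trailing EOX (F7) is simply stripped, and a short final group is taken to follow the question's table (a header byte followed by fewer than seven data bytes).

```perl
use strict;
use warnings;

# Decode Korg-style 7-bit MIDI data back to 8-bit bytes.
# Assumes each group is one header byte (0GFEDCBA) followed by up to
# seven 7-bit data bytes, optionally terminated by EOX (0xF7).
sub from_karma {
    my ($midi) = @_;
    $midi =~ s/\xF7\z//;    # drop a trailing EOX if present
    my $out = '';
    for my $group ( unpack '(a8)*', $midi ) {
        my ( $header, @data ) = unpack 'C*', $group;
        for my $n ( 0 .. $#data ) {
            # move bit n of the header up to bit 7 and OR it back in
            $data[$n] |= ( ( $header >> $n ) & 1 ) << 7;
        }
        $out .= pack 'C*', @data;
    }
    return $out;
}

# One full group plus a short final group (header + 2 data bytes) and EOX.
my $midi = pack 'C*', 0x25, 0x7F, 0x00, 0x00, 0x7F, 0x01, 0x01, 0x40,
                      0x02, 0x12, 0x34, 0xF7;
print join( ' ', map { sprintf '%02x', $_ } unpack 'C*', from_karma($midi) ), "\n";
# prints: ff 00 80 7f 01 81 40 12 b4
```

Working on whole groups with unpack '(a8)*' avoids the per-byte counter, and a short final group needs no special case because the inner loop only runs over the data bytes actually present.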

Re: Converting 7 & 8-bit values
by Anonymous Monk on Dec 03, 2006 at 19:33 UTC
    Could you please add a bit more useless whitespace to the right-hand side of your chunk of ASCII? Perhaps enclose the whole thing in <blockquote> tags to make it stand out further? It is only slightly exceeding the width of my browser window on my main desktop and I'd rather it just blow away normal viewing like it does for the browsers in my smaller devices.</sarcasm> :)

    Or perhaps you could remove the ASCII-art border (including the trailing spaces) and allow for people to conveniently read your question even if they don't use a browsing environment nearly identical to your own.