de-interleaving binary data

joe_b has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm looking for a faster way to de-interleave binary data. I have some source files (~200MB in size) of interleaved data that contain two data streams. The data is interleaved one bit at a time (e.g., if stream one is all 1's and stream two is all 0's, the file would read as "1010101010101010....", and I would like the resultant files to read "1111111..." and "000000...").

The code below works, but is slow (so it seems to me - it takes about 40 minutes to de-interleave a 192MB file on a modern laptop). I read in some data, split each bit into an array, the split that array into two sub arrays, then write it back out. I'm a newbie to Perl and programming in general, which may be obvious from my code.

until ($filesize <= $offset)
{
        sysseek (INTERLEAVED_DATA, $offset, 0);
        sysread (INTERLEAVED_DATA, $buffer, 8192);

        @interleaved = split(//, unpack("b*", $buffer));
        $buffer = "";
    
        while (@interleaved){

        push @uninterleaved_1, shift @interleaved;
        push @uninterleaved_2, shift @interleaved;
        }

        $uninterleaved_1 = pack "b*", join "", @uninterleaved_1;
        @uninterleaved_1 = "";

        $uninterleaved_2 = pack "b*", join "", @uninterleaved_2;
        @uninterleaved_2 = "";
     
        syswrite (UNINTERLEAVED_1, $uninterleaved_1);
        syswrite (UNINTERLEAVED_2, $uninterleaved_2);

        $uninterleaved_1 = "";
    $uninterleaved_2 = "";
        
        $offset = $offset + 8192;
}
[download]

I've tried reading in bigger chunks of data, but it doesn't help. It seems it's all the unpacking and array operations that take the most time. Thanks.

Comment on de-interleaving binary data Download Code

Replies are listed 'Best First'.
Re: de-interleaving binary data by Roy Johnson (Monsignor) on Jun 27, 2007 at 21:49 UTC
I would recommend a lookup table, rather than repeatedly doing the work of de-interleaving. De-interleave every possible 2-byte input into two characters once, then read your data as 2-byte shorts and lookup the output characters. use strict; use warnings; my @masks = map 2*$_, 0..15; sub deinterlace { my $data = shift; my ($odd, $even) = (0, 0); $odd \|= ($data & $masks[$_2 ]) >> $_ for (0..7); $even \|= ($data & $masks[$_2+1]) >> ($_+1) for (0..7); [pack('C', $odd), pack('C', $even)]; } # Build lookup table my @lut; $lut[$_] = deinterlace($_) for (0..0xffff); # Example data my @d = unpack('S', 'Some data to de-interlace.'. "\xfa" x 6 . "\0" x + 6); printf "%04x ", $_ for @d; print "\n"; # Building output strings my ($odd, $even) = ('', ''); for (@d) { my @bytepair = @{$lut[$_]}; $odd .= $bytepair[0]; $even .= $bytepair[1]; } printf "Lengths: %d and %d\n", length($odd), length($even); printf "%02x ", $_ for unpack('C', $odd); print "\n---\n"; printf "%02x ", $_ for unpack('C', $even); print "\n"; [download] Caution: Contents may have been coded under pressure.	[reply] [d/l]
Re: de-interleaving binary data by BrowserUk (Patriarch) on Jun 28, 2007 at 00:49 UTC
I'd usurp Roy Johnson's lookup tables, but use two strings and substr rather than an @AoA. A quick test on a 124MB file takes around a minute, with no great difference whether using 4096 byte or larger reads. #! perl -slw use strict; our $SIZE \|\|= 4096; my( $oddLU, $evenLU ) = ('') x 2; for my $i ( 0 .. 0xffff ) { my( $oddByte, $evenByte ) = ('') x 2; my $word = pack 'n', $i; vec( ${ $_ &1 ? \$evenByte : \$oddByte }, $_>>1, 1 ) = vec( $word, $_, 1 ) for 0 .. 15; $oddLU .= $oddByte; $evenLU .= $evenByte; } open my $inFH, '<:raw:perlio', $ARGV[ 0 ] or die $!; open my $oddFH, '>:raw:perlio', 'odd.bin' or die $!; open my $evenFH, '>:raw:perlio', 'even.bin' or die $!; local $/ = \$SIZE; print scalar localtime; while( my $chunk = <$inFH> ) { # print sysseek $inFH, 0, 1; local $/; my( $odds, $evens ) = ('') x 2; for my $idx ( unpack 'n*', $chunk ) { $odds .= substr $oddLU, $idx, 1; $evens .= substr $evenLU, $idx, 1; } print $oddFH $odds; print $evenFH $evens; } close $_ for $inFH, $oddFH, $evenFH; print scalar localtime; __END__ C:\test>u:ls -l 1Mx4096.db odd.bin even.bin -rw-rw-rw- 1 user group 124592128 Jul 28 2004 1Mx4096.db -rw-rw-rw- 1 user group 62296076 Jun 28 01:42 even.bin -rw-rw-rw- 1 user group 62296076 Jun 28 01:42 odd.bin C:\test>junk 1Mx4096.db Thu Jun 28 01:36:11 2007 Thu Jun 28 01:37:12 2007 C:\test>junk -SIZE=1048576 1Mx4096.db Thu Jun 28 01:38:42 2007 Thu Jun 28 01:39:44 2007 C:\test>junk -SIZE=10485760 1Mx4096.db Thu Jun 28 01:41:31 2007 Thu Jun 28 01:42:56 2007 [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^2: de-interleaving binary data by joe_b (Initiate) on Jun 28, 2007 at 14:32 UTC
Wow. Thank you. Can I just send the rest of my project requirements? :) My run time went form ~2200 seconds to 72 seconds. I can't tell you how much I appreciate your help.	[reply]
Re^3: de-interleaving binary data by BrowserUk (Patriarch) on Jun 28, 2007 at 15:36 UTC
Roy Johnson deserves all the credit. It was his insight to use a lookup table. I just resented the idea of using 64k arrays and 128k scalars to store 128k bytes of data. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]
Re: de-interleaving binary data by merlyn (Sage) on Jun 27, 2007 at 20:39 UTC
How about something based on: `my $data = "1024 bytes of data from the file"; my $odd = ""; my $even = ""; for my $bitpair (0..(4 * length $data - 1)) { my $twobits = vec($data, $bitpair, 2); vec($odd, $bitpair, 1) = 1 if $twobits & 2; vec($even, $bitpair, 1) = 1 if $twobits & 1; }` [download] I might have odd and even backwards. Experiment to find out. This will turn 1024 bytes into two 512-byte strings. I'm guessing your code is slow because you're carrying around a single bit as an entire scalar... huge amounts of data overhead. -- Randal L. Schwartz, Perl hacker	[reply] [d/l]