perl bitology - breaking into bytes

spurperl has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: perl bitology - breaking into bytes by Corion (Patriarch) on Oct 20, 2003 at 09:04 UTC
I think you're looking at the problem from the wrong angle by focusing on the "bit" part. If you look at the problem from more distance, you see that you have 50 bits that must be zero, so that gives you 3 bytes (24 bits) that will always be zero. Looking for these three zero bytes is easy using regular expressions. If you look left and right of these three bytes, you can then determine whether they form part of a correct 128-bit frame.

Some untested code that should capture an "interesting" frame candidate. It reads the file in 64 KB chunks, so it also works when you can't read the whole file into memory:

```perl
my ($buffer, $old_buffer);
my $offset = 0;    # file offset of the first byte of $buffer
while (read(FH, $buffer, 65536)) {
    # Prepend the last 16 bytes of the previous buffer
    if (defined $old_buffer) {
        my $tail = substr($old_buffer, -16);
        $buffer  = $tail . $buffer;
        $offset -= length $tail;
    }
    if ($buffer =~ /(......\0\0\0........)/s) {
        # $-[1] is the start of the capture within $buffer
        printf "Found possible candidate at %s : %s\n", $offset + $-[1], $1;
    }
    $offset += length $buffer;
    $old_buffer = $buffer;
}
```

The last 16 bytes (== 128 bits) of each buffer are pasted onto the front of the next buffer. That way, you'll also catch frames that fall on a buffer boundary.

Update: As liz tells me, you can even look for four consecutive zero bytes by the following logic: you have 50 zero bits and at most 7 junk bits in the partial byte at each end. The 50 bits span a little over 6 bytes, so you will have at least (50-7*2)/8 clean bytes, which is (36 bits)/8 = 4.5, and that makes 4 consecutive zero bytes.

If your data is MPEG data, I think I remember that MPEG frames always start at a byte boundary, so then you could even look for the exact sequence with a carefully crafted RE.
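A quick way to sanity-check the byte-count argument above is to brute-force it. The sketch below is not from the thread: `full_zero_bytes` is a hypothetical helper that counts how many whole bytes lie inside a run of 50 zero bits for each of the 8 possible bit offsets. It suggests that 4 zero bytes is a safe lower bound and that 5 are in fact guaranteed.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: number of whole bytes fully covered by a run of
# $run bits that starts $offset bits (0-7) into a byte.
sub full_zero_bytes {
    my ($offset, $run) = @_;
    my $first = int(($offset + 7) / 8);            # first byte fully inside the run
    my $last  = int(($offset + $run) / 8) - 1;     # last byte fully inside the run
    return $last >= $first ? $last - $first + 1 : 0;
}

my %count = map { $_ => full_zero_bytes($_, 50) } 0 .. 7;
printf "offset %d: %d whole zero bytes\n", $_, $count{$_} for 0 .. 7;
printf "guaranteed minimum: %d\n", min(values %count);
```

So a regex looking for three or four `\0` bytes will never miss a frame; it may only produce more candidates to verify than a five-byte pattern would.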
Re: perl bitology - breaking into bytes by BrowserUk (Patriarch) on Oct 20, 2003 at 13:07 UTC
The answer you are seeking is vec. This snippet prints a bit string consisting of the values of the 49th through 69th bits of the 6,543,210th 128-bit chunk of a 100MB string. The result is a string rather than a number because your value could be up to 128 bits wide, which would move into the realms of floating point inaccuracies.

```perl
open F, '<', 'e:\100MB' or die $!;
binmode F;
# slurp the file as one fixed-size record of 100MB
{ local $/ = \(100*1024*1024); $data = <F> };

sub bits_of_chunk {
    my( $chunk, $start, $end ) = @_;
    # each chunk is 16 bytes * 8 bits; vec() extracts one bit at a time
    join '', map{ vec $data, $_, 1 }
        ($chunk * 16 * 8 + $start) .. ($chunk * 16 * 8 + $end);
};

print bits_of_chunk( 6543210, 49, 69 );
```

Output:

```
000000000000000000000
```

The bits are accessed directly from the 100MB string and are returned very quickly.
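One thing worth checking before relying on vec is its bit numbering. A minimal sketch on a two-byte string (the `bits` helper and the sample data are made up for illustration) shows that with a width of 1, vec counts from the least significant bit of each byte, which is the same order as `unpack 'b*'` and the reverse of `unpack 'B*'`:

```perl
use strict;
use warnings;

my $data = "\x01\x80";    # byte 0 = 00000001, byte 1 = 10000000

# Hypothetical helper: bits $from..$to of $str as a string of 0s and 1s
sub bits {
    my ($str, $from, $to) = @_;
    return join '', map { vec($str, $_, 1) } $from .. $to;
}

print bits($data, 0, 7),  "\n";     # bits of byte 0, lowest bit first
print bits($data, 8, 15), "\n";     # bits of byte 1, lowest bit first
print unpack('b*', $data), "\n";    # same order as vec
print unpack('B*', $data), "\n";    # highest bit of each byte first
```

If the frame layout numbers its bits from the most significant end of each byte, the index passed to vec has to be adjusted within each byte, for example with `$i ^ 7`.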
Re: Re: perl bitology - breaking into bytes by spurperl (Priest) on Oct 21, 2003 at 06:13 UTC
Thanks again for the insightful reply! I hadn't looked into vec() for this problem, and no one had drawn my attention to it before you did (it seems to be a rather forgotten feature of Perl, with only a couple of mentions in the full index of the popular Perl books). It does look like it will help with my problem, and I'll consider using it. Perhaps I'll eventually create a File::BitStream module, whichever way I choose to implement it.
Re: perl bitology - breaking into bytes by Skeeve (Parson) on Oct 20, 2003 at 09:01 UTC
You could convert your 128 bits to a bit string and use regular expressions to match against that. Something like:

```perl
# Warning: Untested code ahead!
# Enter at your own risk!
read($file, $buf, 16);            # 16 bytes == 128 bits
$bits = unpack("B*", $buf);       # or "b*" depending on bit order
do {
    if ($r = read($file, $byte, 1)) {
        $newbits = unpack("B8", $byte);    # or "b8" depending on bit order
        $bits .= $newbits;
    }
    print "match!\n" if $bits =~ /^
        .{0,7}    # possible offset, 0-7 bits
        .{9}      # match exactly 9 bits
        0{6}      # match bits 9-14 as 0
        .{35}     # match 35 bits to advance to bit 50
        1{51}     # bits 50-100 (51 bits) must match 1
        .{27}     # to match the last 27 bits of your frame
    /x;
    $bits = substr($bits, 8);
} while ($r);
```

Not very efficient though.
Re: Re: perl bitology - breaking into bytes by spurperl (Priest) on Oct 20, 2003 at 11:44 UTC
The problem is deeper... My examples were just figurative. As a matter of fact, a frame is valid when certain single bits are 1, with no relation between them.

There's a quite straightforward solution to the problem: read the file, unpack() it into a nice string of 1s and 0s, and work on that. Perl likes nice strings. However, this results in a performance and memory problem. The file may get huge (100 MB), hence the string of 1s and 0s is 800 MB (each bit is represented by a char), which easily throws my PC out of memory.

However, I don't see another way... Digging in the bits directly, without unpacking them to a string, is extremely difficult. I might just go for the string solution, but solve the memory problem by keeping only a buffer in memory rather than the whole string. Messy, but it should work, I hope.
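The buffered variant of the string approach could be sketched roughly as follows. Everything specific here is an assumption made for illustration: the rule "bits 3 and 100 must be 1", the name `scan_bits`, and the callback interface. Only the last 127 bits of each chunk are carried over, so memory use stays at about 8 characters per byte of chunk size rather than 8 per byte of file.

```perl
use strict;
use warnings;

my $FRAME       = 128;          # frame length in bits
my @must_be_one = (3, 100);     # hypothetical rule: these frame bits must be 1

# Scan $fh in chunks of $chunk_bytes and call $on_match->($bit_offset)
# for every bit offset at which a frame satisfying the rule starts.
sub scan_bits {
    my ($fh, $chunk_bytes, $on_match) = @_;
    my $bits = '';      # sliding window of '0'/'1' characters
    my $base = 0;       # bit offset in the file of the first char of $bits
    while (read($fh, my $chunk, $chunk_bytes)) {
        $bits .= unpack('B*', $chunk);
        my $last_start = length($bits) - $FRAME;
        for my $start (0 .. $last_start) {
            next if grep { substr($bits, $start + $_, 1) ne '1' } @must_be_one;
            $on_match->($base + $start);
        }
        # drop every position already tried as a frame start
        if ($last_start >= 0) {
            substr($bits, 0, $last_start + 1, '');
            $base += $last_start + 1;
        }
    }
}
```

Usage would look like `scan_bits($fh, 65536, sub { print "frame at bit $_[0]\n" })` on a filehandle opened with binmode. It is still slow compared to working on the packed bytes, since every candidate offset is tested with substr.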
Re: perl bitology - breaking into bytes by DrHyde (Prior) on Oct 20, 2003 at 10:55 UTC
File::Binary will help. It was written as part of a project which needed to deal with a file format that used weirdly sized bit fields.
Re: perl bitology - breaking into bytes by pg (Canon) on Oct 20, 2003 at 20:56 UTC
Simple: this is about math, not Perl. There is no need to parse the bits in order to test whether certain bits are ones or zeroes.

To test whether certain bits are 0's, make up a mask with 0's for the bits you care about and 1's for the rest. Bitwise-OR your mask with whatever is under test; if the value you get back is the same as the mask, you are okay.

To test whether certain bits are 1's, make up a mask with 1's for the bits you care about and 0's for the rest. Bitwise-AND your mask with whatever is under test; if the value you get back is the same as the mask, you are okay.
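In Perl, the two mask tests above can be applied to a whole 16-byte frame at once, because `&` and `|` work bytewise when both operands are strings. The following is only a sketch: the helper names are invented here, and bit 0 is assumed to be the most significant bit of the first byte (the `B` order of pack).

```perl
use strict;
use warnings;

# Hypothetical helper: 16-byte mask filled with $fill ('0' or '1'),
# with the listed bit positions flipped to the opposite value.
sub mask_from_bits {
    my ($fill, @positions) = @_;
    my $flip = $fill eq '0' ? '1' : '0';
    my $bits = $fill x 128;
    substr($bits, $_, 1) = $flip for @positions;
    return pack('B128', $bits);
}

# True if every listed bit of the 16-byte string $frame is 1
sub bits_are_one {
    my ($frame, @positions) = @_;
    my $mask = mask_from_bits('0', @positions);    # 1's at the tested bits
    return ($frame & $mask) eq $mask;
}

# True if every listed bit of the 16-byte string $frame is 0
sub bits_are_zero {
    my ($frame, @positions) = @_;
    my $mask = mask_from_bits('1', @positions);    # 0's at the tested bits
    return ($frame | $mask) eq $mask;
}
```

In a real scanner the masks would be built once, outside the loop. Note that this only tests frames that start on a byte boundary; for other bit offsets the frame (or the mask) has to be shifted first.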
Re: perl bitology - breaking into bytes by thor (Priest) on Oct 20, 2003 at 19:08 UTC
Here's a little something I came up with:

```perl
open(FILE, '<', shift) or die $!;
binmode FILE;
my (@buffer, @bits);
# keep going until the file is exhausted and the bit buffer is drained
while (@buffer or !eof(FILE)) {
    if (@buffer == 0) {    # if it's empty, fill it
        my $bytes;
        # read 30 bytes at a time
        read(FILE, $bytes, 30);
        # unpack the raw bytes (not their ord() values) into single bits
        push @buffer, split //, unpack("B*", $bytes);
    }
    while (@bits < 128) {
        push(@bits, shift(@buffer));
        last if (@buffer == 0);
    }
    # the first couple of reads will not fill the bits array
    next if (@bits < 128);
    # your processing on @bits goes here
    shift @bits;
}
```

Perhaps later, I'll put up a version that does all of the work in binary strings rather than arrays.

thor
Re: perl bitology - breaking into bytes by Tardis (Pilgrim) on Oct 21, 2003 at 04:41 UTC
I use a fairly simple method for some files which I use as indices. It seems fairly efficient. I open the bit file, read it in chunks of 64K at a time, and use some simple operations to determine the 'on' bits. I only care about bytes that have at least one 'on' bit, so this makes things a little easier. Here is some code:

```perl
while (sysread(INDEX, $index_data, 65536)) {
    # so zip round it until we hit a byte with any bit set to one
    for ($z = 0; $z < length($index_data); $z++) {
        next unless ord(substr($index_data, $z, 1));
        # so this byte contains something
        foreach (0..7) {
            if ( (2 ** $_) & ord(substr($index_data, $z, 1)) ) {
                # this bit is ON
                # do stuff here
            }
        }
    }
}
```

BTW, I had a bad bug with this code initially. Instead of:

`next unless ord(substr($index_data,$z,1));`

I had:

`next unless substr($index_data,$z,1);`

I leave it to the reader to discover the evil bug in this code :-)