Binary File Manipulation across Byte boundaries

tperdue has asked for the wisdom of the Perl Monks concerning the following question:

Re: Binary File Manipulation across Byte boundaries by tachyon (Chancellor) on May 18, 2004 at 03:12 UTC
Expanding on zude's theme. Provided you have, say, 10 times more RAM than the file size, this is a very simple approach that should be quite speedy as well. If memory is an issue then you would need to implement a sliding buffer.

    my $data = 'abacad';  # slurp in whole file into scalar
    my $pattern = 'a';
    my $ds = unpack "B*", $data;
    my $ps = unpack "B*", $pattern;
    my $proof = '-' x length($ds);
    my $x = 0;
    while ( 1 ) {
        $x = index($ds,$ps,$x);
        last if $x == -1;
        print "Match at bit offset $x\n";
        # modify proof string to show match, could extract surrounding
        # bits just as easily using our $x offset and substr() on $ds
        substr( $proof, $x, length($ps), $ps );
        $x++; # move offset pointer to next bit to avoid endless loop
    }
    print "\nOriginal: $ds\nMatches:  $proof\n";

    __DATA__
    Match at bit offset 0
    Match at bit offset 16
    Match at bit offset 32

    Original: 011000010110001001100001011000110110000101100100
    Matches:  01100001--------01100001--------01100001--------

cheers

tachyon
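A minimal sketch of the sliding buffer mentioned above, for files too large to slurp. This is not tachyon's code: the helper name `find_bit_offsets` and the chunk-size parameter are made up for illustration. It assumes the pattern is given as a byte string and keeps the last `length($ps) - 1` bits of each chunk so that a match straddling two chunks is still found.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle chunk by chunk and return the
# absolute bit offsets at which $pattern (a byte string) occurs.
sub find_bit_offsets {
    my ($fh, $pattern, $chunk_bytes) = @_;
    $chunk_bytes ||= 65536;
    my $ps = unpack "B*", $pattern;
    return unless length $ps;          # nothing to search for
    my $keep   = length($ps) - 1;      # bits carried over between chunks
    my $window = '';                   # bit string currently being searched
    my $base   = 0;                    # absolute bit offset of $window's first bit
    my @hits;
    binmode $fh;
    while ( read($fh, my $chunk, $chunk_bytes) ) {
        $window .= unpack "B*", $chunk;
        my $x = 0;
        while ( ( $x = index($window, $ps, $x) ) != -1 ) {
            push @hits, $base + $x;
            $x++;                      # step past this match
        }
        # Drop everything except the tail that could still begin a match
        # completed by the next chunk.
        my $drop = length($window) - $keep;
        if ( $drop > 0 ) {
            substr($window, 0, $drop, '');
            $base += $drop;
        }
    }
    return @hits;
}

# Same data as the example above, read two bytes at a time.
my $buf = 'abacad';
open my $fh, '<', \$buf or die "open: $!";
print "Match at bit offset $_\n" for find_bit_offsets($fh, 'a', 2);
```

Because the carried-over tail is one bit shorter than the pattern, it can never contain a complete match by itself, so no offset is reported twice.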
Re: Binary File Manipulation across Byte boundaries by Somni (Friar) on May 18, 2004 at 01:01 UTC
I've had a module in the works for some time that I just haven't quite finished. It's currently called File::ComplexFormat and can be found at shoebox.net. I will release it some day, probably as Parse::BinGen (binary parser generator), mostly because Parse::Binary has been taken. The module is technically usable, but there is no documentation, and the only real example is File::Format::Diablo::d2s, which parses the Diablo 2 .d2s character file format.

The approach I took was to read in a chunk of data, unpack it into bits (1s and 0s), then pop off the number of bits required, pack them, then translate as necessary. The solution is probably slow, but it was clearest in terms of design. I expect I'll move to a selectable XS backend whenever I actually get to hacking on it again. The functions that accomplish that are read(), _read(), and write() in File::ComplexFormat.
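The unpack, take bits, repack approach described above can be sketched roughly as follows. This is not File::ComplexFormat's actual interface; `take_bits` is a made-up helper, and it assumes most-significant-bit-first fields of at most 32 bits, taken from the front of the bit string.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the first $n bits from the bit string that
# $bits_ref points to and return them as an unsigned integer.
sub take_bits {
    my ($bits_ref, $n) = @_;
    die "field wider than 32 bits\n" if $n > 32;
    die "not enough bits left\n"     if $n > length $$bits_ref;
    my $field = substr($$bits_ref, 0, $n, '');   # 4-arg substr removes the bits
    # Left-pad to 32 bits, pack into 4 bytes, read back as a big-endian integer.
    return unpack "N", pack "B32", ('0' x (32 - $n)) . $field;
}

my $bits  = unpack "B*", "\xA5\xC3";    # 1010010111000011
my $first = take_bits(\$bits, 3);       # 101     -> 5
my $next  = take_bits(\$bits, 7);       # 0010111 -> 23
my $last  = take_bits(\$bits, 6);       # 000011  -> 3
print "$first $next $last\n";
```

Field widths here are arbitrary; a real format description would supply them, along with any translation of the resulting numbers.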
Re: Re: Binary File Manipulation across Byte boundaries by rje (Deacon) on May 18, 2004 at 16:01 UTC
So, what kind of perl tools do you have for Diablo II? :)
Re: Binary File Manipulation across Byte boundaries by zude (Scribe) on May 18, 2004 at 01:56 UTC
Might be easier to use something like `$ds = unpack "B*", $data; $ps = unpack "B*", $pattern; ($x = index $ds,$ps) >= 0 and print "Match at bit offset $x\n";`

Update: how can a 3-line snippet have a bug?

+++++++++ In theory, theory describes reality, but in reality it doesn't.
Re: Binary File Manipulation across Byte boundaries by graff (Chancellor) on May 18, 2004 at 02:21 UTC
The problem seems a tad underspecified, making it harder to suggest an approach that's likely to be appropriate. How many bits would make up the target pattern? Would the target pattern always be the same number of bits? Suppose you have a 6-bit target pattern, and it matches the input starting at, say, the 27th bit. At exactly what bit offset does the first output nibble begin? (i.e. do you have to "re-align" the nibble/byte boundary relative to the start of the matched pattern, or will it be okay for the output to remain byte-aligned relative to the start of the input stream?) Details aside, it does seem as though Bit::Vector will be a good thing to use -- it has everything you need, and plenty more that you will probably never need. But when fiddling with bits, one thing you need most is a detailed spec for what your intentions are.
Re: Re: Binary File Manipulation across Byte boundaries by tperdue (Sexton) on May 18, 2004 at 21:05 UTC
To answer graff's question: yes, I would have to "re-align" the nibble/byte boundary relative to the start of the matched pattern. The pattern size will vary. One of the patterns I'm interested in is 16 bits long.
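Given that answer, re-alignment could be sketched with the same unpack/index approach used elsewhere in this thread. The helper name `realign_after_match` is hypothetical, and the sketch assumes the whole input fits in memory and that the output should start with the bit immediately following the match.

```perl
use strict;
use warnings;

# Hypothetical helper: find the first occurrence of $pattern at any bit
# offset in $data and return (bit offset, re-aligned bytes, re-aligned bits).
# The bit right after the match becomes bit 0 of the first output byte.
sub realign_after_match {
    my ($data, $pattern) = @_;
    my $ds = unpack "B*", $data;
    my $ps = unpack "B*", $pattern;
    my $x  = index $ds, $ps;
    return if $x == -1;
    my $rest = substr $ds, $x + length $ps;
    # pack "B*" pads a trailing partial byte with zero bits on the right.
    return ( $x, pack("B*", $rest), $rest );
}

# A 16-bit pattern (0xABCD) that starts 4 bits into the data.
my ($offset, $bytes, $bits) =
    realign_after_match("\x0A\xBC\xD1\x23", "\xAB\xCD");
printf "match at bit %d, first nibble after it: %s\n",
    $offset, substr($bits, 0, 4);
```

Returning the bit string alongside the packed bytes lets the caller tell real trailing bits from the zero padding that `pack` adds.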
Re: Binary File Manipulation across Byte boundaries by hv (Prior) on May 18, 2004 at 03:34 UTC
If your pattern is relatively complex (i.e. complex enough that you'd want to use the regexp engine rather than index()) it might be worth constructing the 8 shifted variants of the pattern so that you can combine them into a single regexp to run over the target string. For simple patterns or short strings, unpacking the bytes is probably faster. I'm not familiar with Bit::Vector, so I don't know where that would fit into the efficiency mix. (This is a gut feeling; I welcome benchmarks agreeing or disagreeing with that. :)

Hugo
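One way the 8 shifted variants might be built is sketched below. It is only an interpretation of hv's suggestion: the helper names are invented, and a fixed bit pattern is used for simplicity, whereas the idea is aimed at patterns that need real regexp features. For each shift, bytes that are fully determined become literals, and partially covered bytes become character classes listing every byte value compatible with the fixed bits.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regexp that matches $pattern (a byte
# string) at any of the 8 bit alignments. Capture group N+1 corresponds
# to a shift of N bits.
sub shifted_regex {
    my ($pattern) = @_;
    my $ps = unpack "B*", $pattern;
    my @alts;
    for my $shift (0 .. 7) {
        # Pad with '.' ("don't care" bits) on both sides to whole bytes.
        my $padded = ('.' x $shift) . $ps;
        $padded .= '.' x ((8 - length($padded) % 8) % 8);
        my $alt = '';
        for my $byte_bits ($padded =~ /(.{8})/g) {
            if ($byte_bits !~ /\./) {            # fully determined byte
                $alt .= sprintf '\\x%02X', oct "0b$byte_bits";
            }
            else {                               # partially determined byte
                my $mask = qr/^$byte_bits$/;
                my @ok = grep { sprintf('%08b', $_) =~ $mask } 0 .. 255;
                $alt .= '[' . join('', map { sprintf '\\x%02X', $_ } @ok) . ']';
            }
        }
        push @alts, "($alt)";
    }
    my $source = join '|', @alts;
    return qr/$source/;
}

# Hypothetical helper: bit offset of the first match, or nothing.
sub first_bit_offset {
    my ($data, $pattern) = @_;
    my $re = shifted_regex($pattern);
    return unless $data =~ $re;
    my ($shift) = grep { defined $-[$_ + 1] } 0 .. 7;   # which group matched
    return $-[0] * 8 + $shift;
}

my $where = first_bit_offset("\x0F\xF0", "\xFF");
print "Match at bit offset $where\n" if defined $where;
```

The alternatives are tried in order of increasing shift at each byte position, so the first match reported is the lowest bit offset.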
Re: Binary File Manipulation across Byte boundaries by sgifford (Prior) on May 18, 2004 at 03:20 UTC
A straightforward, if slow, implementation could create an interface to read a single bit of the file at a time, and use a circular buffer to compare the last n bits to the pattern you're looking for. For example, this code [1528 bytes, not reproduced here] looks for a pattern of 4 consecutive ones and prints out the next nybble.
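Since the original code is not shown, here is a rough sketch of the same idea. It is not sgifford's implementation: `bit_reader` and `nybbles_after_ones` are made-up names, and the choice to restart the search after each reported nybble is an assumption.

```perl
use strict;
use warnings;

# Hypothetical bit reader: returns a closure that yields one bit (0 or 1)
# per call, most significant bit first, or undef at end of file.
sub bit_reader {
    my ($fh) = @_;
    binmode $fh;
    my @bits;                          # bits of the current byte not yet handed out
    return sub {
        unless (@bits) {
            read($fh, my $byte, 1) or return;
            @bits = split //, unpack "B8", $byte;
        }
        return shift @bits;
    };
}

# Look for $n consecutive 1 bits and collect the 4 bits that follow each run.
sub nybbles_after_ones {
    my ($fh, $n) = @_;
    my $next = bit_reader($fh);
    my @ring = (0) x $n;               # circular buffer holding the last $n bits
    my ($pos, $ones) = (0, 0);         # write position, count of 1s in the buffer
    my @found;
    while ( defined( my $bit = $next->() ) ) {
        $ones += $bit - $ring[$pos];   # account for the bit being overwritten
        $ring[$pos] = $bit;
        $pos = ($pos + 1) % $n;
        next unless $ones == $n;       # the last $n bits were all ones
        my $nybble = '';
        for (1 .. 4) {
            my $nb = $next->();
            last unless defined $nb;
            $nybble .= $nb;
        }
        push @found, $nybble if length($nybble) == 4;
        @ring = (0) x $n;              # assumption: start afresh after a hit
        ($pos, $ones) = (0, 0);
    }
    return @found;
}

my $buf = "\x1E\xA0";                  # 00011110 10100000
open my $fh, '<', \$buf or die "open: $!";
print "Next nybble: $_\n" for nybbles_after_ones($fh, 4);
```

In this sample the run of ones ends one bit before the byte boundary, so the reported nybble straddles two input bytes.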
Re: Binary File Manipulation across Byte boundaries by DrHyde (Prior) on May 18, 2004 at 07:38 UTC
File::Binary may be useful too.