Re: Binary File Manipulation across Byte boundaries
by tachyon (Chancellor) on May 18, 2004 at 03:12 UTC
Expanding on zude's theme: provided you have, say, 10 times more RAM than the file size, this is a very simple approach that should be quite speedy as well (unpack "B*" turns every byte into eight characters, hence the memory cost). If memory is an issue then you would need to implement a sliding buffer.
my $data = 'abacad'; # slurp in whole file into scalar
my $pattern = 'a';
my $ds = unpack "B*", $data;
my $ps = unpack "B*", $pattern;
my $proof = '-' x length($ds);
my $x = 0;
while ( 1 ) {
$x = index($ds,$ps,$x);
last if $x == -1;
print "Match at bit offset $x\n";
# modify proof string to show match, could extract surrounding
# bits just as easily using our $x offset and substr() on $ds
substr( $proof, $x, length($ps), $ps );
$x++; # move offset pointer to next bit to avoid endless loop
}
print "
Original: $ds
Matches: $proof\n";
__DATA__
Match at bit offset 0
Match at bit offset 16
Match at bit offset 32
Original: 011000010110001001100001011000110110000101100100
Matches: 01100001--------01100001--------01100001--------
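The sliding buffer mentioned at the top of this post could look roughly like the following. This is only a sketch under assumptions: find_bit_matches and its $chunk_bytes parameter are made-up names, and the pattern is assumed to be a non-empty string of '0' and '1' characters. It keeps just the last length(pattern) - 1 bits between chunks, so a match that straddles a chunk boundary is still found and none is reported twice.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle for a bit pattern using a sliding
# window and return the absolute bit offset of every match.
sub find_bit_matches {
    my ($fh, $ps, $chunk_bytes) = @_;    # $ps is a string of '0'/'1'
    die "empty pattern\n" unless length $ps;
    $chunk_bytes ||= 4096;
    my $keep   = length($ps) - 1;        # tail bits a later match could start in
    my $window = '';                     # bit string currently held in memory
    my $base   = 0;                      # absolute bit offset of $window's first bit
    my @hits;

    binmode $fh;
    while ( read($fh, my $chunk, $chunk_bytes) ) {
        $window .= unpack "B*", $chunk;
        my $x = 0;
        while ( ($x = index($window, $ps, $x)) != -1 ) {
            push @hits, $base + $x;
            $x++;                        # step one bit to allow overlapping matches
        }
        # Drop everything except the tail that a later match could begin in.
        my $drop = length($window) - $keep;
        if ( $drop > 0 ) {
            substr($window, 0, $drop, '');
            $base += $drop;
        }
    }
    return @hits;
}
```

Memory use is then bounded by roughly eight times the chunk size plus the pattern length, rather than by the file size.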
Re: Binary File Manipulation across Byte boundaries
by Somni (Friar) on May 18, 2004 at 01:01 UTC
I've had a module in the works for some time that I just haven't quite finished. It's currently called File::ComplexFormat and can be found at shoebox.net.
I will release it some day, probably as Parse::BinGen (binary parser generator), mostly because Parse::Binary has been taken.
The module is technically usable, but there is no documentation, and the only real example is File::Format::Diablo::d2s, which parses the Diablo 2 .d2s character file format.
The approach I took was to read in a chunk of data, unpack it into bits (1's and 0's), then pop off the number of bits required, pack them, then translate as necessary. The solution is probably slow, but it was clearest in terms of design. I expect I'll move to a selectable XS backend whenever I actually get to hacking on it again.
The functions that accomplish that are read(), _read(), and write() in File::ComplexFormat.
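For readers who want to see the shape of that approach without the module, here is a rough sketch. It is not the File::ComplexFormat code: the BitReader package and its read_bits method are invented for illustration, and it assumes fields are stored most-significant-bit first and are at most 32 bits wide.

```perl
use strict;
use warnings;

{
    # Hypothetical bit reader: buffer the input as a string of 1's and 0's
    # and hand back fixed-width fields as integers.
    package BitReader;

    sub new {
        my ($class, $fh) = @_;
        binmode $fh;
        return bless { fh => $fh, bits => '' }, $class;
    }

    # Return the next $n bits (1..32) as an unsigned integer,
    # or undef if fewer than $n bits remain.
    sub read_bits {
        my ($self, $n) = @_;
        while ( length($self->{bits}) < $n ) {
            read($self->{fh}, my $chunk, 512) or return;
            $self->{bits} .= unpack "B*", $chunk;
        }
        my $field = substr($self->{bits}, 0, $n, '');    # take bits off the front
        # Left-pad to 32 bits, pack back into bytes, read as a big-endian integer.
        return unpack( "N", pack( "B32", ('0' x (32 - $n)) . $field ) );
    }
}
```

Given the two bytes 0xAB 0xCD, successive calls to read_bits(4), read_bits(6) and read_bits(6) would yield 10, 47 and 13. A format that stores fields least-significant-bit first would need "b*" and a reversed field instead.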
|
|
So, what kind of perl tools do you have for Diablo II? :)
Re: Binary File Manipulation across Byte boundaries
by zude (Scribe) on May 18, 2004 at 01:56 UTC
Might be easier to use something like
$ds = unpack "B*", $data;
$ps = unpack "B*", $pattern;
($x = index $ds,$ps) >= 0 and print "Match at bit offset $x\n";
Update: how can a 3-line snippet have a bug?
+++++++++ In theory, theory describes reality, but in reality it doesn't.
Re: Binary File Manipulation across Byte boundaries
by graff (Chancellor) on May 18, 2004 at 02:21 UTC
The problem seems a tad underspecified, making it harder to suggest an approach that's likely to be appropriate. How many bits would make up the target pattern? Would the target pattern always be the same number of bits? Suppose you have a 6-bit target pattern, and it matches the input starting at, say, the 27th bit. At exactly what bit offset does the first output nibble begin? (i.e. do you have to "re-align" the nibble/byte boundary relative to the start of the matched pattern, or will it be okay for the output to remain byte-aligned relative to the start of the input stream?)
Details aside, it does seem as though Bit::Vector will be a good thing to use -- it has everything you need, and plenty more that you will probably never need. But when fiddling with bits, one thing you need most is a detailed spec for what your intentions are.
|
|
To answer graff's question, yes I would have to "re-align" the nibble/byte boundary relative to the start of the matched pattern. The pattern size will vary. One of the patterns I'm interested in is 16 bits long.
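With the bit-string approach from elsewhere in this thread, that re-alignment falls out of substr() on the unpacked string. A minimal sketch, assuming the pattern is given as a string of '0' and '1' characters and that the wanted output is the nibbles immediately after the match (nibbles_after_match is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $count nibbles (as hex digits) that follow
# the first occurrence of $pattern_bits, aligned to the end of the match
# rather than to the byte boundaries of the input.
sub nibbles_after_match {
    my ($data, $pattern_bits, $count) = @_;
    my $ds = unpack "B*", $data;
    my $x  = index $ds, $pattern_bits;
    return if $x < 0;                                # no match
    my $start = $x + length $pattern_bits;           # first bit after the match
    my $tail  = substr $ds, $start, 4 * $count;
    # Keep only whole nibbles and turn each group of 4 bits into a hex digit.
    return map { sprintf( "%X", oct("0b$_") ) } $tail =~ /([01]{4})/g;
}
```

For the bytes 0x07 0xFF 0xFA 0xBC and a pattern of sixteen 1 bits, the match starts at bit offset 5, so the following nibbles begin at bit 21 and come out as 5 and 7.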
Re: Binary File Manipulation across Byte boundaries
by hv (Prior) on May 18, 2004 at 03:34 UTC
If your pattern is relatively complex (i.e. complex enough that you'd want to use the regexp engine rather than index()) it might be worth constructing the 8 shifted variants of the pattern so that you can combine them into a single regexp to run over the target string.
For simple patterns or short strings, unpacking the bytes is probably faster.
I'm not familiar with Bit::Vector, so I don't know where that would fit into the efficiency mix.
(This is a gut feel, I welcome benchmarks agreeing or disagreeing with that. :)
Hugo
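One way the shifted-variants idea might be sketched for a fixed bit pattern is shown below. The names shifted_regex and first_bit_match are invented here, the pattern is assumed to be a non-empty string of '0' and '1' characters, and only the first match is reported. Each alignment becomes a sequence of character classes, where a class lists every byte value that agrees with the pattern bits falling inside that byte.

```perl
use strict;
use warnings;

# Hypothetical helper: build one byte-level regexp from a bit pattern by
# OR-ing together its 8 possible alignments within a byte.
sub shifted_regex {
    my ($ps) = @_;
    my @bytes = map { chr($_) } 0 .. 255;
    my @alts;
    for my $shift ( 0 .. 7 ) {
        my $padded = ('.' x $shift) . $ps;                   # '.' marks a "don't care" bit
        $padded .= '.' x ( (8 - length($padded) % 8) % 8 );  # pad to whole bytes
        my $alt = '';
        for my $group ( $padded =~ /(.{8})/g ) {
            # All byte values whose bits agree with the fixed positions.
            my @ok = grep { unpack("B8", $_) =~ /\A$group\z/ } @bytes;
            $alt .= '[' . join( '', map { sprintf( '\\x%02X', ord($_) ) } @ok ) . ']';
        }
        push @alts, "($alt)";            # one capture group per alignment
    }
    my $re = join '|', @alts;
    return qr/$re/;
}

# Return the bit offset of the first match, or -1 if there is none.
sub first_bit_match {
    my ($data, $ps) = @_;
    my $re = shifted_regex($ps);
    return -1 unless $data =~ $re;
    # The capture group that took part in the match tells us the shift.
    my ($shift) = grep { defined $-[ $_ + 1 ] } 0 .. 7;
    return 8 * $-[0] + $shift;
}
```

Because the alternatives are tried in order of increasing shift at the leftmost byte position, the first hit is also the lowest bit offset. Whether this beats unpacking to a bit string would still need benchmarking.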
Re: Binary File Manipulation across Byte boundaries
by sgifford (Prior) on May 18, 2004 at 03:20 UTC
A straightforward, if slow, implementation could create an interface to read a single bit of the file at a time, and use a circular buffer to compare the last n bits to the pattern you're looking for. For example, the code posted with this reply looks for a pattern of 4 consecutive ones, and prints out the next nybble.
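The snippet itself is not reproduced here, so what follows is only a sketch of the approach as described, not the original code. The names bit_reader and nibbles_after_pattern are made up, bits are assumed to be read most-significant first, and the search is assumed to start afresh after each reported nybble.

```perl
use strict;
use warnings;

# Hypothetical closure that yields one bit (0 or 1) per call, or undef at EOF.
sub bit_reader {
    my ($fh) = @_;
    binmode $fh;
    my @pending;
    return sub {
        unless (@pending) {
            read($fh, my $byte, 1) or return;
            @pending = split //, unpack "B8", $byte;
        }
        return shift @pending;
    };
}

# Return the nybble (as a 4-character bit string) that follows each
# occurrence of $pattern, a non-empty string of '0'/'1'.
sub nibbles_after_pattern {
    my ($fh, $pattern) = @_;
    my $next = bit_reader($fh);
    my $n    = length $pattern;
    my @ring = (0) x $n;                 # circular buffer of the last $n bits
    my ($pos, $seen) = (0, 0);
    my @found;
    while ( defined( my $bit = $next->() ) ) {
        $ring[$pos] = $bit;
        $pos = ($pos + 1) % $n;
        next if ++$seen < $n;            # buffer not full yet
        # The oldest bit sits at $pos, so read the ring starting there.
        my $last_n = join '', @ring[ $pos .. $n - 1 ], @ring[ 0 .. $pos - 1 ];
        next unless $last_n eq $pattern;
        my $nibble = '';
        for ( 1 .. 4 ) {
            my $nb = $next->();
            last unless defined $nb;
            $nibble .= $nb;
        }
        last if length($nibble) < 4;     # ran out of input mid-nybble
        push @found, $nibble;
        $seen = 0;                       # start looking afresh after the nybble
    }
    return @found;
}
```

Calling nibbles_after_pattern($fh, '1111') and printing each returned string gives the behaviour described above. For the bytes 0x0F 0xA0 it returns the single nybble 1010.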
Re: Binary File Manipulation across Byte boundaries
by DrHyde (Prior) on May 18, 2004 at 07:38 UTC