in reply to regular expression searching in binary files
Looks like the strings you are trying to match are UTF-16, but buried in a binary file. I'd recommend you use binmode on the file handle you read the data from, and then you can:
```perl
use warnings;
use strict;
use Encode;

my $binstr = "\x{00}\x{01}\x{02}\x{03}\x{04}\x{05}"
           . "\x{00}A\x{00}u\x{00}t\x{00}h\x{00}o\x{00}r\x{00}"
           . "\x{80}\x{90}\x{a0}\x{b0}\x{c0}\x{d0}\x{e0}";

# Encode the search text the same way it is stored in the data
my $matchStr = encode ('utf16be', 'Author');

# \Q...\E quotes the encoded bytes so NULs etc. match literally
if ($binstr =~ /(\Q$matchStr\E)/) {
    my $match = decode ('utf16be', $1);
    print "Found $match\n";
}
```
Prints:
Found Author
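For completeness, here is a minimal sketch of the file side: open the file, call binmode on the handle so Perl hands back the bytes untouched, slurp it, and report every match with its byte offset. The sample bytes and the temporary file are made up purely so the snippet runs on its own; in real use you would open your own file instead.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempfile);

# Hypothetical test data: binary junk with a UTF-16BE "Author" buried in it.
my $data = "\x00\x01\x02" . encode('UTF-16BE', 'Author') . "\x80\x90\xa0";
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                 # write raw bytes, no newline/encoding translation
print {$out} $data;
close $out or die "close: $!";

# Read the whole file back as raw bytes.
open my $in, '<', $path or die "Can't open $path: $!";
binmode $in;
my $binstr = do { local $/; <$in> };
close $in;

my $needle = encode('UTF-16BE', 'Author');
my @found;
while ($binstr =~ /\Q$needle\E/g) {
    # $-[0] is the byte offset where the current match starts
    push @found, [ $-[0], decode('UTF-16BE', substr($binstr, $-[0], length $needle)) ];
}
printf "Found %s at byte offset %d\n", $_->[1], $_->[0] for @found;
```

With the sample data this prints `Found Author at byte offset 3`.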
Note that this assumes big endian, which seems to match your example, but the data could just as well be little endian, which is the native UTF-16 byte order on Windows systems. In that case use 'utf16le' in the encode and decode calls instead.
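If you don't know the byte order in advance, one option is simply to search for both encodings. This is a sketch with made-up sample data containing the word once in each byte order:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: the same word stored once big-endian, once little-endian.
my $binstr = "\x01\x02" . encode('UTF-16BE', 'Author')
           . "\xff\xfe" . encode('UTF-16LE', 'Author') . "\x00";

my @hits;
for my $enc ('UTF-16BE', 'UTF-16LE') {
    my $needle = encode($enc, 'Author');
    while ($binstr =~ /\Q$needle\E/g) {
        push @hits, [ $enc, $-[0] ];    # encoding name and byte offset
    }
}
print "$_->[0] match at byte offset $_->[1]\n" for @hits;
```

One caveat: for plain ASCII text, a big-endian match that happens to be followed by a zero byte will also match the little-endian pattern one byte later, so check the offset (or the surrounding data) if false positives matter.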