dhlocker has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to extract Subversion (or RCS, CVS, or any other version control system) keywords and their values from compiled sources, where they are encoded as 16-bit elements. I can see the characters in place (with emacs), but my REs extract what I need only some of the time. It looks like an octet-alignment or line-alignment problem (although line alignment doesn't exist for binary files, of course).
Sooooo. My statement (if it were dealing with a plain text file) would be
print $1 if (m#(\$(Author|Date|Id|URL|Version): [-\.\$ _a-zA-Z0-9]+\$) +#);
If I use \000A\000u\000t\000h\000o\000r for Author and similarly substitute for each of the literal characters I'm seeking I can find the keywords. But extracting the values as REs has eluded me.
All ASCII characters in the files I'm examining are represented in two octets, the first being 0x00, the second being the normal ASCII character.
I've variously tried use utf8; and several encodings, but my matches still don't capture the strings I'm seeking.
Suggestions? and TIA.
(RTFMs would be welcome; perlre, perlreref, perlretut, and searching on unicode in perl docs found no help.)
Donald.
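One way to approach the layout described above (each ASCII character stored as a 0x00 octet followed by the character) is to match on raw bytes, so the result does not depend on whether the string starts at an even or odd offset. The following is a minimal sketch under that assumption, not a definitive solution. The helper name extract_keywords is hypothetical, the character class for values is copied from the pattern in the question (so it will not match values containing characters such as ':' or '/'), and the lazy quantifier is a change from the original so that a match stops at the first closing '$'. It also drops the trailing " +" from the original pattern.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: scan a byte string for keywords stored as
# NUL-prefixed ASCII (0x00 followed by the character) and return
# each match decoded to an ordinary Perl string.
sub extract_keywords {
    my ($bytes) = @_;

    # Build "\x00A\x00u\x00t..." for each keyword name.
    my $names = join '|',
        map { join '', map { "\\x00$_" } split //, $_ }
        qw(Author Date Id URL Version);

    # Every character of the plain-text pattern is preceded by \x00.
    # The value class is the one used in the question.
    my $re = qr/(\x00\$(?:$names)\x00:\x00 (?:\x00[-.\$ _a-zA-Z0-9])+?\x00\$)/;

    my @found;
    while ($bytes =~ /$re/g) {
        my $raw = $1;                         # copy before decoding
        push @found, decode('UTF-16BE', $raw);
    }
    return @found;
}
```

To use it on a file, the data would need to be read without any encoding layer, for example by opening the file with the '<:raw' mode and slurping it into a scalar before calling the helper. Note that use utf8; only tells Perl that the script's own source text is UTF-8; it does not change how file contents are read.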
Re: regular expression searching in binary files
by GrandFather (Saint) on Nov 12, 2006 at 05:22 UTC

Re: regular expression searching in binary files
by bart (Canon) on Nov 12, 2006 at 07:40 UTC
by GrandFather (Saint) on Nov 12, 2006 at 08:46 UTC
by dhlocker (Novice) on Nov 12, 2006 at 14:20 UTC
by dhlocker (Novice) on Nov 13, 2006 at 13:25 UTC

Re: regular expression searching in binary files
by aufflick (Deacon) on Nov 13, 2006 at 01:18 UTC
by dhlocker (Novice) on Nov 13, 2006 at 02:32 UTC