ActiveState Perl 5.8.x (I'm not at the machine right now) on Microsoft Windows XP (or sometimes 2000).

I'm trying to extract Subversion (or RCS, or CVS, or whatever version control system) keywords and their values from compiled sources, where the text is encoded as 16-bit elements. I can see the characters in place (with Emacs), but my REs extract what I need only some of the time. It looks like an octet-alignment or line-alignment problem (though lines don't exist in binary files, of course).

So, if I were dealing with a plain text file, my statement would be

print $1 if (m#(\$(Author|Date|Id|URL|Version): [-\.\$ _a-zA-Z0-9]+\$) +#);

If I use \000A\000u\000t\000h\000o\000r for Author, and similarly substitute for each of the literal characters I'm seeking, I can find the keywords. But extracting the values with an RE has eluded me.
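The substitution described above can be generated rather than typed by hand, and the value part can be matched the same way, one 00 xx pair at a time. This is only a minimal sketch: wide() and extract_keywords() are hypothetical helper names, the sample bytes are made up, and the trailing " +" of the plain-text statement is left out. It also assumes the whole file has been read as raw bytes (binmode, no line-by-line reading).

```perl
use strict;
use warnings;

# Hypothetical helper: put a NUL byte in front of every character of a
# literal, giving a byte-level pattern for 16-bit big-endian ASCII text.
sub wide {
    my ($literal) = @_;
    return join '', map { '\x00' . quotemeta($_) } split //, $literal;
}

# Hypothetical helper: return every keyword found in a string of raw bytes.
sub extract_keywords {
    my ($bytes) = @_;
    my $names = join '|', map { wide($_) } qw(Author Date Id URL Version);
    # Same character class as the plain-text regex, one 00 xx pair at a time.
    my $value = '(?:\x00[-.\$ _a-zA-Z0-9])+';
    my $re    = qr/(\x00\$(?:${names})\x00:\x00 ${value}\x00\$)/;

    my @found;
    while ($bytes =~ /$re/g) {
        (my $text = $1) =~ tr/\x00//d;   # drop the NUL high bytes
        push @found, $text;
    }
    return @found;
}

# Made-up sample: two junk bytes, a 16-bit keyword, two more junk bytes.
my $sample = "\x01\x02"
           . pack('n*', unpack('C*', '$Author: dhlocker $'))
           . "\xff\xfe";
print "$_\n" for extract_keywords($sample);
```

Because the pattern works on bytes, it does not care whether the keyword starts at an even or an odd offset in the file.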

All ASCII characters in the files I'm examining are represented in two octets, the first being 0x00, the second being the normal ASCII character.
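That layout (0x00 first, then the ASCII octet) is what UTF-16BE looks like for ASCII-range text, so another option might be to decode just those runs and then apply the plain-text regex unchanged. The following is a rough sketch under that assumption. slurp_bytes() and keywords_from_bytes() are hypothetical helper names, the minimum run length of 4 is an arbitrary choice, and the sample bytes are made up. Encode ships with Perl 5.8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: slurp a file as raw bytes. binmode matters on Windows,
# where text mode rewrites CRLF pairs; undefining $/ avoids splitting the
# data at every 0x0A byte.
sub slurp_bytes {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    local $/;
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Hypothetical helper: decode each run of 00 xx pairs, then apply the
# plain-text regex to the decoded characters.
sub keywords_from_bytes {
    my ($bytes) = @_;
    my @found;
    while ($bytes =~ /((?:\x00[\x20-\x7e]){4,})/g) {
        my $text = decode('UTF-16BE', "$1");
        while ($text =~ m#(\$(?:Author|Date|Id|URL|Version): [-.\$ _a-zA-Z0-9]+\$)#g) {
            push @found, $1;
        }
    }
    return @found;
}

# Made-up sample: three junk bytes put the keyword at an odd offset.
my $sample = "\x01\x02\x03"
           . pack('n*', unpack('C*', '$Id: foo.c 42 dhlocker $'))
           . "\x00\x00";
print "$_\n" for keywords_from_bytes($sample);
```

With a real file the call would be keywords_from_bytes(slurp_bytes($path)). Scanning for runs first means the decoder never sees the surrounding binary data, so byte alignment in the file should not matter.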

I've variously tried use utf8; and several encodings, but my matches don't capture the strings I'm seeking.

Suggestions? and TIA.
(RTFMs would be welcome; perlre, perlreref, perlretut, and searching on unicode in perl docs found no help.)
Donald.


In reply to regular expression searching in binary files by dhlocker
