eibwen has asked for the wisdom of the Perl Monks concerning the following question:
The other day I found myself on Windows and wanted to use `strings`, which I replicated in Perl readily enough to do what I wanted. On a whim, I decided to finish the port; however, I'm having trouble implementing two of the options from the following excerpts of the `strings` binutils man page:
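For reference, here's roughly what my minimal replication looks like (a sketch; `extract_strings` is just my own helper name, and the 4-character minimum matches the `strings` default):

```perl
use strict;
use warnings;

# extract_strings: return runs of at least $min printable ASCII
# characters (plus tab) from a chunk of binary data. The 4-character
# default matches strings(1); this is my own helper, not a module API.
sub extract_strings {
    my ($data, $min) = @_;
    $min ||= 4;
    return $data =~ /([\x20-\x7e\t]{$min,})/g;
}

# Demo on some fake "binary" data ("hi" is too short to qualify):
my $blob = "\x00\x01hello, world\x00\x02\x03hi\x00version 1.0\xff";
print "$_\n" for extract_strings($blob);
# prints:
#   hello, world
#   version 1.0
```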
    -a --all
        Do not scan only the initialized and loaded sections of object
        files; scan the whole files.
    [...]
    -T bfdname --target=bfdname
        Specify an object code format other than your system's default
        format. See Target Selection, for more information.
    [...]
    -e encoding --encoding=encoding
        Select the character encoding of the strings that are to be
        found. Useful for finding wide character strings. Possible
        values for encoding are:
            's' = single-7-bit-byte characters (ASCII, ISO 8859, etc., default),
            'S' = single-8-bit-byte characters,
            'b' = 16-bit bigendian,
            'l' = 16-bit littleendian,
            'B' = 32-bit bigendian,
            'L' = 32-bit littleendian.
My port presently acts as `strings --all` for all file types, including object files. I am vaguely familiar with general object file structure (ELF in particular), having read several articles and docs recently; however, I still don't understand how to differentiate or identify sections (though that is likely format dependent), much less how to determine whether a section is loaded and/or initialized.
Question 1: How can I access the contents of an initialized and/or loaded section of an object file? How do I identify whether a file is an object file in the first place (file magic, perhaps, e.g. File::MimeInfo::Magic)? Are there modules for any of the various formats, or will I need to manually open, binmode, and regex? Are there any resources I should be aware of for understanding if/when/why a section is initialized and/or loaded?
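To make Question 1 concrete, here is a sketch of what I think the section classification would involve for a 64-bit little-endian ELF file. The constants and field offsets are from the ELF64 spec (32-bit and big-endian files would need different unpack templates, and `Q<` needs a perl with 64-bit integer support); `classify_sections` is my own helper name, and I haven't verified this against binutils' actual behaviour:

```perl
use strict;
use warnings;

use constant {
    SHT_NOBITS => 8,    # occupies no space in the file (e.g. .bss)
    SHF_ALLOC  => 2,    # mapped into memory when the program runs
};

# classify_sections: read the section header table of an ELF64 LE file
# and report, per section, whether it is loaded (SHF_ALLOC set) and
# initialized (type is not SHT_NOBITS). strings(1) without --all scans
# only sections that are both.
sub classify_sections {
    my ($fh) = @_;                      # filehandle opened with :raw
    read($fh, my $ehdr, 64) == 64 or die "short read on ELF header\n";
    die "not an ELF file\n" unless substr($ehdr, 0, 4) eq "\x7fELF";
    die "not 64-bit ELF\n"  unless ord(substr $ehdr, 4, 1) == 2;

    # e_shoff at 0x28, e_shentsize at 0x3a, e_shnum at 0x3c (ELF64)
    my ($shoff)             = unpack 'Q<',  substr($ehdr, 0x28, 8);
    my ($shentsize, $shnum) = unpack 'v v', substr($ehdr, 0x3a, 4);

    my @sections;
    for my $i (0 .. $shnum - 1) {
        seek $fh, $shoff + $i * $shentsize, 0;
        read $fh, my $shdr, $shentsize;
        # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
        my (undef, $type, $flags, undef, $offset, $size)
            = unpack 'V V Q< Q< Q< Q<', $shdr;
        push @sections, {
            offset      => $offset,
            size        => $size,
            loaded      => ($flags & SHF_ALLOC)  ? 1 : 0,
            initialized => ($type != SHT_NOBITS) ? 1 : 0,
        };
    }
    return @sections;
}
```

The non-`--all` behaviour would then, I assume, amount to scanning only the (offset, size) ranges of sections where both flags are true.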
The second problem I had (and one I actually know a bit about, hence the title's emphasis on the first question) concerns the encoding. From what I understand of the documentation, the encoding of a file can be specified either with the open pragma or by appending an I/O layer to the mode argument of a three-argument open. Since the encoding arrives as a command-line option and I'd rather not move Getopt::Long into a BEGIN block just so I can subsequently use the open pragma, I'll go with the mode argument of the three-arg open, e.g.:
open(my $fh, '<:utf8', 'file') or die "Can't open: $!";
Question 2: Can someone confirm that this is the correct (or at least a valid) approach to supporting the encoding of opened files? Does this affect the [:print:] character class (`strings` returns only printable characters, after all), or is that class constant irrespective of encoding? If the latter, how would I adjust the class to support, e.g., wide printable characters in a variable encoding? If a file is opened with a specified encoding, do regexes maintain that encoding, such that print $1, $/ will print the match in the specified encoding? Lastly, which encodings do the various --encoding options correspond to? (I'm used to seeing encodings expressed by name, e.g. "utf8", rather than by number of bits.)
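To make Question 2 concrete, here is the sort of mapping I have in mind. The layer choices are my guesses, particularly whether the 16/32-bit options really correspond to UTF-16/UTF-32 (`strings` itself scans fixed-width code units rather than doing a real decode, so these are at best approximations), and `find_strings` is my own helper name:

```perl
use strict;
use warnings;

# Guessed mapping from the strings(1) -e letters to PerlIO layers.
my %layer = (
    's' => ':raw',                   # 7-bit: also filter /[\x20-\x7e]/
    'S' => ':encoding(ISO-8859-1)',  # single 8-bit bytes
    'b' => ':encoding(UTF-16BE)',    # 16-bit bigendian
    'l' => ':encoding(UTF-16LE)',    # 16-bit littleendian
    'B' => ':encoding(UTF-32BE)',    # 32-bit bigendian
    'L' => ':encoding(UTF-32LE)',    # 32-bit littleendian
);

# find_strings: slurp $source (a filename or a ref to in-memory data)
# through the layer for encoding letter $enc, then return runs of 4+
# printable characters. Once the data is decoded, the regex operates
# on characters, so \p{Print} matches wide printable characters too,
# and the captures are character strings (encode them again on output).
sub find_strings {
    my ($enc, $source) = @_;
    my $layer = $layer{$enc} or die "unknown encoding letter '$enc'\n";
    open my $fh, "<$layer", $source or die "open: $!\n";
    local $/;
    my $text = <$fh>;
    return $text =~ /(\p{Print}{4,})/g;
}

# Demo on in-memory "binary" data:
binmode STDOUT, ':encoding(UTF-8)';   # in case matches are wide
my $demo = "\x00\x01plain latin text\x00";
print "$_\n" for find_strings('S', \$demo);
# prints:
#   plain latin text
```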
UPDATE: I recently found binary_analyze from the AppArmor project (SUSE wiki, Novell project page), which seems to work with object files from Perl at some length. I have quite a bit more reading and research to do before I come to an appreciable understanding of that aspect; in the interim, however, I would appreciate comments on the availability of object-file modules, or on Question 2.
Thanks!