perl-diddler has asked for the wisdom of the Perl Monks concerning the following question:

At the risk of showing my ignorance...:-) I have several files that have "binary glop" in them -- with a buried absolute, ASCII pathname in each. I wish to search for a specific filename (the 'basename' in the pathname) and have the embedded pathname printed out. The files are win-style .lnk files that lost their "mojo" under cygwin and appear as regular files like "aclocal.lnk". The correct targets are embedded in the .lnk files (in posix format & dos format, but I am ignoring the dos path). Example file:
<binary>/etc/alternatives/aclocal<binary>C:\etc\alternatives\aclocal<eof>
The string is searchable with grep, which displays the binary garbage before and after it. My first attempt was to read in the file on stdin, search for my target, and use patterns before and after it to display the entire path, like:
$pathchrs = '[-\/\.[:alnum:]_]';
$target = 'aclocal';
while (<>) {
    /(\/$pathchrs*$target$pathchrs*)/o && print "$1\n";
}
Unfortunately, life was not so simple. Even though the pattern works under "grep":
grep -a '/[-\/\.[:alnum:]_]*aclocal[-\/\.[:alnum:]_]*' /bin/aclocal.lnk
perl (and pcregrep) don't seem to find it.

FWIW - I made it work by splitting input into a buffer and removing non-printables, but that's hardly an efficient use of perl.

Why didn't the simpler approach work? Thanks for perls of wisdom...:-)

Re: how to read & search in a binary file
by GrandFather (Saint) on Jan 30, 2007 at 00:35 UTC

    After adding a few more path characters (':' and '\') the following works for me under Windows:

    use strict;
    use warnings;

    open IN, '<', 'notepad.exe.lnk' or die $!;
    while (<IN>) {
        # Collect every run of 10 or more path characters in this "line"
        my @strs = /([-\/\.[:alnum:]_\\:]{10,})/g;
        print join "\n", @strs;
    }
    close IN;

    Prints:

    notepad.exe
    C:\WINDOWS\notepad.exe

    DWIM is Perl's answer to Gödel
Re: how to read & search in a binary file
by Tanktalus (Canon) on Jan 30, 2007 at 00:18 UTC

    Your "lines" may match more than one value - and you're only seeing the first. Change your while to:

    while (<>) { print "$_\n" for /(\/$pathchrs*$target$pathchrs*)/og }
    Or, maybe get rid of the "o" modifier:
    my $pathre = qr/\/$pathchrs*$target$pathchrs*/;
    while (<>) { print "$_\n" for /($pathre)/g; }
    The problem is that your binary "glop" probably doesn't have any \n's in it for perl to treat as a new line.

    Of course, you're not entirely explicit with what you do get, so I'm just guessing.

      I think the lack of the line feed was messing me up 'conceptually'. I tried several variations on the pattern, and figured out it was my pattern having \A & \z anchoring it -- when my pathname wasn't really anchored to anything other than unprintables on either end. Sigh.

      I didn't include ":\" in my path chars since I wanted to match the *nix style paths, not the dos paths.

      Oh well...

      Thanks for the hints -- they helped me flesh out what worked from what didn't...
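
For anyone landing here later, the two points in this sub-thread (a file with no newlines arrives as one big "line", and \A / \z anchors cannot match a path surrounded by unprintables) can be sketched roughly as follows. This is only an illustration: find_paths and the sample bytes are made up for the example and are not the code from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: list every POSIX-style path in $buf whose text
# contains $target. The pattern is deliberately left unanchored.
sub find_paths {
    my ($buf, $target) = @_;
    my $pathchr = qr{[-/.A-Za-z0-9_]};
    my @found = $buf =~ m{(/$pathchr*\Q$target\E$pathchr*)}g;
    return @found;
}

# Made-up sample bytes that mimic the layout shown in the question.
my $sample = "\x4c\x00\x01\x14/etc/alternatives/aclocal\x00\x00"
           . "C:\\etc\\alternatives\\aclocal\x00";

# The unanchored pattern with /g finds the embedded POSIX path.
print "$_\n" for find_paths($sample, 'aclocal');

# The same pattern wrapped in \A ... \z finds nothing, because the
# path has non-path bytes on both sides of it.
print "anchored: no match\n"
    unless $sample =~ m{\A/[-/.A-Za-z0-9_]*aclocal[-/.A-Za-z0-9_]*\z};
```

The DOS-style path is skipped simply because it contains no "/" for the pattern to start on.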

Re: how to read & search in a binary file
by bart (Canon) on Jan 30, 2007 at 07:03 UTC
    A non-Perl solution: you seem to have Unix tools at your disposal, have you tried out what you get using strings? The string you're looking for should at least have the format of a Windows filepath.
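
If strings is not installed, a rough Perl stand-in might look like the sketch below. It assumes "printable" means ASCII space through tilde and uses a default minimum run length of 4; printable_runs is a name invented for this example, and the real tool has options and edge cases this does not cover.

```perl
use strict;
use warnings;

# Rough stand-in for strings(1): return every run of at least $min
# printable ASCII characters (space through tilde) in a byte string.
sub printable_runs {
    my ($bytes, $min) = @_;
    $min ||= 4;    # assumed default minimum length
    my @runs = $bytes =~ /([\x20-\x7e]{$min,})/g;
    return @runs;
}

# Usage (hypothetical script name): perl runs.pl /bin/aclocal.lnk aclocal
if (@ARGV == 2) {
    my ($file, $target) = @ARGV;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $bytes = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    # Print only the runs that mention the target name
    print "$_\n" for grep { index($_, $target) >= 0 } printable_runs($bytes);
}
```

Unlike the path-specific patterns earlier in the thread, this prints the DOS-style path as well, so some filtering on the output may still be needed.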