perlAffen has asked for the wisdom of the Perl Monks concerning the following question:

On Windows, I am trying to get text out of a file that certainly seems to behave like a binary file. If I open it in vi (vim), I see junk, then the nice text, then more junk. If I append to the file (echo...>>), my text goes after the last junk, while the application continues to add nice text lines inside the junk buffers. The file gets fairly large. Could someone suggest how I might determine what the new lines of nice text between the junk are? Thanks.

Re: Extracting Text from a Binary File
by Roy Johnson (Monsignor) on Dec 14, 2005 at 15:38 UTC
    Use something like
    /[[:print:]]{3,}/;
    to identify runs of 3 or more printable characters.

    Caution: Contents may have been coded under pressure.
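As a minimal sketch of how that pattern might be applied to a whole file: the script below reads the file in raw mode (so that Windows does not translate line endings or stop at a Ctrl-Z byte) and prints every run of printable characters. The helper names printable_runs and slurp_binary are made up for illustration, and the minimum run length of 3 simply follows the suggestion above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every run of $min or more printable characters found in $data.
# Note that [[:print:]] includes the space but not tab, CR or LF, so each
# text line comes back as its own run.
sub printable_runs {
    my ($data, $min) = @_;
    $min = 3 unless defined $min;
    return $data =~ /([[:print:]]{$min,})/g;
}

# Read a whole file as raw bytes (hypothetical helper).
sub slurp_binary {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                 # slurp mode: read everything at once
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}

# Usage: perl runs.pl somefile
if (@ARGV) {
    print "$_\n" for printable_runs(slurp_binary($ARGV[0]));
}
```

Binary junk can contain short printable runs by chance, so raising the minimum length may cut down on false hits.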
Re: Extracting Text from a Binary File
by Fletch (Bishop) on Dec 14, 2005 at 15:56 UTC
      I like shell strings, so maybe I can use the PPT (Perl Power Tools) version. However, what I really need to do is 'tail' the file, like tail -f, where new lines are printed; but rather than getting the new lines at the end of the file as they are added, I need to 'pull' them out from between the binary boundaries. Is that doable? Actually, I found the binary junk to be only at the bottom, so it goes like this:

      text1
      text2
      JUNKJUNKJUNK

      After a line is added, it looks like this:

      text1
      text2
      text3
      JUNKJUNKJUNK
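
One way the tail -f behaviour described above might be sketched, under some loud assumptions: the junk sits only at the bottom of the file (as observed above), the junk begins with a byte that is neither printable nor whitespace, and text lines that were already written never change. The script polls the file, takes the leading text region up to the last complete line, and prints whatever lies past the offset it reported last time. The names leading_text, new_text and follow, and the one-second polling interval, are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Leading text region of $data: everything before the first byte that is
# neither printable nor tab/CR/LF, trimmed back to the last complete line.
sub leading_text {
    my ($data) = @_;
    my ($prefix) = $data =~ /\A([[:print:]\t\r\n]*)/;
    my $end = rindex($prefix, "\n");
    return $end < 0 ? '' : substr($prefix, 0, $end + 1);
}

# Return (new text since $offset, updated offset).
sub new_text {
    my ($data, $offset) = @_;
    my $text = leading_text($data);
    return ('', $offset) if length($text) <= $offset;
    return (substr($text, $offset), length $text);
}

# Poll $path every $interval seconds and print text lines as they appear.
sub follow {
    my ($path, $interval) = @_;
    my $offset = 0;
    while (1) {
        if (open my $fh, '<:raw', $path) {
            local $/;                         # slurp mode
            my $data = <$fh>;
            close $fh;
            (my $chunk, $offset) = new_text(defined $data ? $data : '', $offset);
            $chunk =~ s/\r\n/\n/g;            # normalise Windows line endings
            print $chunk;
        }
        sleep $interval;
    }
}

# Usage: perl follow.pl somefile
follow($ARGV[0], 1) if @ARGV;
```

Rereading the whole file on every poll is simple but wasteful for a large file; if the text region really only grows, seeking to the saved offset before reading would be a natural refinement.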
Re: Extracting Text from a Binary File
by jonix (Friar) on Dec 14, 2005 at 15:45 UTC
    Please give an example of the junk parts. If you find a characteristic that distinguishes them from the nice text lines, chances are that you can tackle them with a regular expression.
Re: Extracting Text from a Binary File
by jfroebe (Parson) on Dec 14, 2005 at 17:56 UTC

    The easiest way is a unix-ish tool: strings. Never mind, I just saw the above reply. :)

    Jason L. Froebe

    Team Sybase member

    No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1