When munging data like this to extract what you want, just keep breaking your problem into smaller pieces (as long as you can split those pieces reliably!). When you can't split things up in a perfectly reliable way, alter your data (or your thinking) until it's in a format that you *can* reliably break up.

For example, you have one big file with data about several hosts. So, use a peachy keen regexp to break the big file into one chunk of data per host (i.e. split the data on the dashes / hostname / dashes headers). Each host's chunk then holds information about several software packages, so split that chunk into its pieces (i.e. split on the two blank lines to get each software record). Finally, you reach some slightly inconsistent data: usually a record has the software name, a blank line, then the patch level information, but in one record there is no blank line separator. So, rather than splitting on the blank line, just pull the first non-blank line from each record as the software package name. Whatever is left is your patch info.
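Here is a minimal sketch of those three steps. It is written in Python purely for illustration; the same split-on-a-regexp idea carries straight over to Perl's `split`. The input layout is an assumption, because the original data isn't reproduced here. A host header is assumed to be a line of three or more dashes, the hostname, then another dash line, and records are assumed to be separated by two blank lines. `parse_report` and `SAMPLE` are hypothetical names.

```python
import re

# ASSUMED host header: a dash line, a hostname line, another dash line.
# The capture group keeps the hostname in re.split's output.
HOST_HEADER = re.compile(r"^-{3,}\n(\S+)\n-{3,}\n", re.MULTILINE)

# ASSUMED record separator: two blank lines (three newlines in a row,
# tolerating stray spaces/tabs on the "blank" lines).
RECORD_SEP = re.compile(r"\n[ \t]*\n[ \t]*\n")


def parse_report(text):
    """Return {hostname: {package: patch_info}} for the assumed layout."""
    hosts = {}
    # Step 1: split into per-host chunks.
    # Result is [preamble, host1, body1, host2, body2, ...].
    parts = HOST_HEADER.split(text)
    for host, body in zip(parts[1::2], parts[2::2]):
        packages = {}
        # Step 2: split each host's chunk into software records.
        for record in RECORD_SEP.split(body):
            lines = [ln.strip() for ln in record.splitlines() if ln.strip()]
            if not lines:
                continue  # skip empty pieces from extra blank lines
            # Step 3: the first non-blank line is the package name and the
            # rest is patch info, whether or not a blank line separated them.
            packages[lines[0]] = "\n".join(lines[1:])
        hosts[host] = packages
    return hosts


# Hypothetical sample; note that "openssh" has no blank line before its patch line.
SAMPLE = """\
----------
alpha
----------
perl

patch level 5.8.0


openssh
patch level 3.1p1
----------
beta
----------
apache

patch level 1.3.26
"""

if __name__ == "__main__":
    print(parse_report(SAMPLE))
```

The inconsistent `openssh` record parses the same way as the others. The name is taken as the first non-blank line instead of being found by splitting on a blank line, so the missing separator doesn't matter.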

If you break it down like this, data munging becomes easy, as long as you know enough about regular expressions. You can pick those up from many of the good Perl books or from the links stephen mentions.


Or you can just use tachyon's code above -- and hope that every time you need something like this, someone will be just as helpful. :)

In reply to Re: Tricky File Parsing by joealba
in thread Tricky File Parsing by dnickel
