Yes, that's what I've been looking at (trying to) do. The orders are only separated by a blank line, but they all start with the "Order ID:" text, so I'm looking at using that as the separator.
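For what it's worth, one way to use "Order ID:" as the separator is to set Perl's input record separator `$/` to that string, so each read returns one whole order. This is only a rough sketch: the sub name `split_orders` and the sample data are made up, and it assumes "Order ID:" never appears anywhere except at the start of an order.

```perl
use strict;
use warnings;

# Split a report into one chunk of text per order by reading with
# "Order ID:" as the input record separator instead of a newline.
# Assumption: "Order ID:" appears only at the start of an order.
sub split_orders {
    my ($fh) = @_;
    local $/ = "Order ID:";    # each read now ends at the next "Order ID:"
    my @chunks;
    my $first = 1;

    while ( my $chunk = <$fh> ) {
        chomp $chunk;          # strips the trailing "Order ID:", if present

        # The first chunk is whatever precedes the first order
        # (nothing, or a page header), so it is always discarded.
        if ($first) { $first = 0; next; }

        $chunk =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @chunks, $chunk if length $chunk;
    }
    return @chunks;
}

# Usage with invented sample data; the real report layout may differ.
my $report = "Report header\nOrder ID: 1001\nDist A\n\nOrder ID: 1002\nDist B\n";
open my $in, '<', \$report or die "Cannot open in-memory file: $!";
print "--- order ---\n$_\n" for split_orders($in);
close $in;
```

Each chunk then starts with the order number, followed by that order's remaining lines. Page headers that fall in the middle of an order would still end up inside its chunk and would need filtering separately.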

The report also spans multiple pages, with a header on each page, which complicates things that little bit more... but I'll worry about that later, once I have the logic for the full order sorted. The page header should be automatically filtered out by the regex the way it stands anyway... I think.

One thing I *could* do with a suggestion on is how to handle breaking out of the loop at the end of each Order. About the only way I can think of to know when to stop processing distributions is to look for the start of the next Order record. To do that, though, the line containing the data I want has to be read in at the "end" of the loop for the previous Order... and then, back up at the start of the outer loop, the next line of the file is read in, dropping the previous one, which contains (some of) the data I'm after.

Probably easier to show you what I mean in pseudocode to give a better idea:

while <DATA> {
    if (start of record) {
        get order details
        while (not a new order) {
            get distribution details into a hash
        }
        print order details and distributions to Excel
    }
}

So, from the above, the issue I am having is the two while loops... the inner one "eats" the order info of any Orders following the first. I'm sure I could put some post-while processing there to trap the data before it loops to the next line... but that just seems a bit... uncouth, for want of a better word. Can't help thinking it should be more elegant (not to mention less likely to fail) than that.
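One possible way around the nested loops, sketched below, is to keep a single loop and carry the order being built in a variable: a new "Order ID:" line closes the previous order and opens the next one, so that line is never thrown away. This is only a sketch under assumptions. The sub name `parse_orders`, the hash keys and the sample data are invented for illustration, distribution lines are kept as raw text because their real layout isn't shown here, and pushing onto `@orders` stands in for the "print to Excel" step.

```perl
use strict;
use warnings;

# Parse a report from a filehandle into a list of order records.
# Assumption: every order starts with a line beginning "Order ID:", and
# every other non-blank line up to the next such line belongs to it.
sub parse_orders {
    my ($fh) = @_;
    my @orders;
    my $current;    # the order being built; undef until the first header

    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank separator lines

        if ( $line =~ /^Order ID:\s*(\S+)/ ) {
            # A new header closes the previous order. Nothing is lost,
            # because this same line is used to open the next one.
            push @orders, $current if $current;
            $current = { id => $1, distributions => [] };
            next;
        }

        # Anything before the first header (e.g. a page header) is skipped.
        next unless $current;
        push @{ $current->{distributions} }, $line;
    }

    # Flush the last order, which has no following header to close it.
    push @orders, $current if $current;
    return @orders;
}

# Usage with invented sample data; the real report layout may differ.
my $sample = "Order ID: 1001\nDist A 10\nDist B 5\n\nOrder ID: 1002\nDist C 7\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
for my $order ( parse_orders($fh) ) {
    printf "%s: %d distribution line(s)\n",
        $order->{id}, scalar @{ $order->{distributions} };
}
close $fh;
```

The only "post-loop" step left is the final flush after the loop ends, since the last order has no following header. Page headers that land in the middle of an order would still be collected as distribution lines here, so they would need their own `next if ...` test.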


In reply to Re^6: How best to strip text from a file? by bobdabuilda
in thread How best to strip text from a file? by bobdabuilda
