in reply to Re^3: How best to strip text from a file?
in thread How best to strip text from a file?
Well, I did get a chance to look at it yesterday before I headed home, and realised I didn't give as much example data as I should have: there are usually numerous Orders containing the multiple distributions. So I'm going to have a play with the logic today, hopefully, to work out how to perform that loop...
The quick look I had at it got me there, to a point, but it "lost" the first line of each subsequent order because of the way I had the loops set up. I should hopefully be able to get that right today. Either way, your code has certainly put me well and truly on the way to what I was after, and I'm very thankful for that :)
Re^5: How best to strip text from a file?
by Kenosis (Priest) on Nov 07, 2012 at 23:31 UTC
You're most welcome, bobdabuilda!

> ...there are usually numerous Orders containing the multiple distributions...

Suspected so. What separates these Orders? One option is to set the record separator ($/) to the text that separates Orders, and then do the matching on each Order.
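To make the $/ suggestion concrete, here is a minimal sketch (not code from this thread) that assumes each Order begins with the literal text `Order ID:` and uses a made-up sample report in place of the real file. With $/ set to that string, each read returns everything up to and including the next `Order ID:`, so `chomp` strips the separator from the end of each record, the first chunk is whatever precedes the first Order, and each remaining record starts with the order number rather than the label.

```perl
use strict;
use warnings;

# Hypothetical sample report: each order starts with "Order ID:".
my $report = <<'END';
REPORT HEADER
Order ID: 1001
  Dist: A  qty 5
  Dist: B  qty 2

Order ID: 1002
  Dist: C  qty 7
END

my @orders;
{
    # In-memory filehandle for the demo; a real file is read the same way.
    open my $fh, '<', \$report or die "Cannot open report: $!";

    # local keeps the changed separator from leaking into other code.
    local $/ = 'Order ID:';

    while ( my $record = <$fh> ) {
        chomp $record;    # removes the trailing "Order ID:" (the current $/)
        next if $. == 1;  # first chunk is the text before the first order
        push @orders, $record;
    }
    close $fh;
}

# Each record now begins with the order number that followed the label.
for my $order (@orders) {
    my ($id) = $order =~ /^\s*(\d+)/;
    print "Order $id\n";
}
```

Anything between two Orders (blank lines, a page header) rides along at the end of the earlier record, so the per-Order matching regex still has to ignore it.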
by bobdabuilda (Beadle) on Nov 08, 2012 at 01:29 UTC
Yes, that's what I've been trying to do. The orders are only separated by a blank line, but they all start with the "Order ID:" text, so I'm looking at using that as the separator.

The report also spans multiple pages, with a header on each page, which complicates things just that little bit more... but I'll worry about that later, once I have the logic for a full order sorted. The page header should be filtered out automatically by the regex as it stands anyway... I think.

One thing I *could* do with a suggestion on is how to break out of the loop at the end of each Order. About the only way I can think of to know when to stop processing distributions is to look for the start of the next Order record. To do that, though, the line containing the data I want has to be read in at the "end" of the loop for the previous Order... and then, back at the start of the loop, the next line of the file is read in, dropping the previous one, which contains (some of) the data I'm after. It's probably easier to show you what I mean in pseudocode:
So, from the above, the issue I'm having is with the two while loops: the second one "eats" the order info of every Order after the first. I'm sure I could add some post-while processing to trap that data before it loops to the next line... but that just seems a bit... uncouth, for want of a better word. I can't help thinking it should be more elegant (not to mention less likely to fail) than that.
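One common way around the "eaten" line, offered here as a sketch rather than anything posted in the thread, is to drop the inner while loop and keep a single loop with a variable that remembers which Order is currently being collected. When a line starting with `Order ID:` turns up, the code just switches to a new Order in the same pass, so no line is ever read and then thrown away. The report layout, the `Dist:` line format and the `%orders` structure below are hypothetical stand-ins for the real data.

```perl
use strict;
use warnings;

# Hypothetical report layout; the real field names and regexes will differ.
my $report = <<'END';
Order ID: 1001
  Dist: A  qty 5
  Dist: B  qty 2

PAGE 2 HEADER
Order ID: 1002
  Dist: C  qty 7
END

my %orders;        # order ID => array ref of distribution details
my $current_id;    # ID of the Order currently being collected

open my $fh, '<', \$report or die "Cannot open report: $!";
while ( my $line = <$fh> ) {
    chomp $line;

    if ( $line =~ /^Order ID:\s*(\S+)/ ) {
        # Start of a new Order: switch the "current" Order in this same
        # pass, so the header line is handled rather than dropped.
        $current_id = $1;
        $orders{$current_id} = [];
    }
    elsif ( defined $current_id && $line =~ /^\s+Dist:\s*(.+)$/ ) {
        # A distribution line belonging to the current Order.
        push @{ $orders{$current_id} }, $1;
    }
    # Anything else (blank lines, page headers) is simply skipped.
}
close $fh;

for my $id ( sort keys %orders ) {
    print "$id: ", join( '; ', @{ $orders{$id} } ), "\n";
}
```

The end of an Order never needs to be detected explicitly: it ends whenever the next `Order ID:` line (or end of file) arrives.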
by Kenosis (Priest) on Nov 08, 2012 at 05:05 UTC
Hi, bobdabuilda

You've given this a lot of thought, and I think your pseudocode is on target.

> The orders are only separated by a blank line, but they all start with the "Order ID:" text, so looking at using that as the separator.

Using "Order ID:" as the record separator makes sense.

> The page header should be automatically filtered out by the regex the way it stands anyway... I think.

You're correct.

I've taken the liberty of implementing an interpretation of this. It does use two loops, but the outer loop is a for loop that iterates over an array of Order records:
Output
I've included a subroutine, and a call to it, that shows how to access the hash one record at a time. The code is commented to help with understanding it. Let me know if you have any questions about this... Enjoy!
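Kenosis's actual code and output aren't preserved here. As a rough illustration of the shape he describes (an outer for loop over an array of Order records, an inner loop over each record's distributions, and a subroutine handling one hash entry at a time), here is a minimal sketch under the same assumptions as before: a made-up report, a hypothetical `Dist:` line format and a hypothetical `print_order` helper. It splits the slurped report with a zero-width lookahead, so each record keeps its own `Order ID:` line.

```perl
use strict;
use warnings;

# Hypothetical report text; in practice, slurp the real file into $report.
my $report = <<'END';
Report header line
Order ID: 1001
  Dist: A  qty 5
  Dist: B  qty 2

PAGE 2 HEADER
Order ID: 1002
  Dist: C  qty 7
END

# Split just before each "Order ID:" at the start of a line. The lookahead
# is zero-width, so the "Order ID:" text stays inside each record.
my @records = grep { /^Order ID:/ } split /(?=^Order ID:)/m, $report;

my %orders;    # order ID => array ref of distribution details
for my $record (@records) {    # outer loop: one Order per iteration
    my ($id) = $record =~ /^Order ID:\s*(\S+)/ or next;
    $orders{$id} = [];

    # Inner loop: every distribution line inside this one record.
    # Page-header text in the record does not match and is ignored.
    while ( $record =~ /^\s+Dist:\s*(.+)$/mg ) {
        push @{ $orders{$id} }, $1;
    }
}

# Hypothetical helper: handles a single Order (one hash entry) at a time.
sub print_order {
    my ( $id, $dists ) = @_;
    print "Order $id\n";
    print "  $_\n" for @$dists;
}

print_order( $_, $orders{$_} ) for sort keys %orders;
```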
by bobdabuilda (Beadle) on Nov 09, 2012 at 04:55 UTC
by Kenosis (Priest) on Nov 09, 2012 at 06:02 UTC