I have a large report that I need to extract data from. The report can be broken down into records, with the start of each one looking similar to what is below:
REPORT HEADER ISCDAYRECAP-001 ISC001 ISC RECAP REPORT FOR STORE: 001 + PAGE: 00 1 XTNDED MRKDWN F +OR STORE: 12.00 R U N DEPT: GROCERY POST DATE: 07 +/14/2011 DATE/TIME: 07/14/2011 21:11:05 EXTEND + MRKDWN REASON EXT. MRKDWN
I am using the following code to split the file into separate store records, but the resulting array has only two elements: element 0 is empty and element 1 holds everything else:
#!/usr/bin/perl -w
use strict;

open my $IN,"<","QISC001" or die "Can not open QISC001: $!\n";
my @records;
my $data = do{ local $/; <$IN>; };
@records = split m|(?<=\n)(?=REPORT HEADER ISCDAYRECAP-\d{3})|, $data;
close $IN;
One thing I noticed when viewing the input file in vi (I am doing this on an HP-UX system) is that there is a ^L character at the start of each store except the first, so my guess is that the first part of the regex is incorrect. As always, suggestions/hints are welcome.
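If the ^L guess is right, the lookbehind for "\n" can never succeed on the later records, because the character immediately before each header is a form feed (\f) rather than a newline. Here is a minimal sketch of a split that allows for that. It assumes the form feed sits at the start of a line directly in front of the header; the sub name split_records and the sample data are made up for illustration and are not taken from the real report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split slurped report text into one string per store record.
# Assumption: each record begins at the start of a line, with an optional
# form feed (\f, displayed as ^L in vi) directly before the report header.
# The form feed is consumed by the split, so it does not appear in the output.
sub split_records {
    my ($data) = @_;
    # grep { length } drops the empty leading field that split produces
    # when the data itself begins with a form feed.
    return grep { length }
        split /^\f?(?=REPORT HEADER ISCDAYRECAP-\d{3})/m, $data;
}

# Made-up sample standing in for the contents of QISC001.
my $sample = join '',
    "REPORT HEADER ISCDAYRECAP-001\n", "store 001 detail\n",
    "\fREPORT HEADER ISCDAYRECAP-002\n", "store 002 detail\n";

my @records = split_records($sample);
printf "%d records\n", scalar @records;
```

With /m, ^ matches after every newline as well as at the very start of the data, so the first record (which has no form feed) and the later ones are handled by the same pattern. If the form feed turns out not to be at the start of a line in the real file, the pattern would need adjusting.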
In reply to Problem with a regex? by TStanley