in reply to file parsing help
You might try it this way:
    #! perl -slw
    use strict;

    ## Read one page at a time: each record ends at "\n1SYSTEM"
    $/ = "\n1SYSTEM";

    my @fields;
    while( my $page = <DATA> ) {
        ## Extract date
        $page =~ m[ACTUALS THRU (\d\d)/(\d\d)]
            and my $period = "${1}20${2}"
            or die "Couldn't get date";

        ## Extract body and split into lines
        ## discarding total(*) lines
        $page =~ m[COST\n(.+)]s
            and my @lines = grep{ !m[^0.\*] } split "\n", $1
            or last;

        for my $line ( @lines ) {
            my @temp;
            ## Split the line into fields. Skip the last line
            eval{
                @temp = unpack 'xa5x2a3x2a15x2a6x2a5x2a5x4a5x9a8x4a9x5a5x15a5x4a9', $line;
            } or last;

            ## Fill in the missing fields from previous line
            $temp[ $_ ] =~ m[^\s+$] and $temp[ $_ ] = $fields[ $_ ] for 0, 1, 2, 3;

            ## output formatted appropriately
            print join '|', $period, @temp;

            ## Save fields for in-filling.
            @fields = @temp;
        }
    }
    __DATA__
With your input pasted into the DATA section, this produces:
    C:\test>583749
    082006|F1150|ABC|KELLY J. |AAF113|FJO1A|FTO5A|284.0| 1.63| 6,688.22|735.0|4.23 |7,296.52
    082006|F1150|ABC|KELLY J. |AAF113|FJO1A|FTO5D| 38.0| .22| 893.91| 90.0| .52 |2,128.73
    082006|F1150|ABC|KELLY J. |AAF113|FJO1A|FTW5T| 6.0| .03| 135.07| 6.0| .03 | 135.07
    082006|F1150|CDE|DEBORAH M. |AAF103|FJB1A|FTB5A| 3.0| .02| 107.83| 3.0| .02 | 107.83
    082006|F1150|CDE|DEBORAH M. |AAF103|FJB1A|FTB5B| | | | 21.5| .14 | 881.81
    082006|F1150|CDE|DEBORAH M. |AAF103|FJB1A|FTB5D| | | | 5.5| .03 | 194.37
    082006|F1150|CDE|DEBORAH M. |AAF103|FJB1A|FTB5G| 5.5| .03| 192.11| 22.0| .11 | 790.06
    082006|F1150|CDE|DEBORAH M. |AAF103|FJB1A|FTW5U| | | | 1.0| .01 | 41.20
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG1N|FTG5C| | | | 17.0| .11 | 700.26
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG1N|FTG5E| 15.5| .09| 557.19| 15.5| .09 | 557.19
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG1N|FTW5A| 1.0| | 35.95| 1.0| | 35.95
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG1N|FTW5G| | | | 1.5| .01 | 61.79
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG1N|FTW5H| | | | 1.0| .01 | 41.20
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG1N|FTW5T| 1.0| | 35.95| 3.0| .01 | 118.34
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG1N|FTW5U| | | | 5.0| .03 | 205.96
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG1Q|FTG5C| | | | 2.0| .01 | 70.69
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG1V|FTG5E| 64.0| .33| 2,140.75| 64.0| .33 |2,140.75
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG2A|FTG5C| | | | 2.0| .01 | 70.69
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG2A|FTW5E| | | | 1.0| .01 | 41.20
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG2A|FTW5J| | | | 9.0| .05 | 370.75
    082006|F1150|CDE|DEBORAH M. |AAF103|FJG2A|FTW5T| 5.5| .03| 197.72| 5.5| .03 | 197.72
    082006|F1150|CDE|DEBORAH M. |AAF103|FJO1A|FTO5D|219.0| 1.14| 7,587.85|432.0|2.34 |5,578.73
    082006|F1150|CDE|DEBORAH M. |AAF103|FJO1A|FTW5E| | | | 1.0| .01 | 41.20
    082006|F1150|CDE|DEBORAH M. |AAF103|FJO1A|FTW5G| 1.0| | 35.95| 1.0| | 35.95
    082006|F1150|CDE|DEBORAH M. |AAF103|FJO1A|FTW5T| | | | 65.5| .37 |2,507.55
    082006|F1150|CDE|DEBORAH M. |AAF103|FJO1A|FTW5U| | | | 3.0| .02 | 106.00
    082006|F1150|CDE|DEBORAH M. |AAF103|FJO1A|FTW5V| 34.5| .19| 1,203.74| 84.5| .49 |3,103.17
    082006|F1150|CDE|DEBORAH M. |AAF103|FJO1A|FTW5W| 2.0| .01| 66.30| 6.0| .04 | 219.51
    082006|F1150|HIF|CRAIG |AAF040|FJB1A|FTB5B|145.0| .82| 5,390.09|536.0|3.05 |9,574.79
    082006|F1150|CMV|MARGARET S |AAF070|FJB1A|FTB5B| | | |138.0| .86 |4,259.44
    082006|F1150|CMV|MARGARET S |AAF070|FJG1N|FTG5E| | | | 7.0| .04 | 191.76
    082006|F1150|CMV|MARGARET S |AAF070|FJG1N|FTW5G| | | | 1.0| | 27.38
    082006|F1150|CMV|MARGARET S |AAF070|FJG1N|FTW5V| | | | 1.0| | 27.38
    082006|F1150|CMV|MARGARET S |AAF070|FJG1Q|FTG5E| | | | 2.0| .01 | 54.78
    082006|F1150|CMV|MARGARET S |AAF070|FJG1Q|FTG5F| | | | 4.0| .02 | 109.56
    082006|F1150|CMV|MARGARET S |AAF070|FJG1Q|FTW5B| | | | 1.0| .01 | 31.48
    082006|F1150|CMV|MARGARET S |AAF070|FJG1Q|FTW5G| | | | 9.0| .05 | 279.29
    082006|F1150|CMV|MARGARET S |AAF070|FJG1Q|FTW5V| | | | 6.0| .03 | 180.76
    082006|F1150|PWC|CARL H. |AAF049|FJG1B|FTW5F|120.0| .71| 4,226.34|324.0|1.86 |0,868.58
    082006|F1150|LWR|KIM |AAF104|FJO1A|FTO5C| | | | 11.0| .06 | 422.18
    082006|F1150|LWR|KIM |AAF104|FJO1A|FTO5D| 33.0| .19| 1,363.92|127.5| .73 |4,887.53
    082006|F1150|LWR|KIM |AAF104|FJO1A|FTW5E| 5.0| .03| 254.81| 9.0| .05 | 403.18
    082006|F1150|LWR|KIM |AAF104|FJO1A|FTW5G| | | | 1.0| .01 | 37.08
which isn't formatted exactly as you asked, but you can adjust that to suit your preferences/requirements.
(Also, what happened to the KIM lines in your "desired output"?)
The main trick here is to separate the pages into the header and body, so that you can split out the fixed format lines. That allows you to process them using the right tool for the job, unpack.
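To illustrate the header/body separation on something small, here is a minimal sketch. The page text, the column layout, and the sub names `split_page` and `split_fields` are all made up for the example; only the two regexes mirror the ones in the script above.

```perl
use strict;
use warnings;

# Hypothetical page: one header line, a heading line ending in "COST",
# then fixed-width body lines (5-char code, 3-char id, 8-char amount).
my $page = join "\n",
    'REPORT 1  ACTUALS THRU 08/06',
    'CODE   ID       COST',
    'F1150  ABC     284.0',
    'F1151  CDE      38.0',
    '';

# Return the period taken from the header, followed by the body lines.
sub split_page {
    my ($text) = @_;
    my ( $mm, $yy ) = $text =~ m[ACTUALS THRU (\d\d)/(\d\d)]
        or die "Couldn't get date";
    my ($body) = $text =~ m[COST\n(.+)]s
        or return "${mm}20${yy}";
    return ( "${mm}20${yy}", split /\n/, $body );
}

# Split one body line by column position. 'aN' takes N characters as-is,
# 'xN' skips N. unpack dies if an 'x' would skip past the end of the
# string, so a too-short line yields an empty list here.
sub split_fields {
    my ($line) = @_;
    my @fields;
    eval { @fields = unpack 'a5 x2 a3 x2 a8', $line; 1 } or return;
    return @fields;
}

my ( $period, @lines ) = split_page($page);
print join( '|', $period, split_fields($_) ), "\n" for @lines;
```

Because unpack works purely by position, blank columns come back as runs of spaces rather than shifting the later fields, which is what a whitespace split would do.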
The other simplification is to treat the fields as an array rather than as named entities, which makes the substitution process a simple loop.
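The in-filling loop can be tried in isolation like this. It is only a sketch: the sub name `fill_down`, its `$fill_count` parameter, and the sample rows are hypothetical, and it treats an empty string as blank too, which the script above does not.

```perl
use strict;
use warnings;

# Copy the first $fill_count fields down from the previous row
# wherever the current row has them blank.
sub fill_down {
    my ( $rows, $fill_count ) = @_;
    my ( @prev, @out );
    for my $row (@$rows) {
        my @cur = @$row;
        for my $i ( 0 .. $fill_count - 1 ) {
            $cur[$i] = $prev[$i]
                if defined $prev[$i] and $cur[$i] =~ m[^\s*$];
        }
        push @out, [@cur];
        @prev = @cur;    # remember the filled-in row for the next pass
    }
    return \@out;
}

my $filled = fill_down(
    [
        [ 'F1150', 'ABC', '284.0' ],
        [ '     ', '   ', ' 38.0' ],
        [ '     ', 'CDE', '  3.0' ],
    ],
    2,
);
print join( '|', @$_ ), "\n" for @$filled;
```

With the sample rows, the second row inherits both `F1150` and `ABC`, and the third inherits only `F1150`. Saving the already-filled row as the new "previous" row is what lets a value carry down across several blank lines in a row.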