in reply to Statement Parsing and Rendering

Based on the assumption that each $fieldno is followed by a fixed number of fields, I would propose a data dictionary that holds, for each record type, the field names and whether the record can occur more than once. The code that reads the data can then be rather short, with all the intelligence living in the data dictionary. However, I have already seen that 530, for example, can have 9 or 10 fields following it. Anyway, here is some code that illustrates my idea (the data dictionary is incomplete):

```perl
use strict;
use warnings;
use Data::Dumper;

# Data dictionary: for each record type, whether it occurs once ('single')
# or repeatedly ('multiple'), and the names of its fields in order.
my %desc = (
    200 => { type  => 'single',
             names => [ 'f1', 'f2', 'f3', 'name', 'street', 'place1', 'place2',
                        'f8', 'f9', 'f10', 'f11', 'email', 'f13' ] },
    501 => { type  => 'single',
             names => [ 'rate1', 'rate2' ] },
    530 => { type  => 'multiple',
             names => [ 'f1', 'f2', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9' ] },
);

my %data;
$/ = "`\n";                    # records end with a backtick plus newline
foreach my $line (<DATA>) {    # <DATA> reads the lines after __DATA__
    chomp $line;               # removes the "`\n" terminator
    my @fields = split '~\d\d', $line;    # drop every "~NN" field code
    next unless length $line;
    next unless scalar(@fields);
    my $fieldno = shift @fields;          # first piece is the record type
    if( exists $desc{$fieldno} ) {
        if( $desc{$fieldno}{type} eq 'single' ) {
            # hash slice: values go to the names in order, surplus values are dropped
            @{$data{$fieldno}}{ @{ $desc{$fieldno}{names} } } = @fields;
        }
        else {
            # one hash per occurrence of the record
            push @{ $data{$fieldno} }, {};
            @{ $data{$fieldno}[-1] }{ @{$desc{$fieldno}{names}} } = @fields;
        }
    }
    else {
        print "Unknown data $fieldno\n";
    }
}
print Dumper \%data;

__DATA__
200~020000000123~0509112013~0610102013~07JOHN SMITH~08131 MAIN ST~09SOMEWHERE TX 77777~12SOMEWHERE~13TX~1477777~15R011~197~24bobsmithsemail@gmail.com~251`
501~0211.150%~03.030547%`
500~0109112013~020001~03PERSONAL LOAN~05Balance Forward~0694188~0770~087~182~21Original Balance~22138000~24608`
300~01BOBS FRIEND~021`
530~0109182013~0209182013~0312184-~04609~051237~0610338-~0783850~08Payments by Check~23K~25P`
510~0109182013~0209182013~03Mail Transaction~041`
539`
530~0110082013~0210082013~0312200-~05512~0611688-~0772162~08Payments by Check~23K~25P`
510~0110082013~0210082013~03Mail Transaction~041`
539`
599~0110102013~02Ending Balance~0372162~10Total Aggregate Amount Paid From Open~12Total Interest Paid From Open~136063~141218~1565838`
570~0112183~0311042013~0411042013~0512162~0712162~0844~1112183`
540~012~0224384`
550~01Interest Paid~026063~031218~06609~071749~0865838~091218`
690~010004~032219~04600`
701~01PERSONAL LOAN~0272162`
```
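The positional dictionary is where the 9-or-10-fields problem bites: the second 530 record has no ~04 field, so from that point on the same name holds a different field in each record (with the code above, f8 is 83850 in the first 530 record but 'Payments by Check' in the second). A minimal sketch of one alternative, assuming the two-digit code after each ~ identifies the field (my reading of the sample data, not a documented property of the format): split with a capturing group so the codes are kept, and key the dictionary by code. The %desc layout, the fNN fallback names and the in-memory sample are illustrative only.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical dictionary keyed by record type, then by the two-digit code
# that follows each '~'. Codes without a name fall back to "fNN".
my %desc = (
    200 => { type => 'single',   names => { '07' => 'name', '08' => 'street', '24' => 'email' } },
    501 => { type => 'single',   names => { '02' => 'rate1', '03' => 'rate2' } },
    530 => { type => 'multiple', names => {} },
);

# Records copied from the sample data, read through an in-memory filehandle.
my $sample = <<'END';
200~020000000123~0509112013~0610102013~07JOHN SMITH~08131 MAIN ST~09SOMEWHERE TX 77777~12SOMEWHERE~13TX~1477777~15R011~197~24bobsmithsemail@gmail.com~251`
501~0211.150%~03.030547%`
530~0109182013~0209182013~0312184-~04609~051237~0610338-~0783850~08Payments by Check~23K~25P`
530~0110082013~0210082013~0312200-~05512~0611688-~0772162~08Payments by Check~23K~25P`
539`
END

my %data;
open my $fh, '<', \$sample or die "Can't open in-memory handle: $!\n";
{
    local $/ = "`\n";    # each record ends with a backtick and a newline
    while ( my $line = <$fh> ) {
        chomp $line;     # strips the "`\n" terminator
        # The capturing group makes split return the codes as well:
        # (type, code, value, code, value, ...)
        my ( $type, @pairs ) = split /~(\d\d)/, $line;
        next unless defined $type && exists $desc{$type};
        my %rec;
        while ( my ( $code, $value ) = splice @pairs, 0, 2 ) {
            $rec{ $desc{$type}{names}{$code} // "f$code" } = $value;
        }
        if ( $desc{$type}{type} eq 'single' ) { $data{$type} = \%rec }
        else                                  { push @{ $data{$type} }, \%rec }
    }
}
close $fh;

$Data::Dumper::Sortkeys = 1;
print Dumper \%data;
```

With this layout the second 530 record simply has no f04 key instead of shifting its neighbours, and naming another field is a one-line change to %desc.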

Re^2: Statement Parsing and Rendering
by PerlSufi (Friar) on Oct 25, 2013 at 13:56 UTC
    Awesome, thanks hdb++. I'll give that a whirl and post any other problems I encounter.
Re^2: Statement Parsing and Rendering
by PerlSufi (Friar) on Oct 25, 2013 at 14:34 UTC
    hdb: I could only get that to work by changing it to:
```perl
open(my $in_fh, '<', $infile) or die "Can't open $infile: $!\n";
my $chunk = do { local $/; <$in_fh> };    # slurp the whole file
close $in_fh;
$chunk =~ s/[\f\r\n]//g;                  # remove form feeds and line breaks
my @statement = split '`', $chunk;        # one element per record
## %desc stuff here
my %data;
$/ = "`\n";
foreach my $line (@statement) {
    ..
}
```
    instead of:
```perl
my %data;
$/ = "`\n";
foreach my $line (<DATA>) {
    ..
}
```
    Is %data set to the file variable in your version? I think shifting off the first field may be a mistake on my part, too. Doing this doesn't parse the 200~ fields. My output was:
```
Unknown data 020000000123
Unknown data 500
Unknown data 300
Unknown data 510
Unknown data 539
Unknown data 510
Unknown data 539
Unknown data 599
Unknown data 570
Unknown data 540
Unknown data 550
Unknown data 690
Unknown data 701
Unknown data 200~
$VAR1 = {
          '501' => {
                     'rate1' => '11.150%',
                     'rate2' => '.030547%'
                   },
          '530' => [
                     {
                       'f8' => '83850',
                       'f6' => '1237',
                       'f1' => '09182013',
                       'f9' => 'Payments by Check',
                       'f5' => '609',
                       'f2' => '09182013',
                       'f7' => '10338-',
                       'f4' => '12184-'
                     },
                     {
                       'f8' => 'Payments by Check',
                       'f6' => '11688-',
                       'f1' => '10082013',
                       'f9' => 'K',
                       'f5' => '512',
                       'f2' => '10082013',
                       'f7' => '72162',
                       'f4' => '12200-'
                     }
                   ]
        };
```
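On the %data question: %data is not tied to any file. In hdb's script, <DATA> reads the lines after the __DATA__ marker at the bottom of the script, and %data is only the hash the parsed records are collected into. To parse a real statement file, the records just have to come from a filehandle instead. Below is a minimal sketch of the slurp-and-split approach quoted above, wrapped in a helper; read_records is a hypothetical name, and the temporary file (with CRLF line endings, an assumption about the real input) stands in for an actual statement. The first piece left after splitting off the ~NN codes is the record type, so shifting it off, as hdb's loop does, is intended.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a statement file and return one string per
# record. Form feeds, CRs and LFs are removed first, then the text is
# split on the backtick that ends every record.
sub read_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    my $chunk = do { local $/; <$fh> };    # undef $/ slurps the whole file
    close $fh;
    $chunk =~ s/[\f\r\n]//g;
    return grep { length } split /`/, $chunk;    # skip empty pieces
}

# Stand-in for a real statement file, with CRLF line endings.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "501~0211.150%~03.030547%`\r\n539`\r\n701~01PERSONAL LOAN~0272162`\r\n";
close $tmp_fh;

my @records = read_records($tmp_path);
for my $record (@records) {
    my @fields  = split /~\d\d/, $record;
    my $fieldno = shift @fields;    # first piece is the record type
    print "$fieldno: @fields\n";
}
```

Each element of @records can then go through the same %desc lookup as in hdb's loop; because the line breaks are already gone, chomp and the $/ setting are no longer needed there.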