
This looks like a fixed-length record, so you could use unpack. While the following 'demonstrates' the idea, it's probably not the best use of unpack (e.g. one of the fields is a floating-point number). Your best bet would be to ask another question to find an elegant use of unpack.
#!/bin/perl5
use strict;
use warnings;

my @history = <DATA>;
for my $record (@history) {
    chomp $record;              # drop the newline read from DATA
    print "$record\n";
    $record =~ s/^\s*//;        # strip leading whitespace
    # "aN" takes the next N bytes as-is (padding spaces included)
    my @fields = unpack "a21 a9 a9 a2 a13 a8 a9 a9 a4 a5 a6", $record;
    for my $field (@fields) {
        print "*$field*\n";
    }
}

__DATA__
AP040003EZ9891783 61125 N BX 108.0 +0000 03196 00000 D Y B
BP041303DD554 009J0 N BX 8.7 +5000 03168 62 00000 Y W
When you've cracked it you could probably apply it to the other records as well.
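A minimal sketch of the unpack approach with two small changes, under a hypothetical column layout: the `A` template letter strips trailing spaces when unpacking (`a` keeps them), and a hash slice gives each field a name instead of an index. The widths, field names and sample record below are assumptions for illustration, not the real layout of the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout: the real widths must come from the file's record spec.
my @names    = qw(piin fscm na ui unit_price);
my $template = "A21 A9 A9 A2 A13";    # "A" strips trailing spaces/nulls on unpack

# Build a fixed-width test record with sprintf so the columns line up exactly.
my $record = sprintf "%-21s%-9s%-9s%-2s%-13s",
    "AP040003EZ9891783", "61125", "N", "BX", "108.0";

my %rec;
@rec{@names} = unpack $template, $record;    # hash slice: name each field

for my $name (@names) {
    printf "%-10s = '%s'\n", $name, $rec{$name};
}
```

Because the fields come back without padding, values such as `unit_price` can be compared or inserted directly without an extra trim step.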

Re^4: Parsing large text file with perl
by maida (Initiate) on Sep 03, 2004 at 03:19 UTC
    Thank you all for the help... Hopefully this is my last question. Here is the code that, at the very least, separates the data out.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $true  = 1;
    my $false = 0;
    my ($header, $history, $footer, @fields);
    my ($vendor, $i);
    my $file = "AUG.txt";
    my @FILE;
    my $vendor_id = 0;
    my @VENDORS;
    my @CONTRACTS;
    my $contract_id = 0;
    my @AWARDS;

    # read the whole report into memory
    open(INFILE, '<', $file) or die "Cannot open $file: $!";
    @FILE = <INFILE>;
    close(INFILE);

    foreach (@FILE) {
        chomp;
        next if /^$/;
        if (/VENDOR.+PAGE/) {
            # vendor header line: the vendor code is the second word
            @fields = split;
            $vendor_id++;
            #push @VENDORS, "$vendor_id $fields[1]\n";
            print "\n\nVENDOR = $fields[1]\n";
            next;
        }
        elsif (/\s+?\S{17}\s+?\S+?\./) {
            # contract line: whitespace-separated columns
            #push @CONTRACTS, "$vendor_id $_\n";
            @fields = split;
            print " CONTRACT NUMBER = $fields[0]\n";
            print " VENDOR PRICE = $fields[1]\n";
            print " BASE PRICE = $fields[2]\n";
            print " QTY = $fields[3]\n";
            print " SHIP DATE = $fields[4]\n";
            print " PR NUMBER = $fields[5]\n";
            print " ARR NUMBER = $fields[6]\n";
            print " DOLLAR VALUE = $fields[7]\n";
            print " DOLLAR VARIENCE = $fields[8]\n";
            print " PERCENT VARIANCE = $fields[9]\n";
            print "\n";
            next;
        }
        elsif (/^\s+?\S{13}\s+?\S+?\s+?\S/) {
            # award history line: fixed-width columns
            #print "$_\n";
            s/^\s*//;
            my @fields = unpack "a21 a9 a9 a2 a13 a8 a9 a9 a4 a5 a6", $_;
            print " PIIN = $fields[0]\n";
            print " FSCM = $fields[1]\n";
            print " N/A = $fields[2]\n";
            print " U/I = $fields[3]\n";
            print " UNIT PRICE = $fields[4]\n";
            print " AWD DT = $fields[5]\n";
            print " QTY = $fields[6]\n";
            print " OPT DT = $fields[7]\n";
            print " FOB = $fields[8]\n";
            print " REP = $fields[9]\n";
            print " TYPE = $fields[10]\n";
            print "\n";
        }
        else {
            # description lines such as "01 001ROLLER,NEEDLE"
            s/^\s*//;
            if (/^\d{2}\s\d{3}/) {
                print "$_\n";
            }
        }
    }
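The ten separate `print` statements for a contract line can be collapsed by naming the columns once and filling a hash with a slice. A minimal sketch, assuming the contract line really has ten whitespace-separated columns in the order the script prints them; the sample line is built from the output shown below and is only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column names for a contract line, in the order the script prints them.
my @CONTRACT_COLS = (
    'CONTRACT NUMBER', 'VENDOR PRICE', 'BASE PRICE', 'QTY', 'SHIP DATE',
    'PR NUMBER', 'ARR NUMBER', 'DOLLAR VALUE', 'DOLLAR VARIANCE',
    'PERCENT VARIANCE',
);

# Illustrative contract line with whitespace-separated columns.
my $line = '  AAB40003VG880MODF 3.25000 0.76000 34 EA 00000000 '
         . 'YPG03188000386 3110009197232 110.50 84.66';

my %contract;
@contract{@CONTRACT_COLS} = split ' ', $line;   # ' ' also skips leading blanks

# One loop replaces the ten separate print statements.
for my $col (@CONTRACT_COLS) {
    print " $col = $contract{$col}\n";
}
```

In the real script the same two lines would sit inside a `while (my $line = <$fh>)` loop over a lexical filehandle, which also avoids slurping a large file into `@FILE`.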
    Part of the output:
    VENDOR = 1NWV5
     CONTRACT NUMBER = AAB40003VG880MODF
     VENDOR PRICE = 3.25000
     BASE PRICE = 0.76000
     QTY = 34
     SHIP DATE = EA
     PR NUMBER = 00000000
     ARR NUMBER = YPG03188000386
     DOLLAR VALUE = 3110009197232
     DOLLAR VARIENCE = 110.50
     PERCENT VARIANCE = 84.66

     PIIN = CFS50080P7291
     FSCM = 5N366
     N/A = N
     U/I = EA
     UNIT PRICE = 0.30000
     AWD DT = 80004
     QTY = 6,600
     OPT DT = 00000
     FOB = D
     REP = Y
     TYPE = B

    01 001ROLLER,NEEDLE
    02 002DIV GENERAL MOTORS CORP
    03 003PAGE 73342
    04 004P/N 2275468
    05 005IDENTIFY TO:
    06 006
    07 007
    Of course, keep in mind that both the contract information and the history information can repeat any number of times per vendor. Now I need to create a data structure that will let me easily read the data back out and make database inserts. Here is how the data is related:
    VENDOR = 1NWV5
    FOREACH VENDOR
        LIST OF CONTRACTS
        FOREACH CONTRACT
            LIST OF CONTRACT INFORMATION
            LIST OF AWARDS
            FOREACH AWARD
                LIST OF AWARD INFORMATION
            CONTRACT DESCRIPTION [the three or four lines after the history; this is getting dumped into a big text field in the database]
    From the looks of it, I would have an array of vendors containing an array of contracts, each containing two hashes (contract information and contract description) and an array of hashes. What I just said doesn't even make sense to me, so hopefully you can put it in perspective or suggest an easier way. I need to be able to pull the data back out of the structure. Thanks again -Shawn
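A minimal sketch of the structure described above, assuming a hash keyed by vendor code instead of a plain array of vendors, so a vendor's contracts can be looked up by code. Each contract is a hash reference holding an `info` hash, an `awards` array of hashes, and the `description` text. The line-classification regexes are the ones from the script above; the sample input, the short field lists, and the use of `split` (rather than `unpack`) for award lines are hypothetical simplifications.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input in the shape of the report.
my $text = <<'END';
VENDOR 1NWV5 PAGE 1
  AAB40003VG880MODF 3.25000 0.76000 34 EA
    CFS50080P7291 5N366 N EA 0.30000
01 001ROLLER,NEEDLE
02 002DIV GENERAL MOTORS CORP
END
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %vendors;               # vendor code => [ contract, contract, ... ]
my ($vendor, $contract);   # the "current" vendor and contract while reading

while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^$/;
    if ($line =~ /VENDOR.+PAGE/) {
        $vendor = (split ' ', $line)[1];
        $vendors{$vendor} ||= [];
    }
    elsif ($line =~ /\s+?\S{17}\s+?\S+?\./) {
        my %info;
        @info{qw(number vendor_price base_price qty ui)} = split ' ', $line;
        $contract = { info => \%info, awards => [], description => '' };
        push @{ $vendors{$vendor} }, $contract;
    }
    elsif ($line =~ /^\s+?\S{13}\s+?\S+?\s+?\S/) {
        my %award;
        @award{qw(piin fscm na ui unit_price)} = split ' ', $line;
        push @{ $contract->{awards} }, \%award;
    }
    elsif ($line =~ /^\s*\d{2}\s\d{3}/) {
        (my $desc = $line) =~ s/^\s*\d{2}\s\d{3}//;   # drop the "01 001" prefix
        $contract->{description} .= "$desc\n";
    }
}
close $fh;

# Walk the structure back out; each print is where a database insert could go.
for my $v (sort keys %vendors) {
    for my $c (@{ $vendors{$v} }) {
        print "contract $c->{info}{number} (vendor $v)\n";
        for my $award (@{ $c->{awards} }) {
            print "  award $award->{piin} at $award->{unit_price}\n";
        }
        print "  description: $c->{description}";
    }
}
```

Because each contract is a reference stored in its vendor's array, the parser only needs to remember the current vendor and contract; later award and description lines are attached through those references.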
      Something like this, from my original suggestion:
      push @{$hash{$vendor}{$history_row}}, @fields;
      That's all you have to do! That one is a hash of hashes of arrays. The for loop at the end of my original suggestion demonstrated how you would unroll it.
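A self-contained sketch of the hash of hashes of arrays built by that `push` line, with hypothetical sample rows. Here `$history_row` is assumed to be the PIIN of each history line, and vendor `2ABCD` is made up; the original suggestion may have keyed the rows differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash;   # vendor => { history row key => [ fields ] }

# Hypothetical rows: vendor, row key, then the remaining fields.
my @rows = (
    [ '1NWV5', 'CFS50080P7291',     qw(5N366 N EA 0.30000) ],
    [ '1NWV5', 'AP040003EZ9891783', qw(61125 N BX 108.0) ],
    [ '2ABCD', 'BP041303DD554',     qw(009J0 N BX 8.7) ],
);

for my $row (@rows) {
    my ($vendor, $history_row, @fields) = @$row;
    # autovivification: the inner hash and array spring into existence here
    push @{ $hash{$vendor}{$history_row} }, @fields;
}

# Unroll it: outer hash, then inner hash, then the array of fields.
for my $vendor (sort keys %hash) {
    print "VENDOR = $vendor\n";
    for my $history_row (sort keys %{ $hash{$vendor} }) {
        my @fields = @{ $hash{$vendor}{$history_row} };
        print "  $history_row: @fields\n";
    }
}
```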

      Have a look at perldsc and perllol. You will see how to build complex data structures like the one above and how to adapt them. Once you get the hang of it, it is very easy to use.

      Extracting and reporting is Perl's bread and butter! Have a look at the docs, see how I built my structure, and have a go at adapting it.

      If you get stuck come back.

      btw I would get advice on that unpack! Ask another question. If it gets out that I'm giving advice like that, I'll be excommunicated!

        Thanks again, I will ask.