Thank you all for the help. Hopefully this is my last question. Here is the code that, at the very least, separates the data out.

#!/usr/bin/perl

use strict;
use warnings;

my $true = 1;
my $false = 0;
my ($header, $history, $footer, @fields);
my ($vendor, $i);
my $file = "AUG.txt";
my @FILE;
my $vendor_id = 0;
my @VENDORS;
my @CONTRACTS;
my $contract_id = 0;
my @AWARDS;

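# Three-argument open with a lexical handle; stop with the OS error if the file is missing
open (my $in, '<', $file) or die "Cannot open $file: $!";
@FILE = <$in>;
close ($in);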


foreach (@FILE){
   chomp;
   next if /^$/;
   if (/VENDOR.+PAGE/){
      @fields = split;
      $vendor_id++;
      #push @VENDORS,"$vendor_id $fields[1]\n";
      print "\n\nVENDOR \= $fields[1]\n";
      next;
   }
   elsif (/\s+?\S{17}\s+?\S+?\./){
      #push @CONTRACTS,"$vendor_id $_\n";
      @fields = split;
      print "   CONTRACT NUMBER  \= $fields[0]\n";
      print "   VENDOR PRICE     \= $fields[1]\n";
      print "   BASE PRICE       \= $fields[2]\n";
      print "   QTY              \= $fields[3]\n";
      print "   SHIP DATE        \= $fields[4]\n";
      print "   PR NUMBER        \= $fields[5]\n";
      print "   ARR NUMBER       \= $fields[6]\n";
      print "   DOLLAR VALUE     \= $fields[7]\n";
      print "   DOLLAR VARIENCE  \= $fields[8]\n";
      print "   PERCENT VARIANCE \= $fields[9]\n";
      print "\n";
      next;
   }
   elsif (/^\s+?\S{13}\s+?\S+?\s+?\S/){
      #print "$_\n";
      $_ =~ s/^\s*//;
      my @fields = unpack "a21 a9 a9 a2 a13 a8 a9 a9 a4 a5 a6", $_;

        print "      PIIN       \= $fields[0]\n";
        print "      FSCM       \= $fields[1]\n";
        print "      N/A        \= $fields[2]\n";
        print "      U/I        \= $fields[3]\n";
        print "      UNIT PRICE \= $fields[4]\n";
        print "      AWD DT     \= $fields[5]\n";
        print "      QTY        \= $fields[6]\n";
        print "      OPT DT     \= $fields[7]\n";
        print "      FOB        \= $fields[8]\n";
        print "      REP        \= $fields[9]\n";
        print "      TYPE       \= $fields[10]\n";
        print "\n";
   }
   else{
     $_ =~ s/^\s*//;
     if (/^\d{2}\s\d{3}/){
        print "$_\n";
     }
   }
}

Part of the output:

VENDOR = 1NWV5
   CONTRACT NUMBER  = AAB40003VG880MODF
   VENDOR PRICE     = 3.25000
   BASE PRICE       = 0.76000
   QTY              = 34
   SHIP DATE        = EA
   PR NUMBER        = 00000000
   ARR NUMBER       = YPG03188000386
   DOLLAR VALUE     = 3110009197232
   DOLLAR VARIENCE  = 110.50
   PERCENT VARIANCE = 84.66

      PIIN       = CFS50080P7291
      FSCM       = 5N366
      N/A        = N
      U/I        = EA
      UNIT PRICE =       0.30000
      AWD DT     =    80004
      QTY        =     6,600
      OPT DT     =     00000
      FOB        =    D
      REP        =     Y
      TYPE       =      B

01 001ROLLER,NEEDLE     02 002DIV GENERAL MOTORS CORP
03 003PAGE 73342        04 004P/N 2275468
05 005IDENTIFY TO:      06 006
07 007

Of course, keep in mind that both the contract information and the history information can repeat any number of times per vendor. Now I need to create a data structure that will let me easily read the data back out and make database inserts. Here is how the data is related:

VENDOR = 1NWV5
   FOREACH VENDOR
       LIST OF CONTRACTS
          FOREACH CONTRACT
             LIST OF CONTRACT INFORMATION
             LIST OF AWARDS
                FOREACH AWARD
                   LIST OF AWARD INFORMATION
             CONTRACT DESCRIPTION [The three or four lines after the history - This is getting dumped in a big text field in the database.]
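To make that outline concrete, here is a minimal sketch of how it might look as a nested Perl structure: an array of vendor hashes, each holding an array of contract hashes, each of which holds an info hash, an array of award hashes, and a description string. The key names (vendor, contracts, info, awards, description, and the field keys) are made up for illustration; the values come from the sample output above. The nested loops at the end show one way the data could be read back out, for example to drive the database inserts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout: all key names are illustrative assumptions;
# the values are copied from the sample output.
my @vendors = (
    {
        vendor    => '1NWV5',
        contracts => [
            {
                info => {
                    contract_number => 'AAB40003VG880MODF',
                    vendor_price    => '3.25000',
                    base_price      => '0.76000',
                },
                awards => [
                    { piin => 'CFS50080P7291', fscm => '5N366', unit_price => '0.30000' },
                ],
                description => "01 001ROLLER,NEEDLE     02 002DIV GENERAL MOTORS CORP\n",
            },
        ],
    },
);

# Reading the data back out: one nested loop per level of the outline.
for my $vendor (@vendors) {
    print "VENDOR = $vendor->{vendor}\n";
    for my $contract (@{ $vendor->{contracts} }) {
        print "   CONTRACT = $contract->{info}{contract_number}\n";
        for my $award (@{ $contract->{awards} }) {
            print "      PIIN = $award->{piin}\n";
        }
        print "   DESCRIPTION = $contract->{description}";
    }
}
```

Each level is reached through a reference, so a single award field is available as, for instance, `$vendors[0]{contracts}[0]{awards}[0]{piin}`.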

From the looks of it, I would have an array of vendors, each containing an array of contracts, with each contract holding two hashes (contract information and contract description) and an array of hashes (the awards). What I just said doesn't even make sense to me, so hopefully you can put it in perspective or suggest an easier way, as I need to be able to pull the data back out of the structure. Thanks again, -Shawn
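One way such a structure might be filled while the file is being read is to keep a reference to the vendor and the contract currently being processed, and push each new record into whichever one it belongs to. The sketch below is only an assumption about how that could work: the `@records` list is a hypothetical stand-in for the branches of the parsing loop (each entry is already classified as vendor, contract, award, or text), and all key names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, already-classified records standing in for the
# if/elsif branches of the real parsing loop.
my @records = (
    [ vendor   => '1NWV5' ],
    [ contract => { contract_number => 'AAB40003VG880MODF' } ],
    [ award    => { piin => 'CFS50080P7291', fscm => '5N366' } ],
    [ text     => '01 001ROLLER,NEEDLE     02 002DIV GENERAL MOTORS CORP' ],
    [ text     => '03 003PAGE 73342        04 004P/N 2275468' ],
);

my @vendors;
my ($vendor, $contract);    # references to the entries currently being filled

for my $rec (@records) {
    my ($type, $data) = @$rec;
    if ($type eq 'vendor') {
        # A new vendor starts a fresh list of contracts.
        $vendor = { vendor => $data, contracts => [] };
        push @vendors, $vendor;
        undef $contract;
    }
    elsif ($type eq 'contract') {
        next unless $vendor;    # skip contract lines seen before any vendor
        $contract = { info => $data, awards => [], description => '' };
        push @{ $vendor->{contracts} }, $contract;
    }
    elsif ($type eq 'award') {
        next unless $contract;  # awards always belong to the current contract
        push @{ $contract->{awards} }, $data;
    }
    elsif ($type eq 'text') {
        next unless $contract;  # description lines are appended as one text blob
        $contract->{description} .= "$data\n";
    }
}

printf "%d vendor(s), %d contract(s), %d award(s)\n",
    scalar @vendors,
    scalar @{ $vendors[0]{contracts} },
    scalar @{ $vendors[0]{contracts}[0]{awards} };
```

Because `$vendor` and `$contract` are references into `@vendors`, nothing has to be looked up by index while parsing; a new vendor or contract line simply repoints the reference.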

In reply to Re^4: Parsing large text file with perl by maida
in thread Parsing large text file with perl by maida
