A loop with lots of flags.
This builds a hash keyed by vendor number, holding one array of fields per record (the header line, each history row, and the footer). It loads _all_ the data, so you would want to filter out any fixed information you don't need. Depending on the data you expect, you would also want to make sure the regexes are tight enough; these are very loose. As you can see, it is written in a 'verbose' idiom.
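One way to tighten the section markers, assuming every marker starts its line as it does in the sample report, is to anchor each pattern and require more of the marker text. The `%marker` patterns and the `classify` helper below are hypothetical, based only on the sample data in this post. The main gain is on the footer marker: the loose `/PID/` fires on any line containing "PID" anywhere (for instance inside a part number), whereas `^PID\s+DATA\b` only fires on a line that starts with it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tighter patterns for the three section markers, assuming
# each marker starts its line as it does in the sample report.
my %marker = (
    header  => qr/^VENDOR\s+(\d+)\s.*\bPAGE\s+\d+/,   # captures the vendor number
    history => qr/^AWARD\s+HISTORY\b/,
    footer  => qr/^PID\s+DATA\b/,
);

# Return the name of the marker a line matches, or '' for an ordinary line.
sub classify {
    my ($line) = @_;
    for my $name (qw(header history footer)) {
        return $name if $line =~ $marker{$name};
    }
    return '';
}

my @sample = (
    'VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01',
    'VENDOR SIS UNIT BASE SHIP TOT DOL DOLLAR PERCENT',
    'AWARD HISTORY PIIN BSCM N/A U/I UNIT PRICE AWD DT QTY OPT DT FOB REP TYPE',
    '765WTY34TF56A 7J777 N EA 39.55000 93012 147 00000 2 Y B',
    'PID DATA LINE NR LINE NR',
);

# Label each sample line with the marker it matches ('data' if none).
for my $line (@sample) {
    printf "%-8s %s\n", ( classify($line) || 'data' ), $line;
}
```

Because the header pattern captures the vendor number, a header branch that matched `$marker{header}` directly could take the vendor from `$1` instead of splitting the line.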
I tried the script below with two vendor records, the second of which has three history lines.
```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash;
my $true  = 1;
my $false = 0;
my ( $header, $history, $footer, @fields );
my ( $vendor, $i );

while (<DATA>) {
    chomp;
    next if /^\s*$/;    # skip blank (or whitespace-only) lines

    if ( /^VENDOR/ and /PAGE/ ) {
        # start of a new vendor record
        $header  = $true;
        $history = $false;    # reset in case the previous record had no PID line
        $footer  = $false;
        @fields  = split;
        $vendor  = $fields[1];
        push @{ $hash{$vendor}{'header'} }, @fields;
        $i = 0;
        next;
    }
    elsif (/AWARD\sHISTORY/) {
        $header  = $false;
        $history = $true;
        next;
    }
    elsif (/PID/) {
        $history = $false;
        $footer  = $true;
        next;
    }

    if ($history) {
        # one key per history row: 'history 0', 'history 1', ...
        @fields = split;
        my $history_row = join '', 'history ', $i;
        push @{ $hash{$vendor}{$history_row} }, @fields;
        $i++;
    }
    elsif ($footer) {    # bug fix, was $header
        @fields = split;
        push @{ $hash{$vendor}{'footer'} }, @fields;
    }
}

open my $out, '>', 'parse.txt' or die "Cannot open parse.txt: $!";
for my $v ( keys %hash ) {
    print $out "vendor: $v\n";
    for my $rec ( keys %{ $hash{$v} } ) {
        print $out "\trecord:\t$rec\n";
        print $out "\t\t";
        for my $fld ( @{ $hash{$v}{$rec} } ) {
            print $out "$fld\t";
        }
        print $out "\n";
    }
}
close $out;

__DATA__
VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01
VENDOR SIS UNIT BASE SHIP TOT DOL DOLLAR PERCENT
CONTRACT NUMBER PRICE PRICE QTY U/I DATE PR NUMBER BIN/PART NUMBER VALUE VARIANCE VARIANCE
YT67DY7898DUFT5126 88.20000 70.00000 50 EA 00000000 POI90809819856 1560007117067 4,410.00 910.00 0
AWARD HISTORY PIIN BSCM N/A U/I UNIT PRICE AWD DT QTY OPT DT FOB REP TYPE
765WTY34TF56A 7J777 N EA 39.55000 93012 147 00000 2 Y B
PID DATA LINE NR                        LINE NR
01 001PART, DESCRIPTION, DATA           02 002TECHNICAL DATA AVAILABILITY:
03 003

VENDOR 61126 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01
VENDOR SIS UNIT BASE SHIP TOT DOL DOLLAR PERCENT
CONTRACT NUMBER PRICE PRICE QTY U/I DATE PR NUMBER BIN/PART NUMBER VALUE VARIANCE VARIANCE
YT67DY7898DUFT5126 88.20000 70.00000 50 EA 00000000 POI90809819856 1560007117067 4,410.00 910.00 0
AWARD HISTORY PIIN BSCM N/A U/I UNIT PRICE AWD DT QTY OPT DT FOB REP TYPE
765WTY34TF56A 7J777 N EA 39.55000 93012 147 00000 2 Y B
765WTY34TF56B 7J777 N EA 39.55000 93012 147 00000 2 Y B
765WTY34TF56C 7J777 N EA 39.55000 93012 147 00000 2 Y B
PID DATA LINE NR                        LINE NR
01 001PART, DESCRIPTION, DATA           02 002TECHNICAL DATA AVAILABILITY:
03 003
```
Running it writes this to parse.txt (truncated; records come out in hash order, not file order):
```
vendor: 61125
    record: history 0
        765WTY34TF56A 7J777 N EA 39.55000 93012 147...
    record: footer
        01 001PART, DESCRIPTION, DATA 02 002TECHNICAL...
    record: header
        VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE...
vendor: 61126
    record: history 0
        765WTY34TF56A 7J777 N EA 39.55000 93012 147...
    record: history 2
        765WTY34TF56C 7J777 N EA 39.55000 93012 147...
    record: history 1
        765WTY34TF56B 7J777 N EA 39.55000 93012 147...
    record: footer
        01 001PART, DESCRIPTION, DATA 02 002TECHNICAL...
    record: header
        VENDOR 61126 TOTAL DOLLAR VAR 77,097.60 PAGE...
```
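Because `keys` returns hash keys in no particular order, the records above come out shuffled (history 2 is printed before history 1). Below is a sketch of one possible variation, not the script above: it swaps the three flags for a single `$section` variable, keeps each vendor's history rows as an array of array refs so their order survives, and prints vendors sorted numerically. The `parse` and `report` subs and the inline sample lines are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variation (not the original script): one $section variable instead of
# three flags, and history rows stored in order as array refs.
sub parse {
    my @lines = @_;
    my ( %hash, $vendor, $section );
    for (@lines) {
        next if /^\s*$/;                       # skip blank lines
        if ( /^VENDOR/ and /PAGE/ ) {          # start of a vendor record
            my @fields = split;
            $vendor  = $fields[1];
            $hash{$vendor}{header} = \@fields;
            $section = 'header';
            next;
        }
        elsif (/AWARD\sHISTORY/) { $section = 'history'; next }
        elsif (/PID/)            { $section = 'footer';  next }

        next unless defined $section;
        if ( $section eq 'history' ) {
            push @{ $hash{$vendor}{history} }, [split];   # one array ref per row
        }
        elsif ( $section eq 'footer' ) {
            my @fields = split;
            push @{ $hash{$vendor}{footer} }, @fields;
        }
    }
    return \%hash;
}

# Build the report as a string: vendors sorted numerically, history rows
# in the order they were read.
sub report {
    my ($hash) = @_;
    my $text = '';
    for my $v ( sort { $a <=> $b } keys %$hash ) {
        my $rec = $hash->{$v};
        $text .= "vendor: $v\n";
        $text .= "    header: @{ $rec->{header} }\n";
        my $n = 0;
        for my $row ( @{ $rec->{history} || [] } ) {
            $text .= "    history $n: @$row\n";
            $n++;
        }
        $text .= "    footer: @{ $rec->{footer} || [] }\n";
    }
    return $text;
}

my @lines = (
    'VENDOR 61125 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01',
    'VENDOR SIS UNIT BASE SHIP TOT DOL DOLLAR PERCENT',
    'AWARD HISTORY PIIN BSCM N/A U/I UNIT PRICE AWD DT QTY OPT DT FOB REP TYPE',
    '765WTY34TF56A 7J777 N EA 39.55000 93012 147 00000 2 Y B',
    'PID DATA LINE NR LINE NR',
    '01 001PART, DESCRIPTION, DATA',
    'VENDOR 61126 TOTAL DOLLAR VAR 77,097.60 PAGE 1 2003 08 01',
    'AWARD HISTORY PIIN BSCM N/A U/I UNIT PRICE AWD DT QTY OPT DT FOB REP TYPE',
    '765WTY34TF56A 7J777 N EA 39.55000 93012 147 00000 2 Y B',
    '765WTY34TF56B 7J777 N EA 39.55000 93012 147 00000 2 Y B',
    '765WTY34TF56C 7J777 N EA 39.55000 93012 147 00000 2 Y B',
    'PID DATA LINE NR LINE NR',
    '01 001PART, DESCRIPTION, DATA',
);

my $parsed = parse(@lines);
print report($parsed);
```

With the rows in an array, `$parsed->{61126}{history}[1]` is always the second history line, and adding more rows never needs a new hash key.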
Update: added output
Update2: Fixed bug! Footer wasn't stored.
Update3: Truncated and formatted the output (tabs were a bad idea).

In reply to Re: Parsing large text file with perl by wfsp
in thread Parsing large text file with perl by maida
