The problem seems simple enough, but it isn't specified completely enough for a solution that doesn't involve a bit of lucky guessing. Is there another record after END, for example? Is the number of fields the same for each row? Are there always two rows per record?

At minimum, it does appear that you're dealing with fixed-width fields, and that you want to skip the first four lines. It's not clear to me what you want to happen after "END": continue on to a new record, or stop? Will that next record have its own headers? Will it have the same format as the first record?

For fixed-width fields, you might want to use unpack. For example, my @fields = unpack '(a7x)2a7', $_; reads two 7-character fields, each followed by one skipped byte, and then a final 7-character field. This has to come after whatever logic you use to disqualify some lines. That logic might look like this:

    while( <DATA> ) {
        next if $. < 5;
        chomp;
        next if ! length;
        last if /^END/;
        my @fields = unpack '(a7x)2a7', $_;
        # Do something with the fields.
    }
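To see what the '(a7x)2a7' template does on its own, here is a minimal sketch. The sample line is made up (the original data wasn't posted), and the A7 variant is my own addition to show how trailing padding could be stripped; it isn't something the question asked for.

```perl
use strict;
use warnings;

# Hypothetical line: three 7-character columns separated by single spaces.
my $line = sprintf '%-7s %-7s %-7s', qw(alpha beta gamma);

# 'a7' takes 7 bytes verbatim and 'x' skips one byte;
# '(...)2' repeats that group twice, then a final 'a7' follows.
my @raw = unpack '(a7x)2a7', $line;

# 'A7' reads the same width but strips trailing spaces from each field.
my @trimmed = unpack '(A7x)2A7', $line;

print map( {"[$_]"} @raw ),     "\n";    # [alpha  ][beta   ][gamma  ]
print map( {"[$_]"} @trimmed ), "\n";    # [alpha][beta][gamma]
```

One caveat: x dies if there is no byte left to skip, so lines shorter than the template expects need to be filtered out (or padded) before unpacking.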

The while loop would change a bit if there is more than one record you're interested in. You might incorporate the flip-flop operator like this:

    my $record_start = 0;
    my @recs;
    while( <DATA> ) {
        chomp;
        if( /^TABLE NAME/ .. /^END/ ) {
            # We're in a new record...
            if( /^TABLE NAME/ ) {
                $record_start = $.;
                push @recs, [];
            }
            next unless $. > $record_start + 3; # Skip header.
            next if ! length;
            next if /^END/;
            my @fields = unpack '(a7x)2a7', $_;
            # Do something with fields, such as...
            push @{$recs[-1]}, [@fields];
        }
    }

(Updated to demonstrate pushing records onto a "@recs" array.)
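If you want to try the multi-record approach without your real file, here is a self-contained sketch that reads from an in-memory filehandle. The input layout (a "TABLE NAME" line, three more header lines, fixed-width rows, then "END") is purely my guess at the format, so adjust it to whatever your data actually looks like.

```perl
use strict;
use warnings;

# Hypothetical input: the real data was not posted, so every line here is
# an assumption made only so the loop has something to chew on.
my $fmt   = '%-7s %-7s %-7s';
my $input = join "\n",
    'TABLE NAME one',
    'made-up header line',
    sprintf( $fmt, qw(COL_A COL_B COL_C) ),
    '------- ------- -------',
    sprintf( $fmt, qw(a1 b1 c1) ),
    sprintf( $fmt, qw(a2 b2 c2) ),
    'END',
    '',
    'TABLE NAME two',
    'made-up header line',
    sprintf( $fmt, qw(COL_A COL_B COL_C) ),
    '------- ------- -------',
    sprintf( $fmt, qw(x1 y1 z1) ),
    'END',
    '';

open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";

my $record_start = 0;
my @recs;
while ( my $line = <$fh> ) {
    chomp $line;

    # The flip-flop is true from a TABLE NAME line through the next END line.
    if ( $line =~ /^TABLE NAME/ .. $line =~ /^END/ ) {
        if ( $line =~ /^TABLE NAME/ ) {
            $record_start = $.;      # remember where this record began
            push @recs, [];          # start a fresh list of rows
        }
        next unless $. > $record_start + 3;    # skip the four header lines
        next if !length $line;                 # skip blank lines
        next if $line =~ /^END/;               # the END marker is not data
        push @{ $recs[-1] }, [ unpack '(a7x)2a7', $line ];
    }
}
close $fh;

# Report how many rows each record collected.
for my $i ( 0 .. $#recs ) {
    printf "record %d: %d row(s)\n", $i + 1, scalar @{ $recs[$i] };
}
```

Each element of @recs is an array of rows, and each row is an array of three fields, so $recs[0][1][2] would be the third field of the second row of the first record.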


Dave


In reply to Re: text processing by davido
in thread text processing by DAVERN
