I've stored your data in a file called 782426.txt, because I couldn't get DATA to work with embedded Ctrl-Z characters.

In the solution below, I read the data a line at a time, building up a record until I hit the line which I know, a priori, to be the last one in each record. I could have tried to exploit the blank line that occurs between records, but in dealing with the header I rather crudely blow away all blank lines.

use strict;
use warnings;

open F, '<', '782426.txt' or die "Can't open 782426.txt: $!";
$/ = "\r\n";
binmode F;

my $last_key;
my $record = {};
my @records;

while (<F>) {
    s/^\x1A//;    # kill that pesky thing (assumed to be the Ctrl-Z; the literal character was lost in this copy)
    /^\s+NEW CAR INV prepared by / and scalar(<F>), next;    # the header
    /^[\.\s]*$/ and next;                # skip any blank lines
    /^\d+ records listed\./ and last;    # end of report
    chomp;

    # the first 19 columns hold the key; the value follows after some spaces
    my ($key, $val) = /^(.{19}) +(.*)/;
    if ($key =~ /\S/) {
        $key =~ s/\.+$//;                # kill trailing dots
        $record->{ $last_key = $key } = $val;
    }
    elsif (defined $last_key) {
        $record->{$last_key} .= $val;    # continuation of the previous value
    }

    if ($key eq 'SALES CST') {           # last line of record
        push @records, $record;
        $record = {};
    }
}
close F;

for my $record (@records) {
    print "$_='$record->{$_}'\n" for sort keys %$record;
    print "\n";
}

I didn't bother trying to deal with that line at the top of your data which contains a lone '1' character.
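
To see what the fixed-width match in the loop does to each line, here is a minimal sketch run against made-up input. The line layout (a 19-column key padded with dots, a couple of spaces, then the value) is inferred from the regex, and the field names other than 'SALES CST' are invented; the helper make_line and the "or next" guard are additions for this sketch only, not part of the solution above.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a line in the assumed layout,
# i.e. the key padded with dots to 19 columns, two spaces, the value.
sub make_line {
    my ($key, $val) = @_;
    return $key . '.' x (19 - length $key) . '  ' . $val;
}

# Invented sample record; only 'SALES CST' appears in the original code.
my @lines = (
    make_line('STOCK NO',    '12345'),
    make_line('DESCRIPTION', 'RED SEDAN WITH'),
    ' ' x 19 . '  ' . 'SUNROOF',    # continuation line: blank key column
    make_line('SALES CST',   '19999.00'),
);

my (%record, $last_key);
for (@lines) {
    # sketch-only guard: skip anything that does not fit the layout
    my ($key, $val) = /^(.{19}) +(.*)/ or next;

    if ($key =~ /\S/) {
        $key =~ s/\.+$//;                      # strip the dot leaders
        $record{ $last_key = $key } = $val;    # start a new field
    }
    elsif (defined $last_key) {
        $record{$last_key} .= $val;            # append continuation text
    }
}

print "$_='$record{$_}'\n" for sort keys %record;
```

Note that the continuation text is appended with no separator, just as in the loop above, so the two DESCRIPTION lines come out joined as 'RED SEDAN WITHSUNROOF'. If the real data needs a space there, the append would have to add one.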
