in reply to Text Extraction

Presumably, that ^Z is actually a Control-Z character, i.e. ASCII character 26, which DOS (like CP/M before it) used as an end-of-file marker. It is no longer necessary, but Windows and Perl ignore it (unless you have turned on binmode for the stream).
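If you want to check for the character explicitly, or get rid of it yourself, something like the following works. It is a minimal sketch that assumes the marker shows up as a literal byte in the data; the sub name strip_ctrl_z is hypothetical.

```perl
use strict;
use warnings;

# "\cZ" and "\x1A" are two spellings of the same byte: ASCII 26, Ctrl-Z.
print "same character\n" if "\cZ" eq "\x1A" && ord("\cZ") == 26;

# Hypothetical helper: remove every Ctrl-Z byte from a string.
sub strip_ctrl_z {
    my ($text) = @_;
    $text =~ tr/\x1A//d;    # tr///d deletes the listed characters
    return $text;
}

my $raw = "G0034203\x1A";
printf "%d bytes before, %d after\n", length($raw), length(strip_ctrl_z($raw));
# prints: 9 bytes before, 8 after
```

Using tr///d rather than s/// here is just a convenience: it deletes every occurrence in one pass, wherever it appears in the string.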

The following is a succinct solution which is specific to your input data format. If the format changes, you'd probably have to tweak this somewhat.

I show how to extract just the values in each record, as you asked; but I also show how to extract the values along with the field names, in case that is useful. (I would think it would be, ordinarily.)

use strict;
use warnings;

$/ = ''; # paragraph mode

while (<DATA>) {
    next if /^\s/; # the header
    chomp;
    my @just_the_values = /^[^.]*\.* {1,2}(.*)/mg;
    my %keys_and_values = /^([^.]*)\.* {1,2}(.*)/mg;
    print "'$_'\n" for @just_the_values;
    print "$_ = '$keys_and_values{$_}'\n" for sort keys %keys_and_values;
    print "----------------\n";
}

__DATA__
        NEW VEHICLE INVENTORY prepared by ANYUSER 04:15:00pm 21 Jul 2009 - PAGE # 2

STOCK NO........... G0034203
YR................. 10
CARLINE............ ACADIA
SERIAL#............ 1GKLRKEDXAJ102450
COLOR DESCRIPTIONS. /
LST PRICE.......... 36010.00
SALES CST.......... 36010.00
DAY................ 20
SC................. 1

STOCK NO........... G0034204
YR................. 10
CARLINE............ ACADIA
SERIAL#............ 1GKLRKED1AJ101543
COLOR DESCRIPTIONS. /
LST PRICE.......... 33615.00
SALES CST.......... 33615.00
DAY................ 20
SC................. 1
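Paragraph mode ($/ set to the empty string) is what lets each read return one whole record. The sketch below shows that behavior on its own. It reads from an in-memory filehandle instead of DATA so it can be run standalone, and the sample text is made up for illustration. Note that the header paragraph keeps its leading spaces, which is what next if /^\s/ relies on, and that a run of several blank lines still counts as a single separator.

```perl
use strict;
use warnings;

my $text = "  HEADER LINE\n\nYR... 10\nSC... 1\n\n\n\nYR... 11\nSC... 2\n";

# Open a read handle on the string itself (in-memory file, Perl 5.8+).
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @paragraphs;
{
    local $/ = '';    # paragraph mode: records end at one or more blank lines
    while (my $para = <$fh>) {
        chomp $para;  # in paragraph mode, chomp removes all trailing newlines
        push @paragraphs, $para;
    }
}
close $fh;

print scalar(@paragraphs), " paragraphs\n";
print "[$_]\n" for @paragraphs;
```

Using local inside a bare block restores the previous $/ afterwards, which is a bit safer than assigning to it globally once the script grows beyond a one-off.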
Between the mind which plans and the hands which build, there must be a mediator... and this mediator must be the heart.

Re^2: Text Extraction
by sonicscott9041 (Novice) on Jul 22, 2009 at 20:22 UTC
    Here is the entire data file: / snip / See updated data below.

      I've stored your data in a file called 782426.txt, because I couldn't get DATA to work with embedded Ctrl-Z characters.

      Now, in the solution below, I read the data a line at a time, building up a record until I reach the line which I know, a priori, to be the last one in each record. I could have tried to exploit the blank line which occurs between records, but in dealing with the header, I rather crudely blow away all blank lines.

      use strict;
      use warnings;

      open F, '<', '782426.txt' or die "Can't open 782426.txt: $!";
      $/ = "\r\n";
      binmode F;

      my $last_key;
      my $record = {};
      my @records;

      while (<F>) {
          s/^\cZ//;                            # kill that pesky thing (the Ctrl-Z)
          /^\s+NEW CAR INV prepared by / and scalar(<F>), next;  # the header
          /^[\.\s]*$/ and next;                # skip any blank lines
          /^\d+ records listed\./ and last;    # end of report
          chomp;
          my($key,$val) = /^(.{19}) +(.*)/;
          if ($key =~ /\S/) {
              $key =~ s/\.+$//;                # kill trailing dots
              $record->{ $last_key = $key } = $val;
          }
          elsif (defined $last_key) {
              $record->{$last_key} .= $val;    # continuation of the previous field
          }
          if ($key eq 'SALES CST') {           # last line of record
              push @records, $record;
              $record = {};
          }
      }
      close F;

      for my $record (@records) {
          print "$_='$record->{$_}'\n" for sort keys %$record;
          print "\n";
      }

      I didn't bother trying to deal with that line at the top of your data which contains a lone '1' character.
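If that line ever needs handling, one option is to skip any line too short to fit the key column; as written, if the line is shorter than 19 characters, use warnings will complain about an undefined $key. Here is a minimal sketch, assuming the same fixed 19-character key column as above. parse_lines and field_line are hypothetical names, the sample values are made up, and parse_lines takes already-chomped lines rather than reading the file itself.

```perl
use strict;
use warnings;

# Hypothetical helper: turn chomped report lines into a list of record hashes.
# Assumes each field line is a 19-character key column (name plus dot leader),
# one or more spaces, then the value; an all-blank key column marks a
# continuation of the previous field.
sub parse_lines {
    my @lines = @_;
    my (@records, $last_key);
    my $record = {};
    for my $line (@lines) {
        my ($key, $val) = $line =~ /^(.{19}) +(.*)/;
        next unless defined $key;              # too short to fit, e.g. a lone '1'
        if ($key =~ /\S/) {
            $key =~ s/\.+$//;                  # drop the dot leader
            $record->{ $last_key = $key } = $val;
        }
        elsif (defined $last_key) {
            $record->{$last_key} .= $val;      # continuation line
        }
        if ($key eq 'SALES CST') {             # last field of each record
            push @records, $record;
            $record = {};
        }
    }
    return @records;
}

# Hypothetical test helper: pad a key name with dots to the 19-column width.
sub field_line {
    my ($name, $value) = @_;
    return $name . ('.' x (19 - length($name))) . " $value";
}

my @recs = parse_lines(
    '1',
    field_line('STOCK NO',  'G0034203'),
    field_line('CARLINE',   'ACADIA'),
    (' ' x 20) . 'SLE',
    field_line('SALES CST', '36010.00'),
);
print "$_='$recs[0]{$_}'\n" for sort keys %{ $recs[0] };
```

Keeping the parsing in a sub that takes a list of lines makes it easy to feed it test input like this without touching the real report file.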
