in reply to Text Extraction
Presumably, that ^Z is actually a Control-Z character, i.e. ASCII character 26, which DOS (and CP/M before it) used as an end-of-file marker. It is no longer needed, but Windows and Perl ignore it (unless you have turned on binmode for the stream).
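If you do read the file with binmode on, the marker will show up in your data. A minimal sketch for discarding it yourself, assuming you want the old DOS behavior of ignoring everything from the first Ctrl-Z onward, could look like this (`strip_ctrl_z` is just a hypothetical name):

```perl
use strict;
use warnings;

# Hypothetical helper: treat the first Ctrl-Z (ASCII 26, "\x1A") as the
# DOS-style end-of-file marker and drop it along with anything after it.
sub strip_ctrl_z {
    my ($text) = @_;
    $text =~ s/\x1A.*\z//s;    # /s lets .* run across newlines to the end
    return $text;
}

my $raw = "STOCK NO........... G0034203\n\x1A";
print strip_ctrl_z($raw);    # the record line, without the trailing ^Z
```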
The following is a succinct solution which is specific to your input data format. If the format changes, you'd probably have to tweak this somewhat.
I show how to extract just the values in each record, as you asked; but I also show how to extract the values along with the field names, in case that is useful. (I would think it would be, ordinarily.)
```perl
use strict;
use warnings;

$/ = '';    # paragraph mode

while (<DATA>) {
    next if /^\s/;    # the header
    chomp;
    my @just_the_values = /^[^.]*\.* {1,2}(.*)/mg;
    my %keys_and_values = /^([^.]*)\.* {1,2}(.*)/mg;
    print "'$_'\n" for @just_the_values;
    print "$_ = '$keys_and_values{$_}'\n" for sort keys %keys_and_values;
    print "----------------\n";
}

__DATA__
    NEW VEHICLE INVENTORY prepared by ANYUSER 04:15:00pm 21 Jul 2009 - PAGE # 2

STOCK NO........... G0034203
YR................. 10
CARLINE............ ACADIA
SERIAL#............ 1GKLRKEDXAJ102450
COLOR DESCRIPTIONS. /
LST PRICE.......... 36010.00
SALES CST.......... 36010.00
DAY................ 20
SC................. 1

STOCK NO........... G0034204
YR................. 10
CARLINE............ ACADIA
SERIAL#............ 1GKLRKED1AJ101543
COLOR DESCRIPTIONS. /
LST PRICE.......... 33615.00
SALES CST.......... 33615.00
DAY................ 20
SC................. 1
```
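Since you will usually want to look fields up by name, here is a minimal sketch (not part of the solution above) that keeps each record as a hash and collects them in an array. `parse_records` is a hypothetical name, and reading from an in-memory string instead of `DATA` is only for illustration; it assumes the same indented header and dotted `FIELD..... value` layout as your data.

```perl
use strict;
use warnings;

# Hypothetical helper: split the report into paragraph-mode records and
# return one hash reference per vehicle, keyed by field name.
sub parse_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';    # paragraph mode, restored when the sub returns
    my @records;
    while (my $para = <$fh>) {
        next if $para =~ /^\s/;    # skip the indented header paragraph
        chomp $para;               # removes all trailing newlines here
        my %rec = $para =~ /^([^.]*)\.* {1,2}(.*)/mg;
        push @records, \%rec;
    }
    close $fh;
    return @records;
}

my $report = <<'END';
    NEW VEHICLE INVENTORY prepared by ANYUSER 04:15:00pm 21 Jul 2009 - PAGE # 2

STOCK NO........... G0034203
YR................. 10
CARLINE............ ACADIA
LST PRICE.......... 36010.00

STOCK NO........... G0034204
YR................. 10
CARLINE............ ACADIA
LST PRICE.......... 33615.00
END

for my $rec (parse_records($report)) {
    print "$rec->{'STOCK NO'}: $rec->{CARLINE} at $rec->{'LST PRICE'}\n";
}
```

Using `local $/` inside the sub keeps paragraph mode from leaking into the rest of the program, which matters once the parsing is part of a larger script.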
Re^2: Text Extraction
by sonicscott9041 (Novice) on Jul 22, 2009 at 20:22 UTC
by jdporter (Paladin) on Jul 22, 2009 at 21:15 UTC