
In the if block you set $csv to $3, but $3 is undefined. You then append $2 to $csv. As a result the first field appears only once, but with a comma in front of it that shouldn't be there.
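The leading comma can be reproduced in isolation. The sketch below is not your script: the sample line, the pattern, and the names build_row and $row are assumptions, chosen only to show what happens when a capture variable is used that the pattern never sets.

```perl
use strict;
use warnings;

# Hypothetical pattern with only two capture groups, so $3 is never set.
sub build_row {
    my ($line) = @_;
    my $row = '';
    if ( $line =~ m/^([^.]+)\.+\s+(.*)/ ) {
        $row = $3;        # $3 is undef: the pattern has no third group
        $row .= ",$2";    # appending with a separator leaves a leading comma
    }
    return $row;
}

print build_row('STOCK NO........ 12345'), "\n";
```

Starting the row from the captured value itself, and adding the separator only between fields, avoids the stray comma.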

In your original data you had one record set that didn't have the 'SALES CST' record. As a result, that record set and the following one were concatenated together. Such problems are not uncommon when parsing irregular text files. It is good to include error checking, but that depends on having a good sense of what is or isn't allowed. For example: is it an error for a set of records to be missing 'SALES CST'? Or is this OK? If it is OK for the 'SALES CST' record to be missing from a set, what should be written to the CSV file for such a record set?

Here is a variation that requires the first record to be 'STOCK NO' and reports errors if there is an unknown field or a duplicate field. Any field but the first can be missing from a record set and will default to an empty string.

    use strict;
    use warnings;
    use Text::CSV;

    # Field names in the order they are to appear in the CSV file
    my @fields = (
        'STOCK NO',
        'YEAR',
        'MAKE',
        'CARLINE',
        'COLOR DESCRIPTIONS',
        'SALES CST',
    );

    my $file = '782426.pl';
    open(my $fh, '<', $file) or die "$file: $!";

    my $csv = Text::CSV->new( { eol => "\n" } );

    # Header row
    $csv->print(\*STDOUT, \@fields );

    # Current record set: every known field starts out empty
    my %columns = map { ( $_ => "" ) } @fields;

    while ( <$fh> ) {
        chomp;

        # Skip lines that are not of the form "NAME..... value"
        next unless( m/^([^\.]+)\.+\s+(.*)/ );

        # The first field marks the start of a new record set
        if($1 eq $fields[0]) {
            write_columns(\%columns);
            reset_columns(\%columns);
        }

        die "Unknown column $1" unless(exists($columns{$1}));
        die "duplicate column $1" if($columns{$1});
        $columns{$1} = $2;
    }

    # Write the last record set
    write_columns(\%columns);
    close($fh);
    exit(0);

    # Print the record set as one CSV row, unless its first field is empty
    sub write_columns {
        my $columns = shift;
        if($columns->{$fields[0]}) {
            $csv->print(\*STDOUT, [ map { $columns->{$_} } @fields ] );
        }
    }

    # Set every field back to an empty string
    sub reset_columns {
        my $columns = shift;
        %$columns = map { ( $_ => "" ) } @fields;
    }
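To see how the regular expression in the script splits a line, it can be tried on its own. This is only a sketch: the sample lines are made up to match the "NAME..... value" shape the pattern expects, and parse_line is a hypothetical helper that is not part of the script.

```perl
use strict;
use warnings;

# Split a "NAME..... value" line into (name, value).
# Returns an empty list when the line does not match.
sub parse_line {
    my ($line) = @_;
    return $line =~ m/^([^\.]+)\.+\s+(.*)/ ? ( $1, $2 ) : ();
}

# Made-up sample lines, for illustration only
for my $line ( 'STOCK NO........ 12345', 'SALES CST....... 1,234.56', 'a line with no dots' ) {
    my ( $name, $value ) = parse_line($line);
    print defined $name ? "[$name] => [$value]\n" : "skipped: $line\n";
}
```

Note that the name is everything before the first dot, so a name followed by a space before the dots would keep that trailing space and would then be reported as an unknown column.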

This has added some complexity, but it is less likely to produce a CSV file with hard-to-detect errors. You can add more checks to reduce the risk of errors further.
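One such check might be to report record sets that lack a field you consider mandatory. The following is a minimal sketch under assumptions: the @required list and the missing_fields helper are hypothetical, and whether 'SALES CST' belongs in that list depends on the answer to the questions above.

```perl
use strict;
use warnings;

# Hypothetical list: which fields must be non-empty is a decision about your data.
my @required = ( 'STOCK NO', 'SALES CST' );

# Return the names of required fields that are empty or absent in a record set.
sub missing_fields {
    my ( $columns, $required ) = @_;
    return grep { !defined $columns->{$_} || $columns->{$_} eq '' } @$required;
}

# Made-up record set, for illustration only
my %record = ( 'STOCK NO' => '12345', 'YEAR' => '2009', 'SALES CST' => '' );
my @missing = missing_fields( \%record, \@required );
warn "record $record{'STOCK NO'} is missing: @missing\n" if @missing;
```

A check like this could be called at the start of write_columns, with warn or die depending on whether an incomplete record set should still be written.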

Re^8: Text Extraction
by sonicscott9041 (Novice) on Jul 23, 2009 at 18:34 UTC
    Thanks, ig, for the latest script and the critique. I am now digesting it and trying to learn from it.