In the if block you are setting $csv to $3, but $3 is undefined. Then you append $2 to $csv. The result is that the first field appears only once, but with a leading comma that shouldn't be there.
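To make the leading comma concrete, here is a minimal sketch. The original script is not reproduced in this post, so the sub build_row, its regex, and the sample line are hypothetical stand-ins that only mimic the pattern described above: a match with two capture groups, an assignment from $3, then an append of $2.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the pattern described above; the names
# and the regex are assumptions, not the original code.
sub build_row {
    my ($line) = @_;
    my $row;
    if ( $line =~ /^(STOCK NO)\.+\s+(.*)/ ) {
        $row = $3;         # the regex has only two groups, so $3 is undef
        $row .= ",$2";     # appending to undef gives ",<value>"
    }
    return $row;
}

print build_row('STOCK NO.... 12345'), "\n";    # prints ",12345"
```

Starting the row from $2 instead, and adding the comma only before later fields, avoids the stray separator.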

In your original data you had one record set that didn't have the 'SALES CST' record. As a result, that record set and the following one were concatenated together. Such problems are not uncommon when parsing irregular text files. It is good to include error checking, but that depends on having a good sense of what is or isn't allowed. For example: is it an error for a set of records to be missing 'SALES CST'? Or is this OK? If it is OK for the 'SALES CST' record to be missing from a set, what should be written to the CSV file for such a record set?

Here is a variation that requires the first record to be 'STOCK NO' and reports errors if there is an unknown field or a duplicate field. Any field but the first can be missing from a record set and will default to an empty string.

    use strict;
    use warnings;
    use Text::CSV;

    # Field names in the order they are to appear in the CSV file
    my @fields = (
        'STOCK NO',
        'YEAR',
        'MAKE',
        'CARLINE',
        'COLOR DESCRIPTIONS',
        'SALES CST',
    );

    my $file = '782426.pl';
    open(my $fh, '<', $file) or die "$file: $!";

    my $csv = Text::CSV->new( { eol => "\n" } );
    $csv->print(\*STDOUT, \@fields);    # header row

    my %columns = map { ( $_ => "" ) } @fields;

    while ( <$fh> ) {
        chomp;
        # Expect "NAME.... value"; skip lines that do not match
        next unless( m/^([^\.]+)\.+\s+(.*)/ );
        my ($name, $value) = ($1, $2);
        if($name eq $fields[0]) {
            # The first field starts a new record set: flush the previous one
            write_columns(\%columns);
            reset_columns(\%columns);
        }
        die "Unknown column $name" unless(exists($columns{$name}));
        # Compare against "" so that a value of "0" still counts as set
        die "duplicate column $name" if($columns{$name} ne "");
        $columns{$name} = $value;
    }
    write_columns(\%columns);    # flush the last record set
    close($fh);
    exit(0);

    # Print one CSV row, unless no record set has been started yet
    sub write_columns {
        my $columns = shift;
        if($columns->{$fields[0]} ne "") {
            $csv->print(\*STDOUT, [ map { $columns->{$_} } @fields ] );
        }
    }

    # Set every known field back to the empty string
    sub reset_columns {
        my $columns = shift;
        %$columns = map { ( $_ => "" ) } @fields;
    }

This has added some complexity, but it is less likely to produce a CSV file with hard-to-detect errors. You can add more checks to reduce the risk of errors further.
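As one example of such an extra check, here is a minimal sketch that reports which required fields are still empty in a record set. The helper missing_fields and the choice of required fields are assumptions for illustration; whether a missing 'SALES CST' should be an error is still a question only you can answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names from @required whose value in
# the record hash is undefined or the empty string.
sub missing_fields {
    my ($columns, @required) = @_;
    return grep { !defined $columns->{$_} || $columns->{$_} eq '' } @required;
}

# Sample record set with an empty 'SALES CST' (made-up data)
my %record = ( 'STOCK NO' => '12345', 'YEAR' => '2008', 'SALES CST' => '' );
my @missing = missing_fields( \%record, 'STOCK NO', 'SALES CST' );
warn "record $record{'STOCK NO'} is missing: @missing\n" if @missing;
```

A check like this could be called just before each row is written, and could either warn or die depending on how strict you want the conversion to be.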


In reply to Re^7: Text Extraction by ig
in thread Text Extraction by sonicscott9041
