It isn't CSV. So just write a parser for it. This isn't rocket surgery.

#!/usr/bin/perl -w
use strict;

my @data;
while( <DATA> ) {
    my @row;
    while( /\G(?=.)/gc ) {
        my $val = undef;
        /\G\s*/gc;
        if( /\G'/gc ) {
            my $p = pos();
            /\G(?:[^']+|'')*/gc;
            $val = substr( $_, $p, pos()-$p );
            die "Unclosed '\n" if ! /\G'/gc;
        } elsif( /\G"/gc ) {
            my $p = pos();
            /\G(?:[^"]+|"")*/gc;
            $val = substr( $_, $p, pos()-$p );
            die "Unclosed \"\n" if ! /\G"/gc;
        } else {
            my $p = pos();
            $val = substr( $_, $p, pos()-$p ) if /\G[^'",]+/gc;
            $val =~ s/\s*$//;
        }
        /\G\s*/gc;
        die "Bad data\n" if ! /\G(,|$)/gc;
        push @row, $val;
    }
    push @data, \@row;
}
__END__
'PRODUCT CODE','CATEGORY','CATEGORY DESCRIPTION','CODE DESCRIPTION','OPTIONAL CATEGORY','OPTIONAL CATEGORY DESCRIPTION'
' ','0 ','No Item',"INVALID CODE IN USER SUPPLIED DATA",' ',' '
'00100','1 ','Cat',"ORANGE CAT",' ',' '
'82131','94 ','Dog',"GREEN DOG",' ',' '
'82132','94 ','Dog',"'JOHNS' FLYING' DOG (Start 2001)",' ',' '
'82133','94 ','Dog',"MAGENTA DOG (End 2009)",' ',' '

Yep, not hard; worked the first try. Took a few minutes to write.

(Update: I neglected to post-process escaped quotation marks. Of course, no provision for escaped quotation marks was given in the original problem so I just implemented the simplest version, which might not be appropriate.)
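The post-processing mentioned in the update could look something like the sketch below. It assumes the "doubled quote" convention that the parser's regexes already tolerate ('' inside a '...' field, "" inside a "..." field); the original problem never says that this is how the data escapes quotes, so treat that as a guess. The helper name unescape_field is made up for illustration and is not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): collapse doubled
# quote characters in a field that was delimited by $quote.
# ASSUMPTION: quotes are escaped by doubling; the sample data does
# not confirm this.
sub unescape_field {
    my( $val, $quote ) = @_;
    return $val if ! defined $val || ! defined $quote;
    my $q = quotemeta $quote;       # make the quote char regex-safe
    $val =~ s/$q$q/$quote/g;        # '' becomes ', "" becomes "
    return $val;
}

# Only the delimiter that enclosed the field gets un-doubled.
print unescape_field( "it''s", "'" ), "\n";
print unescape_field( 'say ""hi""', '"' ), "\n";
print unescape_field( "it''s", '"' ), "\n";
```

In the parser, a call like this would go right after the substr() in each quoted branch, passing the quote character that opened the field. Unquoted fields would be left alone.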

- tye        


In reply to Re: parsing malformed CSV with per column quote chars (SMoP) by tye
in thread parsing malformed CSV with per column quote chars by bulk88
