I have some interestingly formatted CSV that I need to get into a Perl hash. One of the columns uses " as the quote char; all other columns use '. The " column has ' characters in it. See the "JOHNS FLYING DOG" row.

The last time this data format was used, I wrote text editor regexes to turn the CSV data into a literal Perl hash of array refs, then processed that data structure into a hash of hashes. I'd like a better solution than the previous one (I also lost the previous solution).

How do I get this CSV into a Perl hash? I am not sure Text::CSV has the options to do it, and I couldn't easily find any other CSV parser engines on CPAN (everything seems to be a wrapper around Text::CSV or Text::CSV_XS).
use Text::CSV::Hashify;
use Data::Dumper;

my $filename = 'bad.csv';
$obj = Text::CSV::Hashify->new( {
    file        => $filename,
    format      => 'hoh',
    key         => "PRODUCT CODE",
    quote_char  => "'",
    escape_char => "|",   # won't be found in the data, turn off escaping
    auto_diag   => 1,
} );
print Dumper($obj->all);
The CSV file
'PRODUCT CODE','CATEGORY','CATEGORY DESCRIPTION','CODE DESCRIPTION','OPTIONAL CATEGORY','OPTIONAL CATEGORY DESCRIPTION'
' ','0 ','No Item',"INVALID CODE IN USER SUPPLIED DATA",' ',' '
'00100','1 ','Cat',"ORANGE CAT",' ',' '
'82131','94 ','Dog',"GREEN DOG",' ',' '
'82132','94 ','Dog',"'JOHNS' FLYING' DOG (Start 2001)",' ',' '
'82133','94 ','Dog',"MAGENTA DOG (End 2009)",' ',' '
The error message
# CSV_XS ERROR: 2034 - EIF - Loose unescaped quote @ rec 4 pos 24
$VAR1 = {
          ' ' => {
                   'CODE DESCRIPTION' => '"INVALID CODE IN USER SUPPLIED DATA"',
                   'OPTIONAL CATEGORY DESCRIPTION' => ' ',
                   'CATEGORY' => '0 ',
                   'PRODUCT CODE' => ' ',
                   'CATEGORY DESCRIPTION' => 'No Item',
                   'OPTIONAL CATEGORY' => ' '
                 },
          '82131' => {
                       'CODE DESCRIPTION' => '"GREEN DOG"',
                       'CATEGORY' => '94 ',
                       'OPTIONAL CATEGORY DESCRIPTION' => ' ',
                       'PRODUCT CODE' => '82131',
                       'OPTIONAL CATEGORY' => ' ',
                       'CATEGORY DESCRIPTION' => 'Dog'
                     },
          '00100' => {
                       'OPTIONAL CATEGORY' => ' ',
                       'CATEGORY DESCRIPTION' => 'Cat',
                       'CODE DESCRIPTION' => '"ORANGE CAT"',
                       'CATEGORY' => '1 ',
                       'OPTIONAL CATEGORY DESCRIPTION' => ' ',
                       'PRODUCT CODE' => '00100'
                     }
        };
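For comparison, here is a minimal sketch of a fallback that does not use Text::CSV at all. It is not a general CSV parser. It assumes that every field is wrapped in either ' or ", that the closing quote matches the opening one and is followed directly by a comma or the end of the line, and that no field contains its own quote character immediately followed by a comma. The names parse_line and csv_to_hoh are hypothetical helpers made up for this illustration, and the in-memory sample stands in for bad.csv.

```perl
use strict;
use warnings;

# Split one line into fields. A field opens with ' or " and runs (lazily)
# to the same quote character followed by a comma or end of line.
# ASSUMPTION: a field never contains its own quote char followed by a comma.
sub parse_line {
    my ($line) = @_;
    my @fields;
    while ($line =~ /\G(['"])(.*?)\1(?:,|$)/gc) {
        push @fields, $2;
    }
    # /c keeps pos() after the final failed match, so we can check
    # that the whole line was consumed.
    die "unparsable line: $line\n"
        unless (pos($line) // 0) == length $line;
    return @fields;
}

# Read the header row, then build a hash of hashes keyed on one column.
sub csv_to_hoh {
    my ($fh, $key) = @_;
    my $header = <$fh>;
    die "empty input\n" unless defined $header;
    chomp $header;
    my @cols = parse_line($header);
    my %hoh;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my %row;
        @row{@cols} = parse_line($line);
        $hoh{ $row{$key} } = \%row;
    }
    return \%hoh;
}

# Two rows of the sample data, read from an in-memory filehandle.
my $csv = <<'END';
'PRODUCT CODE','CATEGORY','CATEGORY DESCRIPTION','CODE DESCRIPTION','OPTIONAL CATEGORY','OPTIONAL CATEGORY DESCRIPTION'
'82131','94 ','Dog',"GREEN DOG",' ',' '
'82132','94 ','Dog',"'JOHNS' FLYING' DOG (Start 2001)",' ',' '
END

open my $fh, '<', \$csv or die "cannot open in-memory file: $!";
my $data = csv_to_hoh($fh, 'PRODUCT CODE');
print $data->{'82132'}{'CODE DESCRIPTION'}, "\n";
```

With a real file, the open line would become open my $fh, '<', 'bad.csv'. The backreference \1 is what lets each column pick its own quote character, so the ' characters inside the " column are treated as plain data.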

In reply to parsing malformed CSV with per column quote chars by bulk88
