I have some oddly formatted CSV that I need to get into a Perl hash. One of the columns uses " as its quote character; all the other columns use '. The "-quoted column contains 's (see the "JOHNS FLYING DOG" row). The last time this data format came up, I wrote text-editor regexes to turn the CSV data into a literal Perl hash of array refs, then processed that data structure into a hash of hashes. I'd like something better than that (and I have also lost that solution). How do I get this CSV into a Perl hash? I am not sure Text::CSV has the options to do it, and I couldn't easily find any other CSV parsing engines on CPAN (everything seems to be a wrapper around Text::CSV or Text::CSV_XS).
use strict;
use warnings;
use Text::CSV::Hashify;
use Data::Dumper;

my $filename = 'bad.csv';
my $obj = Text::CSV::Hashify->new( {
    file        => $filename,
    format      => 'hoh',
    key         => 'PRODUCT CODE',
    quote_char  => "'",
    escape_char => '|',    # won't be found in the data, so escaping is effectively off
    auto_diag   => 1,
} );
print Dumper($obj->all);
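Text::CSV takes a single quote_char, but the core module Text::ParseWords (not a CSV module; swapped in here as an alternative) provides parse_line, which accepts both '...' and "..." as quoted words. Below is a minimal sketch that builds a hash of hashes keyed on PRODUCT CODE. It assumes the first line is the header, product codes are unique, and the data contains no backslashes, because parse_line treats a backslash as an escape. The variable names are illustrative, and the sample rows are copied from the CSV in this post.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module, no CPAN install needed

# Sample rows copied from bad.csv; in real use, open the file instead:
#   open my $fh, '<', 'bad.csv' or die "bad.csv: $!";
my $csv = <<'END_CSV';
'PRODUCT CODE','CATEGORY','CATEGORY DESCRIPTION','CODE DESCRIPTION','OPTIONAL CATEGORY','OPTIONAL CATEGORY DESCRIPTION'
'00100','1 ','Cat',"ORANGE CAT",' ',' '
'82132','94 ','Dog',"'JOHNS' FLYING' DOG (Start 2001)",' ',' '
END_CSV
open my $fh, '<', \$csv or die "in-memory open failed: $!";

my ( @header, %by_code );
while ( my $line = <$fh> ) {
    $line =~ s/\r?\n\z//;           # strip LF or CRLF line ending
    next unless length $line;

    # parse_line(DELIM, KEEP, LINE): DELIM is a regex; KEEP = 0 removes
    # the quotes. Both '...' and "..." count as quoted fields, so the
    # single quotes inside the double-quoted column are kept as data.
    # Caveat (assumption): backslashes would be treated as escapes.
    my @fields = parse_line( ',', 0, $line );
    die sprintf( "cannot parse line %d: %s\n", $., $line ) unless @fields;

    if ( !@header ) { @header = @fields; next }

    my %record;
    @record{@header} = @fields;     # hash slice: column name => value
    $by_code{ $record{'PRODUCT CODE'} } = \%record;
}
close $fh;

print "$by_code{'82132'}{'CODE DESCRIPTION'}\n";
```

The space-padded values such as '94 ' and ' ' come through unchanged; trim them afterwards if you don't want them.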
The CSV file
'PRODUCT CODE','CATEGORY','CATEGORY DESCRIPTION','CODE DESCRIPTION','OPTIONAL CATEGORY','OPTIONAL CATEGORY DESCRIPTION'
' ','0 ','No Item',"INVALID CODE IN USER SUPPLIED DATA",' ',' '
'00100','1 ','Cat',"ORANGE CAT",' ',' '
'82131','94 ','Dog',"GREEN DOG",' ',' '
'82132','94 ','Dog',"'JOHNS' FLYING' DOG (Start 2001)",' ',' '
'82133','94 ','Dog',"MAGENTA DOG (End 2009)",' ',' '
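If you would rather not depend on parse_line's backslash handling, the quoting rule visible in this file can be matched directly with a regex. This is a minimal sketch. It assumes every field is wrapped in '...' or "...", that a '...' field never contains ', and that a "..." field never contains ". These rules are inferred from the sample rows, not from any spec. split_record is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split one record (already chomped) of this particular file format.
# Assumed rules: each field is '...' or "...", the quote character never
# appears inside its own field, and fields are separated by bare commas.
sub split_record {
    my ($line) = @_;
    my @fields;
    my $done = 0;
    # \G anchors each match where the previous one ended, so nothing
    # between fields can be skipped silently.
    while ( !$done && $line =~ /\G(?:'([^']*)'|"([^"]*)")(,|\z)/gc ) {
        push @fields, defined $1 ? $1 : $2;
        $done = 1 if $3 eq '';       # the last field ended at end of string
    }
    die "unparseable record: $line\n" unless $done;
    return @fields;
}

my @f = split_record(q{'82132','94 ','Dog',"'JOHNS' FLYING' DOG (Start 2001)",' ',' '});
print scalar(@f), " fields; description = $f[3]\n";
```

Each field is matched as a whole quoted unit, so the stray ' characters inside the double-quoted description never act as delimiters. A record that ends in a dangling comma, or that contains an unquoted field, is rejected rather than half-parsed.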
The error message and Dumper output
# CSV_XS ERROR: 2034 - EIF - Loose unescaped quote @ rec 4 pos 24
$VAR1 = {
' ' => {
'CODE DESCRIPTION' => '"INVALID CODE IN USER SUPPLIED DATA"',
'OPTIONAL CATEGORY DESCRIPTION' => ' ',
'CATEGORY' => '0 ',
'PRODUCT CODE' => ' ',
'CATEGORY DESCRIPTION' => 'No Item',
'OPTIONAL CATEGORY' => ' '
},
'82131' => {
'CODE DESCRIPTION' => '"GREEN DOG"',
'CATEGORY' => '94 ',
'OPTIONAL CATEGORY DESCRIPTION' => ' ',
'PRODUCT CODE' => '82131',
'OPTIONAL CATEGORY' => ' ',
'CATEGORY DESCRIPTION' => 'Dog'
},
'00100' => {
'OPTIONAL CATEGORY' => ' ',
'CATEGORY DESCRIPTION' => 'Cat',
'CODE DESCRIPTION' => '"ORANGE CAT"',
'CATEGORY' => '1 ',
'OPTIONAL CATEGORY DESCRIPTION' => ' ',
'PRODUCT CODE' => '00100'
}
};
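Because quote_char is ' in the Hashify call, the "-quoted column is read as an unquoted field, and the Dumper output shows its double quotes kept inside the values. Below is a minimal clean-up sketch for that column. strip_double_quotes is a hypothetical helper, and the sample hash is trimmed from the output above. It only tidies rows that were parsed: it does not bring back 82132 or 82133, which are missing from the output after the error.

```perl
use strict;
use warnings;

# Two records shaped like the Dumper output above (trimmed to two columns).
my %hoh = (
    '00100' => { 'PRODUCT CODE' => '00100', 'CODE DESCRIPTION' => '"ORANGE CAT"' },
    '82131' => { 'PRODUCT CODE' => '82131', 'CODE DESCRIPTION' => '"GREEN DOG"' },
);

# Remove one pair of surrounding double quotes, if present, from one column
# of every record in a hash of hashes (modifies the records in place).
sub strip_double_quotes {
    my ( $hoh, $column ) = @_;
    for my $record ( values %$hoh ) {
        next unless defined $record->{$column};
        $record->{$column} =~ s/\A"(.*)"\z/$1/s;
    }
}

strip_double_quotes( \%hoh, 'CODE DESCRIPTION' );
print "$hoh{'82131'}{'CODE DESCRIPTION'}\n";    # prints: GREEN DOG
```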