I'm not sure whether this is expected behavior, but the Text::CSV::Simple module chokes on the following line:

20050118184913996273,33301,EQ,A,"Hemkopskedjan Ab "B" Ordinary Shares (Sweden)",1.0,US,NAS,USD,N,700,OBB,,NAS-OBB,HKPKF.OB, ,0,0.0,,,,,0.0,,,00000000,00000000,00000000, , , ,,0.0,1e-06,EQ,N

Specifically, the embedded "B" gives it problems. Now, I don't trust the person who created this file to follow proper CSV format at all (his idea for handling commas embedded in data, for example, was to move such columns to the end of the record; he has apparently never heard of double-quoting). And though I'm not intimately familiar with the CSV spec either, it doesn't seem like a double-quoted string inside a field should be verboten. Should it?

Either way, I need to deal with this. I'd rather not modify the module itself (after all, I might want to parse other CSV files with it), and copying it just to hack on the copy seems unwise architecturally. Any advice?
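For illustration, here is a minimal sketch of one workaround that leaves the module alone: repair each line before it reaches the parser by doubling any double-quote that does not sit at a field boundary. The helper name escape_inner_quotes is hypothetical, not part of Text::CSV::Simple. The sketch assumes the file contains no correctly escaped ("") quotes already and that no stray quote sits directly next to a comma; either case would be handled wrongly.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any CSV module): double every
# double-quote that is neither preceded by a comma / start of line
# nor followed by a comma / end of line.
# Assumption: the input has no already-doubled quotes.
sub escape_inner_quotes {
    my ($line) = @_;
    $line =~ s/(?<=[^,])"(?=[^,\r\n])/""/g;
    return $line;
}

my $raw   = '33301,EQ,A,"Hemkopskedjan Ab "B" Ordinary Shares (Sweden)",1.0';
my $fixed = escape_inner_quotes($raw);
print "$fixed\n";
# prints: 33301,EQ,A,"Hemkopskedjan Ab ""B"" Ordinary Shares (Sweden)",1.0
```

The repaired lines could then be written to a cleaned copy of the file and that copy handed to the parser as usual.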

I suspect looking at code isn't necessary for anyone who can answer this, but just in case (and for completeness):

# Map each CSV column to a hash key; read_file then returns one hashref per row.
$parser->field_map(qw/
    TIMESTAMP INSTRUMENT_ID INSTRUMENT_TYPE STATUS DESCRIPTION
    CONSIDERATION_FACTOR COUNTRY_CODE EXCHANGE_ID CURRENCY_ID TRADED_IN_MINOR_CCY
    PRIMARY_BOOK SEGMENT_CODE SECTOR_ID PRIMARY_MARKET ISIN_CODE SETTLEMENT_TYPE
    SETTLEMENT_DAYS MIN_SIZE_OUTSIDE_SPREAD SETTLEMENT_EXCHANGE_ID INSTRUMENT_TYPE_QUALIFIER
    DELIVERY_MECHANISM COUNTRY_OF_INCORPORATION COUPON_RATE COUPON_DATE_1 COUPON_DATE_2
    ISSUE_DATE EXPIRY_DATE ACCRUED_START_DATE SHORT_FEBRUARY COUPON_TYPE ACCRUED_CALC_TYPE
    DAY_COUNT_METHOD DENOMINATION TICK_SIZE PRODUCT_TYPE IS_RESEARCHED
/);
# Iterate over the rows once; each $row is a hashref keyed by the names above.
for my $row ( $parser->read_file($instfile) ) {
      my $id = $row->{INSTRUMENT_ID};
      $desc{$id}         = $row->{DESCRIPTION};
      $status{$id}       = $row->{STATUS};
      $timestamp{$id}    = $row->{TIMESTAMP};
      $country_code{$id} = $row->{COUNTRY_CODE};
      $iisin{$id}        = $row->{ISIN_CODE};
      $icode{$id}        = $row->{INSTRUMENT_TYPE};

      $ids{$id}++;
}

Yes, I know I could have set up my data structure better. This was basically a first pass.
Thanks,
SamCG

In reply to Text::CSV::Simple parsing issue by SamCG