You should pay attention to the two sides of the coin:
- Parsing: Text::CSV_XS was created for safe, reliable, and fast parsing of CSV data. The constructor supports many attributes to control the parsing of CSV data that falls outside the rather narrow default definition. The most commonly used attribute will be sep_char, which allows for all the different non-standard separation characters used by M$-Excel: when exporting to CSV, Excel uses the "list separator" from the locale settings instead of the default comma. "The string is marked UTF8" applies only to this side of the coin, when reading CSV.
- Writing: many attributes apply only to parsing, and some apply only to writing. quote_space is one of the latter and has no influence whatsoever on parsing data.
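A minimal sketch of the parsing side, assuming Text::CSV_XS is installed and that the Excel export used a semicolon as its list separator (the actual character depends on the locale Excel ran under). The sample line is made up for illustration:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Parser for an Excel export that used ";" instead of ","
my $csv = Text::CSV_XS->new({
    sep_char => ";",   # override the default comma
    binary   => 1,     # accept non-ASCII bytes, embedded newlines, etc.
});

# Hypothetical sample line; the comma in 42,5 is plain data here
my $line = qq(1;"Smith, John";42,5);
$csv->parse($line) or die "parse failed: " . $csv->error_diag;

my @fields = $csv->fields;
print join("|", @fields), "\n";
```

Because the separator is now ";", a comma is an ordinary character, so both "Smith, John" and the locale-style decimal 42,5 come back as single fields.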
Text::CSV_XS parses and writes bytes, not characters or letters. The "upgrade" to Unicode/UTF-8 only happens at the moment a field has been parsed correctly and "binary" bytes were detected inside that field. When dealing with Unicode (in whatever encoding), you can be absolutely sure that the text will contain "binary" bytes, both when parsing and when writing, so you should always set the binary attribute. The fact that it is not the default stems from the distant past: changing it to a sane default of 1 could possibly break backward compatibility.
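A sketch of a round trip with binary set, assuming the input arrives as UTF-8 encoded bytes; the sample data is hypothetical. The bytes are decoded with the core Encode module before parsing, so the source string is marked UTF8, and the combined string is encoded again before it would be written out:

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use Text::CSV_XS;

# binary => 1 is needed as soon as any field holds bytes >= 0x80
my $csv = Text::CSV_XS->new({ binary => 1 });

# Hypothetical input, as raw UTF-8 bytes read from a file
my $bytes = encode("UTF-8", qq(1,"caf\x{e9}",\x{263A}));

# Decode first, so the source string is marked UTF8
my $text = decode("UTF-8", $bytes);
$csv->parse($text) or die "parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

# Writing: combine the character strings, then encode to bytes
$csv->combine(@fields) or die "combine failed: " . $csv->error_diag;
my $out = encode("UTF-8", $csv->string);
print $out, "\n";
```

With real files you would more typically open the handle with an :encoding(UTF-8) layer and use getline and print instead of calling decode and encode yourself.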
When writing, both whitespace and "binary" bytes trigger quotation. Please don't confuse quote_space (which controls quotation triggered by whitespace) with quote_binary (the new attribute, which controls quotation triggered by binary bytes). What you perceive as "strange" simply stems from a misunderstanding of what the quote_space attribute does.
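To see that the two attributes are independent, this sketch writes the same made-up row with each combination of quote_space and quote_binary. binary is set in every case so that the byte 0xE9 is accepted at all:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical row: one field with a space, one with a high byte
my @row = ("a b", "caf\x{e9}", "plain");

my @lines;
for my $opts (
    { binary => 1 },                                        # both triggers on (defaults)
    { binary => 1, quote_space  => 0 },                     # spaces no longer trigger quotes
    { binary => 1, quote_binary => 0 },                     # high bytes no longer trigger quotes
    { binary => 1, quote_space  => 0, quote_binary => 0 },  # neither triggers quotes
) {
    my $csv = Text::CSV_XS->new($opts);
    $csv->combine(@row) or die "combine failed: " . $csv->error_diag;
    push @lines, $csv->string;
}

print "$_\n" for @lines;
```

Only the field with the space reacts to quote_space, and only the field with the high byte reacts to quote_binary. Neither setting changes how these lines would be parsed back.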
Enjoy, Have FUN! H.Merijn