in reply to Re^2: CSV_XS and UTF8 strings
in thread CSV_XS and UTF8 strings
So what you want is a new option to disable the need for quotation on characters with code-points > 127?
Note that quote_space isn't even tested when writing the fields with the UTF-8 characters. It is only tested when a space is encountered inside a field. While scanning a field, a flag is set as soon as quotation is required. Once that flag has been set by whatever trigger, further tests are skipped. In your example the flag was already set by the first "binary" character, so quote_space is effectively a no-op in your code.
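To make that concrete, here is a minimal sketch of such a scan. It is not the actual XS code: `scan_opts`, `field_needs_quotes` and the `space_tests` counter are hypothetical names, and the quote and separator characters are hard-coded as `"` and `,` for brevity. The counter only exists to show how often the quote_space test is reached.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the writer's per-field scan. */
typedef struct {
    int quote_space;  /* quote fields that contain a space */
    int space_tests;  /* how often the quote_space test was reached */
} scan_opts;

static int field_needs_quotes(const unsigned char *s, size_t len, scan_opts *o)
{
    int quote_me = 0;

    /* Stop scanning as soon as any trigger has set the flag. */
    for (size_t i = 0; i < len && !quote_me; i++) {
        unsigned char c = s[i];

        if (c == ' ') {
            o->space_tests++;           /* only reached on a space */
            if (o->quote_space)
                quote_me = 1;
        }
        else if (c < 0x20 || (c >= 0x7f && c <= 0xa0) || c == '"' || c == ',') {
            quote_me = 1;               /* "binary", quote or separator */
        }
    }
    return quote_me;
}
```

For the UTF-8 bytes of "€" followed by " b", the byte 0x82 sets the flag before the space is ever examined, so the result is the same whether quote_space is on or off.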
However, I'm not sure that I want to implement such a new feature, as it could create invalid CSV. OTOH, it would be an option that is only used when writing CSV, which is relatively easy to change.
The current quote trigger is like:
    if (c < csv->first_safe_char ||
        (c >= 0x7f && c <= 0xa0) ||
        (csv->quote_char  && c == csv->quote_char)  ||
        (csv->sep_char    && c == csv->sep_char)    ||
        (csv->escape_char && c == csv->escape_char)) {
        /* Binary character */
        break;
        }
A new flag could make that into something like
    if (c < csv->first_safe_char ||
        (csv->quote_binary && c >= 0x7f && c <= 0xa0) ||
        (csv->quote_char   && c == csv->quote_char)   ||
        (csv->sep_char     && c == csv->sep_char)     ||
        (csv->escape_char  && c == csv->escape_char)) {
        /* Binary character */
        break;
        }
Leaving it safe for all ASCII binary. I could do that.
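As a standalone illustration of what the proposed condition does to individual bytes, the sketch below wraps it in a small predicate. `trigger_opts`, `is_quote_trigger` and `bytes_need_quotes` are hypothetical names for this example only, not the module's real structures, and `first_safe_char` is assumed to be 0x20 by the caller.

```c
#include <stddef.h>

/* Hypothetical stand-in for the fields the trigger reads; not the real csv struct. */
typedef struct {
    unsigned char first_safe_char;  /* assumed to be 0x20 by the caller */
    unsigned char quote_char;
    unsigned char sep_char;
    unsigned char escape_char;
    int           quote_binary;     /* the proposed new flag */
} trigger_opts;

/* Same condition as the proposed trigger, applied to a single byte. */
static int is_quote_trigger(unsigned char c, const trigger_opts *csv)
{
    return c < csv->first_safe_char
        || (csv->quote_binary && c >= 0x7f && c <= 0xa0)
        || (csv->quote_char   && c == csv->quote_char)
        || (csv->sep_char     && c == csv->sep_char)
        || (csv->escape_char  && c == csv->escape_char);
}

/* A field needs quotes as soon as one byte trips the trigger. */
static int bytes_need_quotes(const unsigned char *s, size_t len,
                             const trigger_opts *csv)
{
    for (size_t i = 0; i < len; i++)
        if (is_quote_trigger(s[i], csv))
            return 1;
    return 0;
}
```

With the flag on, the UTF-8 encoding of "€" (0xe2 0x82 0xac) trips the trigger on 0x82. With the flag off, none of its bytes do, while control characters below `first_safe_char`, the separator and the quote character still force quotation.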
Update: done.
    Text-CSV_XS $ cat test.pl
    use strict;
    use warnings;

    binmode STDOUT, ":utf8";
    use Text::CSV_XS;
    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\n" });

    $csv->quote_binary (1); # default
    $csv->print (*STDOUT, [ undef, "", " ", 1, "a b ", "\x{20ac}" ]);

    $csv->quote_binary (0);
    $csv->print (*STDOUT, [ undef, "", " ", 1, "a b ", "\x{20ac}" ]);
    Text-CSV_XS $ perl -Iblib/{lib,arch} test.pl
    ,," ",1,"a b ","€"
    ,," ",1,"a b ",€
    Text-CSV_XS $