Re^2: A Character Set Enquiry

Perl doesn't have a preferred character set.

Not quite true. If you read binary data and then treat it as text (for example, by calling uc or lc on it), it is treated as Latin-1.

In fact, it is possible that your data is stored correctly in the database, but it is only when you print it out that it doesn't look right.

Very unlikely if he dumped UTF-8 data into a Latin-1 database and then converted it to UTF-8.
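To make that scenario concrete, here is a minimal sketch of how such double encoding could arise. It assumes the stored bytes were UTF-8, were read as if they were Latin-1, and were then encoded to UTF-8 again; the variable names are only illustrative and the original poster's actual pipeline is not known.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "é" (U+00E9) as it looks once encoded to UTF-8: two bytes.
my $utf8_bytes = "\xC3\xA9";

# Assumption: something downstream believes these bytes are Latin-1,
# so each byte becomes its own character (U+00C3 and U+00A9).
my $misread = decode('ISO-8859-1', $utf8_bytes);

# "Converting to UTF-8" afterwards encodes those two characters again,
# giving four bytes instead of the original two.
my $double_encoded = encode('UTF-8', $misread);

my $hex = join ' ', map { sprintf '%02X', ord } split //, $double_encoded;
print "$hex\n";    # prints: C3 83 C2 A9
```

Data in this state is already damaged in storage, so it would look wrong no matter how it is printed.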


Replies are listed 'Best First'.
Re^3: A Character Set Enquiry by ysth (Canon) on Jul 11, 2008 at 03:38 UTC
By default, arbitrary data with the utf8 flag on will be treated as Unicode characters (equivalent to Latin-1 through code point 255). But by default, without the flag on, it is treated as specified by the C locale, which is pretty much just ASCII. Try it (remove the -CO if you have a non-UTF-8 terminal):

`$ perl -CO -wle'print lc "\xc9"; print lc substr "\x{100}\xc9", 1'`

This outputs É, then é.
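The same effect can be shown as a small script rather than a one-liner. This is only a sketch: it uses utf8::upgrade to switch the flag on instead of the substr trick, and it assumes neither `use locale` nor the unicode_strings feature is in effect (so no `use v5.12` or later at the top).

```perl
use strict;
use warnings;

# Code point 0xC9 (É in Latin-1) held in a plain byte string: utf8 flag off.
my $flag_off = "\xc9";

# The same code point, but with the internal utf8 flag switched on.
my $flag_on = "\xc9";
utf8::upgrade($flag_on);

# Flag off: lc uses ASCII-only rules, so 0xC9 is left unchanged.
my $lc_off = lc $flag_off;

# Flag on: lc uses Unicode rules, so 0xC9 (É) becomes 0xE9 (é).
my $lc_on = lc $flag_on;

printf "flag off: U+%04X -> U+%04X\n", ord $flag_off, ord $lc_off;
printf "flag on:  U+%04X -> U+%04X\n", ord $flag_on,  ord $lc_on;
```

Printing code points rather than the characters themselves keeps the result independent of the terminal's encoding.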
