in reply to Re: A Character Set Enquiry
in thread A Character Set Enquiry

Perl doesn't have a preferred character set.

Not quite true. If you read binary data and then treat it as text (for example, by calling uc or lc on it), it's treated as Latin-1.

In fact, it is possible that your data is stored correctly in the database, but it is only when you print it out that it doesn't look right.

Very unlikely, if he dumped UTF-8 data into a Latin-1 database and then converted it to UTF-8.
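To make that concrete, here is a minimal sketch of what such a round trip would do to a single character. It assumes the database took each incoming UTF-8 byte for one Latin-1 character and that the later conversion re-encoded those characters as UTF-8; the variable names are only for illustration, and the real setup in the thread may differ.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character, U+00E9 (e with acute accent).
my $text = "\x{e9}";

# What the application sent: the UTF-8 encoding, two bytes.
my $utf8 = encode('UTF-8', $text);

# Assumed mishap: the bytes are read as Latin-1 characters,
# and those characters are then converted to UTF-8.
my $double = encode('UTF-8', decode('ISO-8859-1', $utf8));

print "original UTF-8: ", uc unpack('H*', $utf8),   "\n";   # C3A9
print "after mishap:   ", uc unpack('H*', $double), "\n";   # C383C2A9
```

Under those assumptions the stored bytes themselves are wrong (four bytes instead of two), so the problem would not be limited to how the data is printed.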

Re^3: A Character Set Enquiry
by ysth (Canon) on Jul 11, 2008 at 03:38 UTC
    By default, arbitrary data with the utf8 flag on will be treated as Unicode characters (equivalent to Latin-1 through code point 255). Without the flag, however, it is by default treated as specified by the C locale, which is pretty much just ASCII. Try it (remove the -CO if you have a non-UTF-8 terminal):
    $ perl -CO -wle'print lc "\xc9"; print lc substr "\x{100}\xc9", 1'
    This outputs É then é.
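The same comparison can be written as a short script that prints code points instead of relying on the terminal. This is only a sketch: it assumes a default setup (no `use locale`, no `use feature 'unicode_strings'`), and it uses utf8::upgrade to turn the flag on where the one-liner used substr; the helper `cp` is made up for this example.

```perl
use strict;
use warnings;

# The same character (0xC9), stored two ways.
my $bytes = "\xc9";        # utf8 flag off
my $chars = "\xc9";
utf8::upgrade($chars);     # same character, utf8 flag now on

# Hypothetical helper: show the first character as a code point.
sub cp { sprintf "U+%04X", ord shift }

printf "flag off (%d): lc gives %s\n", utf8::is_utf8($bytes) ? 1 : 0, cp(lc $bytes);
printf "flag on  (%d): lc gives %s\n", utf8::is_utf8($chars) ? 1 : 0, cp(lc $chars);
# flag off (0): lc gives U+00C9
# flag on  (1): lc gives U+00E9
```

The two strings compare equal with eq, yet lc treats them differently, which is the behavior described above.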