in reply to Writing unicode characters to file using open($fh, ">:utf8, $name) mangles unicode?

First of all, printing a string of characters to a file handle without encoding it first relies on Perl's internal string format, and can earn you a "Wide character in print" warning. That means

write_to_file('>', '_decoded', $string);

is wrong. There are two ways to encode the string: explicitly with encode_utf8, or implicitly with the :utf8 layer.

write_to_file('>', '_explicit_utf8', encode_utf8($string));
write_to_file('>:utf8', '_implicit_utf8', $string);

The problem you are having is that you are encoding the string twice: once using encode_utf8, and then again using :utf8.
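To make the double encoding visible, here is a minimal sketch. The helper octets_written is hypothetical (it is not the write_to_file from the original post); it prints to an in-memory file handle so the resulting octets can be inspected without touching the disk.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

my $string = "\x{263A}";    # a single character outside Latin-1

# Hypothetical helper: print $data through the given open mode into an
# in-memory file and return the octets that would have reached the disk.
sub octets_written {
    my ($mode, $data) = @_;
    my $buffer = '';
    open(my $fh, $mode, \$buffer) or die "Cannot open in-memory file: $!";
    print {$fh} $data;
    close($fh) or die "Cannot close in-memory file: $!";
    return $buffer;
}

my $explicit = octets_written('>',      encode_utf8($string));  # encoded once
my $implicit = octets_written('>:utf8', $string);               # encoded once
my $doubled  = octets_written('>:utf8', encode_utf8($string));  # encoded twice

# Show each result as a length and a hex dump of its octets.
for my $case ([explicit => $explicit], [implicit => $implicit], [doubled => $doubled]) {
    my ($label, $octets) = @$case;
    printf "%-8s %d octets: %s\n", $label, length($octets),
        join(' ', unpack('(H2)*', $octets));
}
```

The explicit and implicit variants should produce the same three octets. The doubled variant produces six, because each octet of the already-encoded string is treated as a character and encoded again.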

Re^2: Writing unicode characters to file using open($fh, ">:utf8, $name) mangles unicode?
by telcontar (Beadle) on Aug 08, 2007 at 17:15 UTC
    Thank you, that makes perfect sense. It is supposed to be transparent and here I was doing it twice :-)

    But what if I download a web page, say with LWP::UserAgent->get($url), and want to save it to a file in its native encoding, but that encoding is not listed in Encode->encodings(':all')? Am I stuck? :-)

    -- tel

      I'm not sure what you are asking.

      If you want to save the document in its original encoding: open(my $fh, '>', $filename); doesn't do any encoding. If you don't do any decoding, print $fh $raw; will save the content in its native encoding.
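A minimal sketch of the "no decoding, no encoding" path described above. The helper name save_octets is hypothetical. The LWP usage is left as a comment so the block runs without network access; it assumes you take the body from $response->content (the undecoded octets) rather than $response->decoded_content.

```perl
use strict;
use warnings;

# Hypothetical helper: write octets to $filename exactly as received.
sub save_octets {
    my ($filename, $raw) = @_;
    open(my $fh, '>', $filename) or die "Cannot open $filename: $!";
    binmode($fh);    # no layers: no encoding and no newline translation
    print {$fh} $raw;
    close($fh) or die "Cannot close $filename: $!";
    return;
}

# Possible use with LWP (assumes $url is set and the request succeeds):
#   use LWP::UserAgent;
#   my $response = LWP::UserAgent->new->get($url);
#   save_octets('page.html', $response->content) if $response->is_success;
```

Since nothing is ever decoded here, it does not matter whether Encode knows the page's encoding.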

      If you want to save the document in UTF-8:

      Yeah, you're screwed. If Encode "doesn't speak the language", you won't be able to decode the content, so you're left with a bunch of meaningless octets.
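For the UTF-8 case, one way to find out up front whether Encode "speaks the language" is Encode's find_encoding, which returns undef for a name it does not know. The sketch below is only an illustration; the helper name to_utf8_octets and the charset names in the usage lines are assumptions.

```perl
use strict;
use warnings;
use Encode qw(find_encoding encode);

# Hypothetical helper: convert raw octets in $charset to UTF-8 octets.
# Returns undef when Encode has no encoding by that name.
sub to_utf8_octets {
    my ($charset, $raw) = @_;
    my $encoding = find_encoding($charset) or return;
    my $chars = $encoding->decode($raw);    # octets -> characters
    return encode('UTF-8', $chars);         # characters -> UTF-8 octets
}

my $converted = to_utf8_octets('iso-8859-1', "caf\xE9");
my $unknown   = to_utf8_octets('x-no-such-charset', "caf\xE9");

print defined $converted ? "converted\n" : "unknown charset\n";
print defined $unknown   ? "converted\n" : "unknown charset\n";
```

When the result is undef, the fallback is to keep the raw octets as they are.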