Dear monks,
I am trying to write Unicode, specifically UTF-8, to a file. Since the data exists in ISO-8859-1 (or another character set), it must be converted first.
I then write a UTF-8 string to a file. After reading the docs I thought I had to open the file for writing using open($fh, '>:utf8', $filename); however, when I do this and look at the file in any Unicode-capable editor, I see garbage. If I write the file normally, using open($fh, '>', $filename), all seems well. As this contradicts perluniintro, which clearly states that one should use the former open() method, or even use open ':utf8', when dealing with files, I am sure I must be doing something wrong.
The following code is meant to illustrate my problem. The files '_original' and '_decoded' are the same, and I do not find this surprising. The file '_utf8' does not display the characters correctly unless I change the last call to write_to_file('>', '_utf8', $utf8);
use Encode;

# Write $what to $filename, opening the file with the given mode/layer.
sub write_to_file {
    my ($mode, $filename, $what) = @_;
    open (my $fh, $mode, $filename)
        or die "Couldn't open $filename for writing: $!";  # $! holds the OS error
    print $fh $what;
    close $fh;
}

# Assumes this source file itself is saved as ISO-8859-1.
my $iso_8859_1 = 'Österreich';
my $string = Encode::decode('iso-8859-1', 'Österreich');  # character string
my $utf8 = Encode::encode_utf8($string);                  # UTF-8 encoded bytes

write_to_file('>', '_original', $iso_8859_1);
write_to_file('>', '_decoded', $string);
write_to_file('>:utf8', '_utf8', $utf8);
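To make the symptom easier to reproduce without an editor, here is a minimal sketch that writes to in-memory filehandles and reports how many bytes each combination of mode and string produces. It is only an illustration under some assumptions: the helper name bytes_written is made up for this example, and "\xD6" stands in for the literal Ö so the result does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode;

# Hypothetical helper (not part of the original script): print $what through
# the given open() mode into an in-memory file and return the raw bytes.
sub bytes_written {
    my ($mode, $what) = @_;
    my $buffer = '';
    open(my $fh, $mode, \$buffer)
        or die "Couldn't open in-memory file: $!";
    print $fh $what;
    close $fh;
    return $buffer;
}

# "\xD6" is Ö in ISO-8859-1; using the escape avoids source-encoding issues.
my $string = Encode::decode('iso-8859-1', "\xD6sterreich");  # 10 characters
my $utf8   = Encode::encode_utf8($string);                   # 11 bytes

printf "%-32s %2d bytes\n", "'>'      + encoded bytes:",
    length(bytes_written('>', $utf8));         # 11 bytes
printf "%-32s %2d bytes\n", "'>:utf8' + encoded bytes:",
    length(bytes_written('>:utf8', $utf8));    # 13 bytes
printf "%-32s %2d bytes\n", "'>:utf8' + character string:",
    length(bytes_written('>:utf8', $string));  # 11 bytes
printf "%-32s %2d bytes\n", "'>'      + character string:",
    length(bytes_written('>', $string));       # 10 bytes
```

The byte counts show which combinations produce identical output and which one differs, which may help narrow down where the extra bytes in '_utf8' come from.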
I would appreciate any wisdom you could shed on the matter.
-- tel