in reply to How to write a utf-8 file

General remarks:

1. Use the three-argument form of open with a lexical filehandle and error handling, i.e. instead of writing this,

open (IN, "<:encoding(UTF-8)", "D:/wordpress/wordpress.2011-04-12.xml");

write this:

open(my $fh, "<:encoding(UTF-8)", "filename") || die "can't open UTF-8 encoded filename: $!";

2. You are trying to parse XML, so don't use regular expressions; it is better to use XML::Simple or XML::Twig.

Re^2: How to write a utf-8 file
by srikrishnan (Beadle) on Apr 13, 2011 at 05:02 UTC

    Thanks for your response

    I really don't understand how you are trying to help me.

    As I clearly mentioned in my post, I have no problem reading the XML.

    The problem is only with writing to an XML file.

    I want to confirm which is the correct way: how can I write non-English text properly to the OUT XML?

    Thanks

    srikrishnan

      As ikegami posted above, your code is correct.
      Try it with a different input file and open the output file with a different text editor.
      Occasionally, Notepad++ fails to show Unicode characters for me even when the file itself is OK. Closing and reopening the file usually fixes it.
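
      One way to take the editor out of the picture is to check the bytes on disk directly. What follows is only a minimal sketch using core modules (Encode, File::Temp); the sample text and the temporary file are placeholders, not the files from the original post:

      ```perl
      use strict;
      use warnings;
      use Encode qw(decode);
      use File::Temp qw(tempfile);

      # Hypothetical sample text: "cafe" with an accented e, plus a short Tamil
      # word, written with escapes so the script itself stays plain ASCII.
      my $text = "caf\x{e9} \x{0ba4}\x{0bae}\x{0bbf}\x{0bb4}\x{0bcd}";

      # A temporary file stands in for the real output XML.
      my ($tmp_fh, $path) = tempfile(UNLINK => 1);
      close $tmp_fh;

      # Write through an encoding layer, as the original code does.
      open(my $out, ">:encoding(UTF-8)", $path)
          or die "can't open $path for writing: $!";
      print {$out} $text;
      close($out) or die "can't close $path: $!";

      # Read the raw bytes back, with no decoding layer.
      open(my $in, "<:raw", $path)
          or die "can't open $path for reading: $!";
      my $bytes = do { local $/; <$in> };
      close $in;

      # Strict decode: this dies if the bytes are not valid UTF-8.
      my $decoded = decode("UTF-8", $bytes,
          Encode::FB_CROAK() | Encode::LEAVE_SRC());

      # The decoded characters should match what was written.
      my $ok = ($decoded eq $text) ? 1 : 0;
      printf "%d bytes on disk, %d characters, round trip %s\n",
          length($bytes), length($decoded), $ok ? "ok" : "FAILED";
      ```

      If the round trip succeeds but the editor still shows garbage, the problem is most likely in how the editor detects the encoding, not in the Perl code.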

      Consider what happens if the file doesn't exist:

      In your original code, things will go wrong with no explanation.
      In the suggested alternative, the code will print something like "can't open UTF-8 encoded filename: No such file or directory" and then exit.

      Depending on the specific problem, $! could be file not found, permission denied, out of disk space, locked by another process, etc... whatever reason the OS gives. Extremely helpful!

      You're already using the 3-arg version of open, which is good. You can add lexical file handles ($inFH rather than just IN), checking the return value (the "||", or better yet, "or"), and printing $! when things do go wrong. These are all good habits to get into, as they will help you avoid debugging pain in the future.
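
      Put together, those habits might look like the sketch below. The path is a made-up one that is assumed not to exist, and the eval is only there so the example can print the message and carry on; a real script would normally just let die end the program:

      ```perl
      use strict;
      use warnings;

      # Hypothetical path, assumed not to exist on this machine.
      my $missing = "no-such-dir/no-such-file.xml";

      my $ok = eval {
          # Three-argument open, lexical filehandle, return value checked.
          # "or" has lower precedence than "||", so it still applies to the
          # whole open call even when the parentheses are left off.
          open(my $in_fh, "<:encoding(UTF-8)", $missing)
              or die "can't open UTF-8 encoded $missing: $!\n";
          close $in_fh;
          1;
      };

      # On failure, $@ holds the die message, including the OS reason from $!.
      my $error = $ok ? "" : $@;
      print "open failed: $error" unless $ok;
      ```

      The exact text after the colon comes from the operating system, so it will differ between platforms and between failure causes.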