Hi all,

I am trying to read an XML file, extract the parts I need, and write them out as a new XML file. I can collect the data successfully with the following script:

use strict;
use warnings;

undef $/;

open OUT, ">:encoding(UTF-8)", "D:/wordpress/wordpress_categories.xml";
open (IN, "<:encoding(UTF-8)", "D:/wordpress/wordpress.2011-04-12.xml");
my $line = <IN>;

while ($line =~ /<title>(.*?)<\/title>\n\t\t<link>(.*?)<\/link>\n\t\t<pubDate>(.*?)<\/pubDate>\n\t\t<dc:creator>(.*?)<\/dc:creator>\n\t\t\n\t\t<category>(.*?)<\/category>/i) {
    $line =~ s/(<title>(.*?)<\/title>\n\t\t<link>(.*?)<\/link>\n\t\t<pubDate>(.*?)<\/pubDate>\n\t\t<dc\:creator>(.*?)<\/dc\:creator>\n\t\t\n\t\t<category>(.*?)<\/category>)//i;
    print OUT "$1\n\n";
}

close (IN);
close (OUT);

But the output XML does not reproduce the non-English characters correctly. Below is the wrong output:

<title>எழுத ‹வேண்டிய கட்டு‹ரை +ள்</title> <link>http://naatkurippugal.wordpress.com/?p=501</link> <pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate> <dc:creator><![CDATA[ஸ்ரீஹரி]]></dc:creator> <category><![CDATA[கட்டு‹ரை]]></category>

Can anybody help me solve this problem?
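One way to narrow this down is to check whether the `:encoding(UTF-8)` layers themselves round-trip Tamil text on this machine. The following is only a minimal sketch under stated assumptions: the sample string and the temporary file are hypothetical and are not taken from the real export, and the script assumes it is itself saved as UTF-8. If the round trip succeeds, the layers are probably not the culprit, and the input file's actual encoding or the program used to view the output would be the next things to check.

```perl
use strict;
use warnings;
use utf8;                          # this source file contains a literal Tamil string
use File::Temp qw(tempfile);

# Hypothetical sample text, not taken from the real WordPress export.
my $sample = "<title>கட்டுரை</title>";

# Create a scratch file that is removed when the program exits.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write the characters through an encoding layer, as the script above does.
open my $out, '>:encoding(UTF-8)', $tmp_name
    or die "Cannot write $tmp_name: $!";
print {$out} $sample;
close $out or die "Cannot close $tmp_name: $!";

# Read them back through the matching decoding layer.
open my $in, '<:encoding(UTF-8)', $tmp_name
    or die "Cannot read $tmp_name: $!";
my $read_back = do { local $/; <$in> };   # slurp only within this block
close $in;

# Read the same file as raw bytes to see what actually landed on disk.
open my $raw, '<:raw', $tmp_name
    or die "Cannot read $tmp_name: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

my $round_trip_ok = ($read_back eq $sample) ? 1 : 0;

binmode STDOUT, ':encoding(UTF-8)';
print "round trip ", ($round_trip_ok ? "ok" : "FAILED"), "\n";
printf "%d characters in memory, %d bytes on disk\n",
    length($sample), length($bytes);
```

The byte count should be larger than the character count, because each Tamil code point takes more than one byte in UTF-8. Using `local $/` inside a `do` block keeps slurp mode from leaking into the rest of the program, unlike a global `undef $/`.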

Thanks in Advance,

srikrishnan


In reply to How to write a utf-8 file by srikrishnan
