srikrishnan has asked for the wisdom of the Perl Monks concerning the following question:
Hi all,
I am trying to read an XML file, extract some of the required parts from it, and write them out as a new XML file. I can successfully collect the data using the following script:
    use strict;
    use warnings;
    undef $/;
    open OUT, ">:encoding(UTF-8)", "D:/wordpress/wordpress_categories.xml";
    open (IN, "<:encoding(UTF-8)", "D:/wordpress/wordpress.2011-04-12.xml");
    my $line = <IN>;
    while ($line =~ /<title>(.*?)<\/title>\n\t\t<link>(.*?)<\/link>\n\t\t<pubDate>(.*?)<\/pubDate>\n\t\t<dc:creator>(.*?)<\/dc:creator>\n\t\t\n\t\t<category>(.*?)<\/category>/i) {
        $line =~ s/(<title>(.*?)<\/title>\n\t\t<link>(.*?)<\/link>\n\t\t<pubDate>(.*?)<\/pubDate>\n\t\t<dc\:creator>(.*?)<\/dc\:creator>\n\t\t\n\t\t<category>(.*?)<\/category>)//i;
        print OUT "$1\n\n";
    }
    close (IN);
    close (OUT);
But the output XML does not show the non-English characters correctly. Below is the wrong output:
    <title>எழுத ‹வேண்டிய கட்டு‹ரைள்</title>
    <link>http://naatkurippugal.wordpress.com/?p=501</link>
    <pubDate>Wed, 30 Nov -0001 00:00:00 +0000</pubDate>
    <dc:creator><![CDATA[ஸ்ரீஹரி]]></dc:creator>
    <category><![CDATA[கட்டு‹ரை]]></category>
Can anybody help me solve this problem?
Thanks in Advance,
srikrishnan
Replies are listed 'Best First'.
Re: How to write a utf-8 file
by ikegami (Patriarch) on Apr 13, 2011 at 04:20 UTC
by moritz (Cardinal) on Apr 13, 2011 at 06:36 UTC
by ikegami (Patriarch) on Apr 13, 2011 at 07:33 UTC

Re: How to write a utf-8 file
by Nikhil Jain (Monk) on Apr 13, 2011 at 04:24 UTC
by srikrishnan (Beadle) on Apr 13, 2011 at 05:02 UTC
by elef (Friar) on Apr 13, 2011 at 09:43 UTC
by SuicideJunkie (Vicar) on Apr 13, 2011 at 15:12 UTC
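
For anyone who wants to check the I/O layers from the question on their own, here is a minimal sketch of a UTF-8 round trip using the same `:encoding(UTF-8)` layers with lexical filehandles. The helper name `round_trip`, the temporary file, and the sample Tamil code points are assumptions made for illustration; none of them come from the original script. If the text survives this round trip unchanged, the garbling may come from somewhere else, for example how the input file was encoded or how the output is being viewed, but that is only a guess.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through a UTF-8 encoding layer,
# read it back through a UTF-8 decoding layer, and return the result.
sub round_trip {
    my ($text) = @_;

    # Temporary file, removed when the program exits.
    my (undef, $path) = tempfile(UNLINK => 1);

    open(my $out, '>:encoding(UTF-8)', $path)
        or die "Cannot open $path for writing: $!";
    print {$out} $text;
    close($out) or die "Cannot close $path: $!";

    open(my $in, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path for reading: $!";
    local $/;    # slurp mode, limited to this subroutine
    my $copy = <$in>;
    close($in);

    return $copy;
}

# Arbitrary Tamil code points, written as escapes so that this
# source file stays plain ASCII.
my $sample = "<title>\x{0B95}\x{0B9F}\x{0BCD}\x{0B9F}\x{0BC1}</title>\n";
my $copy   = round_trip($sample);

print $copy eq $sample
    ? "round trip ok\n"
    : "round trip changed the text\n";
```

Checking `open` and `close` with `or die` also surfaces a wrong path or permission problem, which the bareword `open OUT, ...` in the question would let pass silently.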