btubby has asked for the wisdom of the Perl Monks concerning the following question:

I'm using XML::RSS to extract certain fields from RSS feeds, but am experiencing problems extracting all the data I can see in the source (via the rss object).

For example, the following URL;
http://blogs.news.com.au/moneystuff/index.php/xml/rss_popular/14

contains a single RSS item. I need to get at the comment data. The item source is below:

    <item>
      <title><![CDATA[Beware of financial leeches]]></title>
      <link>http://blogs.news.com.au/moneystuff/index.php/news/comments/beware_of_financial_leeches1/</link>
      <guid>http://blogs.news.com.au/moneystuff/index.php/news/comments/beware_of_financial_leeches1/#60515</guid>
      <pubDate>Sun, 13 Sep 2009 20:40:00 GMT</pubDate>
      <description><![CDATA[INSTEAD of sucking your blood, financial leeches suck your cash and can suck the life out of your relationship with them.]]></description>
      <dc:source>blogs.news.com.au/moneystuff/index.php</dc:source>
      <dc:contributor>Anthony Keane</dc:contributor>
      <category>Money</category>
      <category>News</category>
      <slash:comments>2</slash:comments>
      <ndm:comments publishedtotal="3" itemtotal="5">
        <ndm:comment>
          <ndm:name>The Other Martin</ndm:name>
          <ndm:email>n/a</ndm:email>
          <ndm:ip>n/a</ndm:ip>
          <ndm:url>http://blogs.news.com.au/moneystuff/index.php/news/comments/beware_of_financial_leeches1</ndm:url>
          <ndm:date>Mon, 14 Sep 2009 00:15:26 GMT</ndm:date>
          <ndm:body><![CDATA[You haven&#8217;t mentioned the Greatest of all Australian Leaches &#45; the ATO! Folowed closely ehind by the larger lesser leaches (State Governments) and smaller lesser&#8230;]]></ndm:body>
        </ndm:comment>
        <ndm:comment>
          <ndm:name>JP</ndm:name>
          <ndm:email>n/a</ndm:email>
          <ndm:ip>n/a</ndm:ip>
          <ndm:url>http://blogs.news.com.au/moneystuff/index.php/news/comments/beware_of_financial_leeches1</ndm:url>
          <ndm:date>Sun, 13 Sep 2009 23:16:21 GMT</ndm:date>
          <ndm:body><![CDATA[Tell them: We have to be careful because our resources are not the best a little like you. If they do not get&#8230;]]></ndm:body>
        </ndm:comment>
      </ndm:comments>
    </item>

But the following code does not work as expected (assuming $source is a scalar containing the RSS source):

    use XML::RSS;
    use Data::Dumper;

    my $rss = XML::RSS->new();
    $rss->parse($source);

    # Dump every parsed item to inspect what XML::RSS kept
    foreach my $item ( @{ $rss->{items} } ) {
        print Dumper($item);
    }

The dumped output is messed up, e.g.:

    'http://feeds.news.com.au/dtd/blogcomments/' => {
        'date' => 'Mon, 14 Sep 2009 00:15:26 GMTSun, 13 Sep 2009 23:16:21 GMT',
        'ip' => 'n/an/a',
        'name' => 'The Other MartinJP',
    }

i.e. the two comment blocks have been combined into one. Why is this happening? Any help appreciated. Thanks

20090928 Janitored by Corion: Added formatting, code tags, as per Writeup Formatting Tips

Re: XML::RSS parse problem
by btubby (Novice) on Sep 26, 2009 at 13:44 UTC
    Anyone?? This is really bugging me.. pretty please!
      Since I have never used XML::RSS, I can only offer generic advice.
      • A Super Search yields 19 threads related to XML::RSS. Perhaps one of them may give you a clue: ?node_id=3989;HIT=xml%20rss;re=N
      • Look at the module's source code (for the version you have installed).
      • Post a question on the Discussion forum
      • I wouldn't know an RSS if I was kicking one, but have you tried alternate CPAN modules? I wouldn't be surprised if XML::Twig could handle this format.
      • Contact the module author by email.
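
To make the XML::Twig suggestion concrete, here is a minimal, untested-against-the-live-feed sketch. It assumes the feed declares the ndm prefix with the namespace URI seen in the Dumper output, and it uses a trimmed, hypothetical copy of the item instead of fetching the URL. The idea is to handle each ndm:comment element separately, so the fields of the two comments are never concatenated:

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed, hypothetical stand-in for the real feed. The namespace URI is
# the one shown in the Dumper output; the rest of the wrapper is assumed.
my $source = <<'XML';
<rss version="2.0" xmlns:ndm="http://feeds.news.com.au/dtd/blogcomments/">
  <channel>
    <item>
      <title>Beware of financial leeches</title>
      <ndm:comments publishedtotal="3" itemtotal="5">
        <ndm:comment>
          <ndm:name>The Other Martin</ndm:name>
          <ndm:date>Mon, 14 Sep 2009 00:15:26 GMT</ndm:date>
        </ndm:comment>
        <ndm:comment>
          <ndm:name>JP</ndm:name>
          <ndm:date>Sun, 13 Sep 2009 23:16:21 GMT</ndm:date>
        </ndm:comment>
      </ndm:comments>
    </item>
  </channel>
</rss>
XML

my @comments;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per <ndm:comment>, so each comment stays separate
        'ndm:comment' => sub {
            my ( $t, $comment ) = @_;
            push @comments, {
                name => $comment->first_child_text('ndm:name'),
                date => $comment->first_child_text('ndm:date'),
            };
        },
    },
);
$twig->parse($source);

# One line per comment
printf "%s (%s)\n", $_->{name}, $_->{date} for @comments;
```

This matches on the literal prefix ndm, which only works while the feed keeps using that prefix. The other fields (ndm:body, ndm:url and so on) could be collected the same way.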