
Basically, I need to parse different types of feeds, so I thought of writing a separate module. Feeds could be pure RSS feeds, pure RSS Media feeds, YouTube feeds, mixed media/RSS feeds, Metacafe feeds, Yahoo feeds, BLIP TV feeds, etc.
That is why I thought of writing a general module. Are there any existing modules that can be used for this? If so, please suggest them. I thought the approach would be something like the following. My question is: how can I differentiate between the feeds?
if($val !~ /(.*)<rss version=\"(1|2\.0)\"(.*)/is ) { die qq{ Not a RSS Feed....Dying here..\n }; }
I am just trying to figure out whether the input is an RSS feed or not.
Some feeds start with <?xml ... ?> and then have <rss ...> on the next line.
But this is not working. I just need to match an optional <?xml ... ?> declaration before <rss, which may be on the same line or on the next one.
Please help.

Thanks,
Shekar
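A minimal sketch of a regex check that tolerates an optional declaration, assuming `$val` holds the whole document rather than a single line. `rss_version` is a hypothetical helper name. It allows any number of processing instructions (`<?xml ...?>`, `<?xml-stylesheet ...?>`), comments and a simple DOCTYPE, on any lines, before the `<rss` start tag, and returns that tag's `version` attribute. Note that RSS 1.0 feeds use an `<rdf:RDF>` root element rather than `<rss version="1.0">`, so neither this check nor the `(1|2\.0)` alternative above recognises them. It is a heuristic, not a validator.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the version attribute of the <rss> root
# element ('' if it has none), or undef if the document does not start
# with <rss ...>. Heuristic only: it does not check well-formedness.
# Assumes $doc holds the WHOLE document, not a single line.
sub rss_version {
    my ($doc) = @_;
    return undef unless $doc =~ m{
        \A \s*
        (?:                              # anything allowed before the root:
            (?> <\?.*?\?>                #   xml declaration, stylesheet PI
              | <!--.*?-->               #   comments
              | <!DOCTYPE[^>]*>          #   simple DOCTYPE (no internal subset)
            ) \s*                        #   plus whitespace, including newlines
        )*
        <rss\b ([^>]*) >                 # root start tag; capture its attributes
    }xs;
    my $attrs = $1;
    return $attrs =~ /\bversion\s*=\s*["']([^"']*)["']/ ? $1 : '';
}
```

Whichever check you use, it can only see both tags if they are in the same string. If the file is read line by line (for example `while (<$fh>) { ... }`), a pattern applied to each line never sees a `<?xml` declaration and an `<rss` tag that sit on different lines. Slurping the file first (e.g. `my $val = do { local $/; <$fh> };`) avoids that, and the test then becomes `die "Not an RSS feed\n" unless defined rss_version($val);`.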

Re^5: Parsing RSS Feeds
by Corion (Patriarch) on Mar 25, 2010 at 11:45 UTC

    For RSS parsing, I would try an XML parser.
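As a sketch of the XML-parser route, assuming the CPAN module XML::LibXML is installed: the hypothetical `classify_feed` below parses the document and looks at the root element and its namespace instead of matching text, so an `<?xml ... ?>` declaration, line breaks or attribute order stop mattering. The labels it returns ('rss', 'rss+media', 'atom', 'rdf', 'unknown') are an arbitrary choice, and treating a feed as a "media" feed whenever it contains any element in the Media RSS namespace (http://search.yahoo.com/mrss/) is an assumption about what distinguishes those feeds for your purposes.

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace URIs used to tell the formats apart.
my %NS = (
    atom  => 'http://www.w3.org/2005/Atom',
    rdf   => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    media => 'http://search.yahoo.com/mrss/',
);

# Hypothetical helper: returns 'rss', 'atom' or 'rdf' (RSS 1.0), with
# '+media' appended if Media RSS elements are present, or 'unknown'.
# load_xml dies if the input is not well-formed XML.
sub classify_feed {
    my ($xml) = @_;
    my $doc  = XML::LibXML->load_xml(string => $xml);
    my $root = $doc->documentElement;
    my $name = $root->localname;
    my $ns   = $root->namespaceURI // '';

    my $type = $name eq 'rss'                      ? 'rss'
             : $name eq 'feed' && $ns eq $NS{atom} ? 'atom'
             : $name eq 'RDF'  && $ns eq $NS{rdf}  ? 'rdf'
             :                                       'unknown';
    return $type if $type eq 'unknown';

    # Any element in the Media RSS namespace marks a "media" feed.
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(media => $NS{media});
    $type .= '+media' if $xpc->findvalue('count(//media:*)') > 0;
    return $type;
}
```

Because the namespace is matched by URI through `registerNs`, it does not matter which prefix a particular feed uses for it.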

    For the rest, I'm no expert on how the various feed formats differ. Plagger is a Perl module for consuming various RSS feeds; maybe you can look at that.

    Personally, I would create, for each format, XPath queries that extract the interesting payload from the RSS documents. Almost all XML parsers supply an XPath engine.
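A rough illustration of the per-format XPath idea, again assuming XML::LibXML: the hypothetical `%QUERIES` table holds, for each format, one XPath that selects the items and relative XPaths for the fields to pull out of each item. Supporting another feed type would then mostly mean adding a table entry (plus any namespace it needs). The field names and queries are examples only, not a complete mapping of RSS or Atom.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical per-format query table: 'item' selects the items,
# the other entries are evaluated relative to each item node.
my %QUERIES = (
    rss => {
        item  => '/rss/channel/item',
        title => 'title',
        link  => 'link',
        media => 'media:content[1]/@url',
    },
    atom => {
        item  => '/atom:feed/atom:entry',
        title => 'atom:title',
        link  => 'atom:link[@rel="alternate" or not(@rel)][1]/@href',
        media => 'media:content[1]/@url',
    },
);

# Returns a list of hash refs, one per item; missing fields are ''.
sub extract_items {
    my ($xml, $format) = @_;
    my $q   = $QUERIES{$format} or die "No queries for format '$format'\n";
    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(atom  => 'http://www.w3.org/2005/Atom');
    $xpc->registerNs(media => 'http://search.yahoo.com/mrss/');

    my @items;
    for my $node ($xpc->findnodes($q->{item})) {
        my %item;
        for my $field (grep { $_ ne 'item' } keys %$q) {
            # findvalue with a context node evaluates the relative XPath
            # against this item only.
            $item{$field} = $xpc->findvalue($q->{$field}, $node);
        }
        push @items, \%item;
    }
    return @items;
}
```

For example, `extract_items($xml, 'rss')` on an RSS 2.0 document returns one hash reference per `<item>`, with `title`, `link` and `media` keys. Keeping the format differences in data rather than in code is what lets one general module handle several feed types.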