wintermute_115 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse RSS feeds with XML::RSS::Parser, and it's mostly working fine, except that I need to be able to pull the content of elements in the itunes namespace and nothing I can think of seems to work.

If I'm trying to read the element itunes:summary (it will be an object of type XML::RSS::Parser::Element), for example, I've tried:

    $summary = $element->query('itunes:summary');
    $summary = $element->query('{http://www.itunes.com/dtds/podcast-1.0.dtd}summary');
    $summary = $element->query('summary');
    $summary = $element->query(process_name('itunes:summary'));
    $summary = $element->query(process_name('{http://www.itunes.com/dtds/podcast-1.0.dtd}summary'));

and some other things I don't remember. But whatever I try, it either returns undef or I get told Bad call to match: '<XXX>' contains unknown token.

Does anyone know how this works? Thanks.

Replies are listed 'Best First'.
Re: namespaces in XML::RSS::Parser::Element
by choroba (Cardinal) on Apr 11, 2024 at 23:25 UTC
    Namespace URIs are case sensitive. The namespace your file uses is different to the namespace in the module documentation:
    file: http://www.itunes.com/dtds/podcast-1.0.dtd
    doc:  http://www.itunes.com/DTDs/Podcast-1.0.dtd
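
To make the case-sensitivity point concrete, here is a minimal sketch in core Perl (no XML modules involved). The variable names are hypothetical and chosen only for illustration; the two URIs are the ones quoted above. It assumes nothing about how XML::RSS::Parser compares namespaces internally and only shows that the two strings differ unless case is folded away.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Namespace URI as declared in the feed (lowercase path).
my $feed_ns = 'http://www.itunes.com/dtds/podcast-1.0.dtd';
# Namespace URI as shown in the module documentation (mixed case).
my $doc_ns  = 'http://www.itunes.com/DTDs/Podcast-1.0.dtd';

# Namespace names are compared as exact strings, so these two differ.
my $same_exact  = $feed_ns eq $doc_ns ? 1 : 0;
# They only look alike once both are lowercased.
my $same_folded = lc($feed_ns) eq lc($doc_ns) ? 1 : 0;

print "exact match: $same_exact\n";
print "case-folded match: $same_folded\n";
```

A prefix bound to one spelling will therefore not select elements declared under the other, which is why a query can come back empty even though the prefix looks right.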

    You can register a new prefix:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use XML::RSS::Parser;

    my $p = XML::RSS::Parser->new;
    $p->register_ns_prefix('lc_itunes', 'http://www.itunes.com/dtds/podcast-1.0.dtd');
    my $feed = $p->parse_file(*DATA);
    for my $item ($feed->query('//item')) {
        say $item->query('lc_itunes:summary')->text_content;
    }

    __DATA__
    <rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
         xmlns:content="content"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
      <channel>
        <title>Title</title>
        <link>file:///</link>
        <description/>
        <dc:language>cs_CZ</dc:language>
      </channel>
      <item>
        <guid isPermaLink="false">5460304c-e284-4f84-b4a3-2cccba31fb81</guid>
    ...

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thank you! I don't think I'd ever have noticed that without the pointer.
Re: namespaces in XML::RSS::Parser::Element
by choroba (Cardinal) on Apr 11, 2024 at 16:05 UTC
    Can you also paste an example of the RSS you're trying to process?

    Update: Also, what is $element? Even the synopsis shows that queries should be written in XPath, so you may need to include some kind of axis, like //itunes:summary.

      $element is being generated by foreach my $element ($rss->query('//item')) to grab individual items from within the RSS feed.

      I've never worked with XPath before, but based on the Wikipedia page it looks like my $summary = $element->query('//itunes:summary'); ought to work, but it's still returning undef.