I think I see now. You're pushing the links for each item onto an array and then reducing to the set of new links. Once there, you want to get back to the $rss->item that the link came from.
You still have $rss, so you can find the item by searching. Maybe something like:
sub find_item {
    my $link = shift;
    for my $item ( @{ $rss->{items} } ) {
        return $item if $item->{link} eq $link;
    }
    return;
}
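For example, if your existing code collects the new links into an array (I'm assuming it's called @new_links here), you could map each one back to its item like this:

# hypothetical: @new_links holds the links you haven't seen before
for my $link (@new_links) {
    my $item = find_item($link);
    next unless $item;    # link not found in the current feed
    print "$item->{title}\n";
}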
On reflection, I don't really care for the way you're keeping track of seen items. All that map/grep stuff can be replaced with a simple hash. Maybe you'll have more luck if you restructure things a bit. Here's my skeletal example.
#!/usr/bin/perl

use strict;
use warnings;

use LWP::UserAgent;
use XML::RSS::LibXML;

my $url          = 'http://www.usnews.com/rss/health-news/index.rss';
my $rss_file     = '/tmp/.rss_download_file';
my $website_name = "usnews";

my %seen;
my $client = LWP::UserAgent->new;
my $rss    = XML::RSS::LibXML->new;

while ( 1 ) {
    print "polling: $website_name url: $url\n";
    $client->mirror($url, $rss_file);    # be nice to the server

    $rss->parsefile($rss_file) or die $!;

    if ( !%seen ) {
        print "first listing\n";
    }

    foreach my $item ( @{ $rss->{items} } ) {
        $seen{ $item->{link} }++ and next;    # already saw this item

        # do stuff with the new item
        print $item->{title}, "\n";
        print "$item->{pubDate}\n";
        #$client->get() ...
    }

    sleep 15 * 60;    # 15 minutes, play nice
}
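If you want to go on and fetch the page behind each new item (the commented-out $client->get() spot above), a minimal sketch that slots in at that point might look like this; what you do with the body is up to you:

# fetch the page the new item points at
my $response = $client->get( $item->{link} );
if ( $response->is_success ) {
    my $html = $response->decoded_content;
    # ... pull whatever you need out of $html ...
}
else {
    warn "couldn't fetch $item->{link}: ", $response->status_line, "\n";
}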
As an aside, fetching the RSS file every second is a good way to convince the server that you're attacking it. 15 minutes is probably okay but you should check the Terms of Service to be sure. On that same note, I like LWP::UserAgent's mirror method because it sends the "If-Modified-Since" header so you don't fetch the file if it hasn't changed.
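If you want to skip the parse entirely when nothing changed, mirror returns an HTTP::Response you can check. Roughly, inside the while (1) loop from the skeleton above:

my $response = $client->mirror($url, $rss_file);
if ( $response->code == 304 ) {    # Not Modified: server says the feed hasn't changed
    sleep 15 * 60;
    next;                          # nothing new, don't bother parsing
}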