I have done something similar with a cron job that checks a database table for sites to cache. It then fills a second table with the cached content. I don't know how useful it is, but I had a good time doing it and maybe it will be helpful/interesting to you.
It is broken down into three parts:
- db layout
- caching script
- a frontend rss generator for rss links
When called from a browser, the RSS frontend outputs an RSS feed describing the cached RSS sites and linking to each of them (RSS serving RSS links... I always thought that would have an interesting application, but I've yet to really use it).
The DB layout
rsssites:

 Column  |          Type          |                         Modifiers
---------+------------------------+------------------------------------------------------------
 sid     | integer                | not null default nextval('public.rsssites_sid_seq'::text)
 title   | character varying(255) | not null
 url     | character varying(255) | not null
 active  | boolean                |
 baseurl | character varying(255) |
Indexes: rsssites_pkey primary key btree (sid),
         rsssites_sid_key unique btree (sid)
rsscontent:

 Column | Type                   | Modifiers
--------+------------------------+-----------
 cid    | integer                |
 title  | character varying(255) | not null
 url    | character varying(255) | not null
Foreign Key constraints: rss_site FOREIGN KEY (cid) REFERENCES rsssites(sid)
                         ON UPDATE CASCADE ON DELETE CASCADE
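If you want to recreate the tables, a one-off setup script along these lines should be close enough (just a sketch done through DBI; DBNAME/DBUSER/DBPASS are placeholders as in the scripts below, the redundant unique index on sid is omitted, and the insert at the end only shows how a feed gets registered, with made-up values):

#!/usr/bin/perl
# One-off setup sketch: creates the two tables described above and
# registers an example feed to cache. Example values only.
use strict;
use warnings;
use DBI;
my $dbh = DBI->connect('DBI:Pg:dbname=DBNAME','DBUSER','DBPASS',{RaiseError => 1});
# serial gives us the rsssites_sid_seq sequence and the not null default
$dbh->do(q{
    create table rsssites (
        sid     serial primary key,
        title   varchar(255) not null,
        url     varchar(255) not null,
        active  boolean,
        baseurl varchar(255)
    )
});
$dbh->do(q{
    create table rsscontent (
        cid    integer constraint rss_site references rsssites(sid)
                   on update cascade on delete cascade,
        title  varchar(255) not null,
        url    varchar(255) not null
    )
});
# register a feed for the caching script to pick up (example values)
$dbh->do('insert into rsssites (title,url,active,baseurl) values (?,?,?,?)',
    undef, 'Example feed', 'http://www.example.com/index.rss', 'true', 'http://www.example.com/');
$dbh->disconnect;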
The Caching script
#!/usr/bin/perl
use strict;
use warnings;
#############################################################################
# Takes rss files from across the internet and sticks them into the database.
# Best run from cron.
#############################################################################
# if a previous run is still hanging around, kill it and clean up its pid file
if( -f '/var/run/retrieve_rss.pid' ){
system('kill -9 `cat /var/run/retrieve_rss.pid`');
unlink '/var/run/retrieve_rss.pid';
}
open(my $pidfile, '>', '/var/run/retrieve_rss.pid') or die "Cannot write pid file: $!";
print {$pidfile} $$,"\n";
close($pidfile);
my $DEBUG = defined $ARGV[0] ? $ARGV[0] : 0;
### initial setup
use LWP::Simple;
use XML::RSS;
use DBI;
my $dbh = DBI->connect('DBI:Pg:dbname=DBNAME','DBUSER','DBPASS',{AutoCommit => 0});
my $sth = $dbh->prepare('select sid,url from rsssites where active is true');
$sth->execute();
while(my ($site_id,$site_url) = $sth->fetchrow_array()){
eval { get_links($site_id,$site_url) };
print $@,"\n" if $@;
}
### disconnect
$sth->finish();
$dbh->disconnect;
### done
unlink '/var/run/retrieve_rss.pid';
1;
### this is for rss sites
sub get_links {
my ($id,$url) = @_;
### DEBUG
print "Getting links for $url\n" if $DEBUG;
my $document = get($url) || return;
# clean the string (this fixes some broken rss)
$document =~ s/\015\012?/\012/g;
$document =~ s/&(?!(?:[a-zA-Z0-9]+|#\d+);)/&amp;/g;
# parse a string
my $rss = new XML::RSS(Style => 'Debug') || return;
$rss->parse($document) || return;
# clear out the db, check for a failure, rollback, and move on...
unless( clear_db($id) ){
$dbh->rollback;
return;
}
# prepare the insert statement once rather than on every pass through the loop
my $insert = $dbh->prepare('insert into rsscontent (cid,title,url) values (?,?,?)');
foreach my $item (@{$rss->{'items'}}) {
my $title = $item->{'title'};
my $link  = $item->{'link'};
chomp($title,$link);
### remove unsightly site specific links
next if ($title =~ /Customize this feed/i);
### stick it into the database
$insert->execute($id,$title,$link);
# check to see if an error has been raised and rollback if true
if($dbh->errstr){
$dbh->rollback;
print "Rolling back line [$title][$link]: ",$dbh->errstr,"\n" if $DEBUG;
return;
}
}
$insert->finish();
# check to see if an error has been raised...
# if so, rollback, if not, commit
unless($dbh->errstr){
print "Committing for $url\n" if $DEBUG;
$dbh->commit;
}else{
print "Rolling back $url: $dbh->errstr\n" if $DEBUG;
$dbh->rollback;
}
return;
}
sub clear_db {
my ($sid) = @_;
if( defined($sid) ){
$dbh->do("delete from rsscontent where cid = $sid");
unless($DBI::errstr){
print "Successfully cleared content for $sid\n" if $DEBUG;
return 1;
}else{
print "Failed to clear content for $sid: $DBI::errstr\n" i
+f $DEBUG;
return 0;
}
}
return 0;
}
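As the comment says, it really is best run from cron. A crontab entry along these lines (the path and schedule are only examples) refreshes all of the active feeds every half hour and throws away the output:

# refresh the cached feeds every 30 minutes
*/30 * * * * /usr/local/bin/retrieve_rss.pl >/dev/null 2>&1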
RSS frontend
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use XML::RSS;
use CGI qw(:standard);
my $cgi = new CGI();
my $dbh = DBI->connect('DBI:Pg:dbname=DBNAME','DBUSER','DBPASS');
unless( $dbh ){
print $cgi->header(),$cgi->start_html('Oops'),$cgi->h1('We have a problem'),$cgi->end_html();
exit;
}
my $rss = new XML::RSS();
if(defined( $cgi->param('site') ) ){
# bind the site id rather than interpolating it straight into the SQL
my $site_data = $dbh->selectrow_hashref('select title,baseurl from rsssites where sid = ?', undef, $cgi->param('site') );
$rss->channel(
title => $site_data->{title},
link => $site_data->{baseurl},
description => $site_data->{title}
);
my $query = 'select title,url from rsscontent where cid = ?';
my $sth = $dbh->prepare($query);
$sth->execute( $cgi->param('site') );
while (my ($title,$link) = $sth->fetchrow_array()){
$title = $cgi->escapeHTML($title);
$rss->add_item(
title => $title,
link => $link,
);
}
}else{ ### no site param
$rss->channel(
title => 'RSS caching system',
link => 'http://www.localhost/cgi-bin/rss',
description => 'RSS interface to cached news',
);
$rss->image(
title => 'Localhost',
url => 'http://www.localhost/images/favicon.png',
link => 'http://www.localhost',
);
$rss->textinput(
title => 'Localhost search',
description => 'Use the text input below to search Localhost',
name => 'search_term',
link => 'http://www.localhost/search'
);
my $query = 'select sid,title from rsssites';
my $sth = $dbh->prepare($query);
$sth->execute();
while (my ($sid,$title) = $sth->fetchrow_array()){
$rss->add_item(
title => $title,
link => "http://www.localhost/cgi-bin/rss?site=$sid"
);
}
}
$dbh->disconnect();
print $cgi->header( -type=>'text/xml' );
print $rss->as_string;
1;
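To show the round trip, here is a quick sketch of a client for the frontend using the same modules as above; it fetches the top-level feed (no site param) and prints each cached site with its per-site feed URL. The URL is just the placeholder from the script, so point it wherever the CGI actually lives:

#!/usr/bin/perl
# List the cached sites advertised by the RSS frontend.
use strict;
use warnings;
use LWP::Simple;
use XML::RSS;
my $xml = get('http://www.localhost/cgi-bin/rss') or die "Could not fetch the feed\n";
my $rss = XML::RSS->new();
$rss->parse($xml);
foreach my $item (@{$rss->{'items'}}) {
    print "$item->{'title'}\t$item->{'link'}\n";
}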
cp
----
"Never be afraid to try something new. Remember, amateurs built the ark. Professionals built the Titanic." |