Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

RSS Feed content provider for Perl Mongers

by zeno (Friar)
on Mar 30, 2003 at 21:51 UTC ( [id://246792]=CUFP: print w/replies, xml ) Need Help??

I recently was looking for ideas to improve the Barcelona Perl Mongers web page and came across an item in the Perl Mongers FAQ about a couple of RSS feeds which could be used to provide dynamic content to the page. This script does that-- it currently is being used at http://barcelona.pm.org/news.cgi, slightly modified.

The script does not attempt to fix bad RSS feeds-- first off because the feeds I'm using seem to be okay, and secondly because I don't like the idea of fixing them (I mean, that would defeat the purpose of the RSS, no?

This script is written without HERE tags, but rather relying on the CGI.pm module.

Note: Depending on what version of XML::RSS you have on your server, you may need to take off the Carp code for this to work-- I also had to take off Taint (-T on the first line).

#!/usr/bin/perl -w -T # rss_news.pl - Read from RSS Feeds and produce # a web page. # Note: This is not guaranteed to work on all # RSS feeds. It does not break on the ones listed. # See this at work at # http://barcelona.pm.org/news.cgi use strict; use XML::RSS; use LWP::Simple qw(get); use CGI qw(:standard escapeHTML); # warningsToBrowser may not work in your version-- # if so, use comment this declaration and use the # next one. Also, remove the fatalsToBrowser for a # production site if you're worried about # security. use CGI::Carp qw(fatalsToBrowser warningsToBrowser); #use CGI::Carp qw(fatalsToBrowser); my $cgi = new CGI; # Hash reference print $cgi->header()."\n\n"; # If warningsToBrowser doesn't work in your version, # comment the next line. warningsToBrowser(1); print $cgi->start_html("news feeds")."\n\n"; # Read and process the channels listed at the # end of the file. my @channels; while (<DATA>) { push(@channels, $_); } map {get_channel($_, $cgi)} (@channels); print $cgi->end_html()."\n"; sub get_channel { # This subroutine takes a URL and a CGI object # and prints out the resulting RSS feed formatted # as HTML my ($url, $q) = @_; my $version = "1.0"; # Use LWP::Simple to get the RSS feed. my $content = get($url) or carp("I couldn't get from $url"); # Just to be sure... if ($content =~ m/^\s*$/) { carp("$url is empty"); return; } # Determine the version based on regexes-- # I based these on the feeds I was familiar # with, so this part could be improved by # finding the standard for this information if ($content =~ m#http://my.netscape.com/rdf/simple/0.9/#i) { $version = "0.9"; print $q->comment("$url: RDF version 0.9")."\n"; } elsif ($content =~ m#<rss version="0.91">#i) { $version = "0.91"; print $q->comment("$url: RDF version 0.91")."\n"; } else { print $q->comment("$url: looks like RDF version 1.0")."\n"; } # Create the new rss object with this version my $result = new XML::RSS (version => "$version"); # Parse the content $result->parse($content) or carp("I couldn't parse $url"); # Channel Title my $channel = $result->{'channel'}; # Hash reference print $q->comment("This page was generated from $url.")."\n\n"; print $q->h1("News from ". $q->a( { -href => $channel->{'link'}}, $channel->{'title'}))."\n"; print $q->h2($channel->{'description'})."\n"; # Channel Image my $image = $result->{'image'}; # Hash reference if ($image->{'title'} ne "") { print $q->a( { -href => $image->{'link'}}, img( { -src => $image->{'url'}, -alt => $image->{'title'}}))."\n\n"; } # Channel Items my $tcontent = ""; my $items = $result->{'items'}; # Array reference map { $tcontent .= $q->Tr( $q->td( $q->a( { -href => $_->{'link'}}, $_->{'title'} ) ) )."\n" } (@{$items}); print $q->table($tcontent)."\n\n"; # Channel Search Form my $textinput = $result->{'textinput'}; # Hash Reference if ($textinput->{'link'} ne "") { print $q->h2($textinput->{'description'})."\n"; print $q->start_form( {-action => $textinput->{'link'}} )."\n" +; print $q->input( {-name => $textinput->{'name'}} )."\n"; print $q->input( {-type => "submit", -value => $textinput->{'title'} || "Search"})."\n"; print $q->end_form()."\n\n"; } } __DATA__ http://use.perl.org/useperl.rdf http://search.cpan.org/recent.rdf http://www.xml.com/cs/xml/query/q/19 http://www.perlmonks.org/headlines.rdf
--
Zeno - Barcelona Perl Mongers http://barcelona.pm.org
http://www.javajunkies.org

Replies are listed 'Best First'.
•Re: RSS Feed content provider for Perl Mongers
by merlyn (Sage) on Mar 30, 2003 at 22:10 UTC

      Thanks for your response. Your program has some notable improvements over mine:

      1. you use LWP::Simple's mirror instead of get. This is much more directly suited to the job. One weakness of my code is that it attempts to download the RDF feed every time a user visits the page, whereas I could use mirror to inexpensively check for updates and only download the new RDF feed if there are modifications.
      2. you also use LWP::Simple's is_success to check the success of the RDF feed download. This tool is directly applicable to the problem of not being able to receive the RDF feed, whereas mine is a bit jury-rigged.

      Now my only problem is finding RDF feeds in Catalan.

      Thanks again. I will incorporate your improvements into my program.

      --
      Zeno - Barcelona Perl Mongers http://barcelona.pm.org
      http://www.javajunkies.org
        were can i get your new version??
        /me is very curious?
Re: RSS Feed content provider for Perl Mongers
by crouchingpenguin (Priest) on Mar 31, 2003 at 15:01 UTC

    I have done something similar with a cron job that checks a database table for sites to cache. It then fills a second table with the cached content. I don't know how useful it is, but I had a good time doing it and maybe it will be helpful/interesting to you.

    It is broken down into three parts:

    • db layout
    • caching script
    • a frontend rss generator for rss links
    The rss frontend called via a browser outputs an rss feed describing the cached rss sites and content and a link to each (rss serving rss links... I always thought it would have an interesting application... but I've yet to really use it).

    The DB layout

    +---------+------------------------+---------------------------------- +-------------------------+ | Column | Type | Modifiers + | +---------+------------------------+---------------------------------- +-------------------------+ | sid | integer | not null default nextval('public. +rsssites_sid_seq'::text) | | title | character varying(255) | not null + | | url | character varying(255) | not null + | | active | boolean | + | | baseurl | character varying(255) | + | +---------+------------------------+---------------------------------- +-------------------------+ Indexes: rsssites_pkey primary key btree (sid), rsssites_sid_key unique btree (sid) +--------+------------------------+-----------+ | Column | Type | Modifiers | +--------+------------------------+-----------+ | cid | integer | | | title | character varying(255) | not null | | url | character varying(255) | not null | +--------+------------------------+-----------+ Foreign Key constraints: rss_site FOREIGN KEY (cid) REFERENCES rsssite +s(sid) ON UPDATE CASCADE ON DELETE CASCADE

    The Caching script

    #!/usr/bin/perl use strict; use warnings; ###################################################################### +####### # Takes rss files from across the internet and sticks them into the da +tabase. # Best ran from cron ###################################################################### +####### if( -f '/var/run/retrieve_rss.pid'){ system('kill -9 `cat /var/run/retrieve_rss.pid`'); system('rm /var/run/retrieve_rss.pid'); } open(PIDFILE,'>/var/run/retrieve_rss.pid'); print PIDFILE $$,"\n"; close(PIDFILE); my $DEBUG = defined $ARGV[0] ? $ARGV[0] : 0; ### initial setup use LWP::Simple; use XML::RSS; use DBI; my $dbh = DBI->connect('DBI:Pg:dbname=DBNAME','DBUSER','DBPASS',{AutoC +ommit => '0'}); my $sth = $dbh->prepare('select sid,url from rsssites where active is +true'); $sth->execute(); while(my ($site_id,$site_url) = $sth->fetchrow_array()){ eval 'get_links($site_id,$site_url)'; print $@,"\n" if $@; } ### disconnect $sth->finish(); $dbh->disconnect; ### done unlink '/var/run/retrieve_rss.pid'; 1; ### this is for rss sites sub get_links { my ($id,$url) = @_; ### DEBUG print "Getting links for $url\n" if $DEBUG; my $document = get($url) || return; # clean the string (this fixes some broken rss) $document =~ s/\015\012?/\012/g || 1; $document =~ s/&(?!(?:[a-zA-Z0-9]+|#\d+);)/&amp;/g || 1; # parse a string my $rss = new XML::RSS(Style => 'Debug') || return; $rss->parse($document) || return; # clear out the db, check for a failure, rollback, and move on... unless( clear_db($id) ){ $dbh->rollback; return; } foreach my $item (@{$rss->{'items'}}) { my ($title,$link); $title = $item->{'title'}; $link = $item->{'link'}; chomp($title,$link); ### remove unsightly site specific links next if ($title =~ /Customize this feed/i); ### stick it into the database my $sth = $dbh->prepare('insert into rsscontent (cid,title,url +) values (?,?,?)'); $sth->execute($id,$title,$link); $sth->finish(); # check to see if an error has been raised and rollback if tru +e if($dbh->errstr){ $dbh->rollback; print "Rolling back line [$title][$link]: $dbh->errstr\n" +if $DEBUG; return; } } # check to see if an error has been raised... # if so, rollback, if not, commit unless($dbh->errstr){ print "Committing for $url\n" if $DEBUG; $dbh->commit; }else{ print "Rolling back $url: $dbh->errstr\n" if $DEBUG; $dbh->rollback; } return; } sub clear_db { my ($sid) = @_; if( defined($sid) ){ $dbh->do("delete from rsscontent where cid = $sid"); unless($DBI::errstr){ print "Successfully cleared content for $sid\n" if $DEBUG; return 1; }else{ print "Failed to clear content for $sid: $DBI::errstr\n" i +f $DEBUG; return 0; } } return 0; }

    RSS frontend

    #!/usr/bin/perl use strict; use warnings; use DBI; use XML::RSS; use CGI qw(:standard); my $cgi = new CGI(); my $dbh = DBI->connect('DBI:Pg:dbname=DBNAME','DBUSER','DBPASS'); unless( $dbh ){ print $cgi->header(),$cgi->start_html('Oops'),$cgi->h1('We have a +problem'),$cgi->end_html(); exit; } my $rss = new XML::RSS(); if(defined( $cgi->param('site') ) ){ my $site_data = $dbh->selectrow_hashref('select title,baseurl from + rsssites where sid = ' . $cgi->param('site') ); $rss->channel( title => $site_data->{title}, link => $site_data->{baseurl}, description => $site_data->{title} ); my $query = 'select title,url from rsscontent where cid = ?'; my $sth = $dbh->prepare($query); $sth->execute( $cgi->param('site') ); while (my ($title,$link) = $sth->fetchrow_array()){ $title = $cgi->escapeHTML($title); $rss->add_item( title => $title, link => $link, ); } }else{ ### no site param $rss->channel( title => 'RSS caching system', link => 'http://www.localhost/cgi-bin/rss', description => 'RSS interface to cached news', ); $rss->image( title => 'Localhost', url => 'http://www.localhost/images/favicon.png', link => 'http://www.localhost', ); $rss->textinput( title => 'Localhost search', description => 'Use the text input below to search Localhost', name => 'search_term', link => 'http://www.localhost/search' ); my $query = 'select sid,title from rsssites'; my $sth = $dbh->prepare($query); $sth->execute(); while (my ($sid,$title) = $sth->fetchrow_array()){ $rss->add_item( title => $title, link => "http://www.localhost/cgi-bin/rss?site=$sid" ); } } $dbh->disconnect(); print $cgi->header( -type=>'text/xml' ); print $rss->as_string; 1;

    cp
    ----
    "Never be afraid to try something new. Remember, amateurs built the ark. Professionals built the Titanic."

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://246792]
Approved by hiseldl
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (4)
As of 2024-04-18 00:58 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found