hermes has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to make an HTML file out of an RSS file. The RSS file contains items of the type:
<item>
  <title>Davies promises 2004 best year</title>
  <link>http://www.worldpress.org/feed.cfmhttp://www.jamaica-gleaner.com:80/gleaner/20040326/lead/lead1.html</link>
  <description>Jamaica Gleaner, Centrist daily of Kingston, Jamaica</description>
</item>
As you can see, I need to extract the headline (<title></title>), the URL (<link></link>) and the description (<description></description>) from each such item group. The following piece of code does not work: it never matches any of the items. Could you please show me how to extract these three fields properly?
@@@@@@@@@
while($maryb=~ /<title>(.*?)<\/title><link>(.*?)<\/link><description>(.*?)<\/description>/) {
    $headline=$1;
    $url=$2;
    $desc=$3;
    print FILE <<"UP_TO_HERE";
<title>$headline</title>
<a href=\"$url\"> $headline</a><br>
<font size=\"-1\"><b>$desc</b></font><br><br>
UP_TO_HERE
}
@@@@@@@@@
Thanks a million!

maryb

Edit by tye, add CODE and P tags

Edit2 by Chady -- retitled from 'A Very Simple Question'

Replies are listed 'Best First'.
Re: Converting RSS file to HTML
by Hero Zzyzzx (Curate) on Mar 27, 2004 at 04:21 UTC

    Don't do this manually. Someone has already done the work for you and put it on www.cpan.org in the form of XML::RSS.

    #!/usr/bin/perl
    # Almost exactly from the docs.
    use XML::RSS;

    my $rss = XML::RSS->new();
    $rss->parse($your_rss_feed_in_this_scalar);

    foreach my $item (@{$rss->{'items'}}) {
        print "title: $item->{'title'}\n";
        print "link: $item->{'link'}\n\n";
        print "description: $item->{'description'}\n\n";
    }

    I'll leave the HTML bit as an exercise to the reader. . .
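
    One possible way to do the HTML step is sketched below. It assumes XML::RSS is installed and that the feed is already in a string. The sample feed, its example.com URL, and the helper names escape_html and item_to_html are made up for illustration; they are not part of XML::RSS. The markup is only one choice among many.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::RSS;

# Escape the characters that are special in HTML text and attribute values.
# (Hypothetical helper, not part of XML::RSS.)
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Turn one parsed item (a hash ref with title, link and description keys)
# into a small HTML fragment. (Hypothetical helper.)
sub item_to_html {
    my ($item) = @_;
    my $title = escape_html($item->{title});
    my $link  = escape_html($item->{link});
    my $desc  = escape_html($item->{description});
    return qq{<a href="$link">$title</a><br>\n<small><b>$desc</b></small><br><br>\n};
}

# Made-up sample feed; in practice this would be read from a file or fetched.
my $feed = <<'END_RSS';
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Sample feed</title>
<link>http://www.example.com/</link>
<description>Made-up feed for illustration</description>
<item>
<title>Davies promises 2004 best year</title>
<link>http://www.example.com/lead1.html</link>
<description>Jamaica Gleaner, Centrist daily of Kingston, Jamaica</description>
</item>
</channel>
</rss>
END_RSS

my $rss = XML::RSS->new();
$rss->parse($feed);

# Build one HTML fragment per item and print them all.
my $html = join '', map { item_to_html($_) } @{ $rss->{items} };
print $html;
```

    Escaping the fields before interpolating them matters because titles and descriptions in real feeds often contain ampersands or quotes that would otherwise break the generated page.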

    -Any sufficiently advanced technology is
    indistinguishable from doubletalk.


Re: Converting RSS file to HTML
by Vautrin (Hermit) on Mar 27, 2004 at 16:54 UTC

    RSS feeds are a form of XML. If you are unfamiliar with XML, you should read up on it. You may be able to use XSLT to do the conversion for you automatically. In any case, you should use some kind of XML parser. Also, check out XML::Simple, which loads an XML file into a combination of arrays and hashes. Then you could print out your HTML like so:

    use strict;
    use warnings;
    use XML::Simple;

    my $parser = XML::Simple->new;
    my $hash = $parser->XMLin("./file");

    # I'm not quite certain what data structure
    # will appear. Use Data::Dump or Data::Dumper
    # to find out.

    # note that qq|...| is used so the double quotes
    # in the HTML do not need escaping
    print qq|
    <title>$hash->{item}->[0]->{title}</title>
    <a href="$hash->{item}->[0]->{link}">
    $hash->{item}->[0]->{description}</a>
    |;

    # note that you could easily do this instead:
    # my @items = @{ $hash->{item} };
    # foreach my $item (@items) {
    #     # print stuff
    # }

    Repeated XML tags with the same name end up in an array, so if you have more than one item you can loop over them as in the commented-out code. Be careful, though: XML::Simple is meant to be simple and easy to use, so it doesn't do any kind of validation and doesn't necessarily preserve the order of the XML. Go to CPAN and get a copy of another XML parser if you want to do something more complex. Oh, and never roll your own parser unless it's for educational purposes. It's not a good idea: there are plenty of parsers out there that have had far more work put into them than you would probably want to invest, and thus work much better.
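
    The loop over items could look roughly like the sketch below. It assumes XML::Simple is installed and an RSS 2.0 style layout, where the items sit under the channel element; other feed flavours may nest them differently, so dump the structure first. The sample feed and its example.com URL are made up. The ForceArray option is used so that a feed with a single item still yields an array reference, and KeyAttr is emptied so no folding by attribute name takes place.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Made-up sample feed with a single item.
my $xml = <<'END_RSS';
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Sample feed</title>
<link>http://www.example.com/</link>
<description>Made-up feed for illustration</description>
<item>
<title>Davies promises 2004 best year</title>
<link>http://www.example.com/lead1.html</link>
<description>Jamaica Gleaner, Centrist daily of Kingston, Jamaica</description>
</item>
</channel>
</rss>
END_RSS

# ForceArray => ['item'] keeps <item> as an array ref even when there is
# only one; KeyAttr => [] switches off folding on name/key/id attributes.
my $data = XMLin($xml, ForceArray => ['item'], KeyAttr => []);

# Assumption: RSS 2.0 layout, so items live under {channel}{item}.
my @items = @{ $data->{channel}{item} || [] };

foreach my $item (@items) {
    print "title: $item->{title}\n";
    print "link: $item->{link}\n";
    print "description: $item->{description}\n\n";
}
```

    Without ForceArray, a one-item feed would give a plain hash reference at that position and the array dereference would fail, which is an easy trap to fall into when testing with a short feed.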


Re: Converting RSS file to HTML
by ajt (Prior) on Mar 27, 2004 at 16:53 UTC

    The XML::RSS (Perl-RSS) module is a good place to start. There are some examples on how to use this module on perl.com and others are linked on the module's home page.

    If you know XSLT, you can convert RSS feeds into HTML that way too. XML::LibXSLT is a good module for this, and there is also XML::RSS::Tools, which is a wrapper around XML::RSS, XML::LibXSLT and LWP.
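
    As a rough illustration of the XSLT route, here is a minimal sketch that assumes XML::LibXML and XML::LibXSLT are installed and that the feed uses the RSS 2.0 layout (rss/channel/item). The sample feed, its example.com URL, and the stylesheet are made up for this example; a real stylesheet would normally live in its own file.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Made-up sample feed.
my $feed = <<'END_RSS';
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Sample feed</title>
<link>http://www.example.com/</link>
<description>Made-up feed for illustration</description>
<item>
<title>Davies promises 2004 best year</title>
<link>http://www.example.com/lead1.html</link>
<description>Jamaica Gleaner, Centrist daily of Kingston, Jamaica</description>
</item>
</channel>
</rss>
END_RSS

# Made-up stylesheet: one link plus description per item.
my $xsl = <<'END_XSL';
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <xsl:for-each select="rss/channel/item">
      <a href="{link}"><xsl:value-of select="title"/></a><br/>
      <small><b><xsl:value-of select="description"/></b></small><br/><br/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
END_XSL

# Parse the feed and the stylesheet, then apply the transformation.
my $source     = XML::LibXML->load_xml(string => $feed);
my $style_doc  = XML::LibXML->load_xml(string => $xsl);
my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
my $result     = $stylesheet->transform($source);

# Serialize according to the stylesheet's xsl:output settings.
my $html = $stylesheet->output_string($result);
print $html;
```

    The attraction of this approach is that the page layout lives entirely in the stylesheet, so the HTML can be changed without touching the Perl code.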


    --
    ajt