Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, are there any Perl experts who can help here?

http://www.perl.com/pub/a/2001/11/15/creatingrss.html

This article apparently shows how to create your own RSS file for the BBC News headlines; however, it doesn't seem to work.

Ideally, I want to do something similar for the following page:

http://cpfc.org/news/docs/jsnews.txt

Is Perl the best way to go, or could PHP do the same thing just as easily? I'm no coding expert, so any help would be great.

Cheers
James

update (broquaint): title change (was HTML 2 RSS)

Re: Converting HTML to RSS
by jasonk (Parson) on Feb 17, 2003 at 14:04 UTC

    Why don't you explain what you tried and how it didn't work? Then maybe someone can help you; otherwise we just have to guess what went wrong.

Re: Converting HTML to RSS
by Anonymous Monk on Feb 17, 2003 at 14:49 UTC
    Here is what I have in the Perl script:

    http://www.exwebjunkie.com/testperl.txt

    It creates the .rss file, but the file contains only the following:

    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://my.netscape.com/rdf/simple/0.9/">
      <channel>
        <title>cpfc.org</title>
        <link>http://www.cpfc.org/</link>
        <description>cpfc.org - news links.</description>
      </channel>
    </rdf:RDF>

    So it seems that the script is not finding the articles, or not converting them.

    update (broquaint): added formatting

      The URL you're fetching the content from returns a big chunk of JavaScript when I open it in Mozilla or fetch it with wget.
      As far as I know, LWP::Simple doesn't interpret JavaScript; correct me if I'm wrong, and just disregard this message if I'm saying something stupid.

      The rest of the script contains errors, but I'm sure you can spot them all yourself if you look long enough (for example, using $headline as an undefined variable, not looping over the content in any way, etc.).
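      To illustrate the looping point: a minimal sketch, assuming the headlines have already been collected into a list of [title, url] pairs. It runs under `use strict`, so an undeclared variable such as `$headline` becomes a compile-time error instead of silently producing nothing, and it writes one `<item>` per headline after the `<channel>` block, in the same RSS 0.9 layout as the output above. The XML is built by hand only so the sketch needs no CPAN modules; `xml_escape` and `build_rss` are hypothetical helper names, not part of the original script.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first, or the other entities get re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build an RSS 0.9 document: one <channel> block, then one <item> per
# headline. In RSS 0.9 the items sit beside <channel>, inside <rdf:RDF>.
sub build_rss {
    my ($channel, @items) = @_;
    my $rss = qq{<?xml version="1.0"?>\n}
        . qq{<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
        . qq{ xmlns="http://my.netscape.com/rdf/simple/0.9/">\n}
        . "<channel>\n";
    for my $field (qw(title link description)) {
        $rss .= "  <$field>" . xml_escape($channel->{$field}) . "</$field>\n";
    }
    $rss .= "</channel>\n";

    # Loop over the content: every headline becomes its own <item>.
    for my $item (@items) {
        my ($title, $url) = @$item;
        $rss .= "<item>\n"
              . "  <title>" . xml_escape($title) . "</title>\n"
              . "  <link>" . xml_escape($url) . "</link>\n"
              . "</item>\n";
    }
    return $rss . "</rdf:RDF>\n";
}

# Usage with one made-up headline (hypothetical data, not from cpfc.org):
print build_rss(
    { title => 'cpfc.org', link => 'http://www.cpfc.org/',
      description => 'cpfc.org - news links.' },
    [ 'Match report', 'http://www.cpfc.org/news/1.html' ],
);
```

      If no [title, url] pairs reach `build_rss`, the output collapses to exactly the channel-only file shown above, which is a quick way to tell whether the problem is in fetching/parsing or in writing the RSS.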



      Update: Maybe you shouldn't use HTML::TokeParser, but instead write a regexp and parse the content yourself,
      something like m!<a href="([a-zA-Z0-9/?&=]+)">([^<]+)!, which, slightly modified, will most likely capture the URL and the title for you.
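      A minimal sketch of that regexp approach, run over HTML that is already in a string. The "slight modification" here is an assumption about what the page's links look like: the URL class is widened to `[^"]+` so dots, colons, hyphens and underscores also match, and `/i` catches `<A HREF=...>` as well. Looping with `while (... =~ m//g)` visits every link instead of stopping at the first. `extract_links` is a hypothetical helper name. Like any regexp-based HTML parsing, it will miss single-quoted or unquoted attributes and links with other attributes before `href`.

```perl
use strict;
use warnings;

# Return a list of [title, url] pairs, one per <a href="...">text link.
sub extract_links {
    my ($html) = @_;
    my @links;
    # m//g in scalar (while) context resumes after the previous match,
    # so the loop walks through every link in the string.
    while ($html =~ m!<a\s+href="([^"]+)"\s*>([^<]+)!gi) {
        push @links, [ $2, $1 ];    # $1 = URL, $2 = link text
    }
    return @links;
}

my $html = '<a href="/news/1.html">Match report</a> | '
         . '<A HREF="http://www.cpfc.org/news/2.html">Team news</A>';
for my $link (extract_links($html)) {
    print "$link->[0] => $link->[1]\n";
}
# prints:
# Match report => /news/1.html
# Team news => http://www.cpfc.org/news/2.html
```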

      regards,
      tomte


Re: Converting HTML to RSS
by osama (Scribe) on Feb 17, 2003 at 14:29 UTC
    I would suggest stripping the JavaScript extras with something like:
    $text =~ s/document\.write\('(.+?)'\);/$1/g;
    This would replace:
    document.write('***ANYTHING***');
    with the text between the single quotes:
    ***ANYTHING***
    For the data in jsnews.txt, this will produce a valid HTML snippet.
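    A small self-contained version of that substitution, wrapped in a hypothetical `strip_document_write` helper so it can be tested on a string. It assumes each `document.write('...');` call sits on one line and that the quoted string contains no escaped quotes (`\'`); the real contents of jsnews.txt were not checked, so treat both as assumptions. The unwrapped HTML could then be fed to a link-extracting regexp like the one tomte suggests.

```perl
use strict;
use warnings;

# Unwrap document.write('...'); calls, leaving the HTML they would print.
sub strip_document_write {
    my ($js) = @_;
    my $html = $js;
    # \. matches a literal dot; (.+?) is non-greedy, so each call on a line
    # is unwrapped separately. Without /s, a call spanning lines is left alone.
    $html =~ s/document\.write\('(.+?)'\);/$1/g;
    return $html;
}

my $js = qq{document.write('<a href="/news/1.html">Match report</a>');\n}
       . qq{document.write('<br>');\n};
print strip_document_write($js);
# prints:
# <a href="/news/1.html">Match report</a>
# <br>
```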