PerlMonks  

Parsing for Jazz

by YAFZ (Pilgrim)
on Jun 21, 2003 at 18:12 UTC ( [id://267840]=perlquestion )

YAFZ has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I'm trying to write a very primitive parser to grab the news from a couple of jazz websites. My aim is to keep this Perl script as simple and optimized as possible. I plan to use its output on different sites (also related to music). I'd be glad if you could comment on my code and tell me whether there are better ways to do it, so that I can stop and think about it before I go any further. Here is my primitive jazz parser:
    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;

    my %URL = ('jazztimes', 'http://jazztimes.com/JazzNews/JazzNews.asp',
               'allaboutjazz', 'http://allaboutjazz.com');

    my %pattern = ('jazztimes', '<a href="http://jazztimes\.com/JazzNews/JazzNews\.asp\?cmd=view&articleid=\d+">[^<]+</a>',
                   'allaboutjazz', '/news/ft/2003.*?</a>');

    my $data = get $URL{'jazztimes'};
    print "Content-type: text/html\n\n";
    print "JazzTimes.com:<br>";
    while ($data =~ m!$pattern{'jazztimes'}!ig) {
        print "$&<br>";
    }
    print "<br>";
    print "allaboutjazz.com:<br>";
    $data = get $URL{'allaboutjazz'};
    while ($data =~ m!$pattern{'allaboutjazz'}!ig) {
        print "<a href=http://allaboutjazz.com" . "$&<br>";
    }

Note: The working script can be viewed online at http://ileriseviye.org/cgi-bin/jazzparse.pl

Replies are listed 'Best First'.
Re: Parsing for Jazz
by fglock (Vicar) on Jun 21, 2003 at 19:14 UTC

    Take a look at Cache::Cache. It will help you not to repeat the 'get' too frequently.

      Are you reading my mind? ;-) That was one of the features I planned to implement as soon as possible. My plan is to store the parsed data in a file and check the file's date: if it is older than one day, get the data from the remote site; otherwise, read it from the file.

      Cache::Cache seems to implement the kind of functionality I've mentioned above, but if it is not a core module I think I'd prefer to implement it myself, because I don't want to make my hosting company install this module just for my small script ;-)
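
The file-based plan described above (reuse a saved copy while it is less than a day old, otherwise fetch again) might look roughly like the sketch below, using only core Perl. The helper name `cached_fetch`, its arguments, and the cache file name in the usage comment are hypothetical and not part of the original script. Note that `-M` measures a file's age in days relative to the time the script started, which should be adequate for a short-lived CGI run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): return the cached
# content if $file is younger than $max_age_days, otherwise call the
# $fetch code reference and store its result in $file.
sub cached_fetch {
    my ($file, $max_age_days, $fetch) = @_;

    # -M gives the file's age in days, measured from script start time.
    if (-e $file && -M _ < $max_age_days) {
        open my $in, '<', $file or die "Cannot read $file: $!";
        local $/;                       # slurp the whole file at once
        my $content = <$in>;
        close $in;
        return $content;
    }

    my $content = $fetch->();
    return unless defined $content;     # fetch failed: keep any old cache file

    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $content;
    close $out or die "Cannot close $file: $!";
    return $content;
}

# Possible use with LWP::Simple (assumed file name, shown as a comment so
# the sketch itself needs no network access):
#   my $data = cached_fetch('jazztimes.cache', 1, sub { get $URL{'jazztimes'} });
```

Passing the download step as a code reference keeps the caching logic independent of LWP, so the same helper could wrap either site.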

      By the way, I hope the part that downloads the URLs and parses them with regular expressions is OK and optimized...
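
On the regular expression side, one small adjustment that may be worth considering is to precompile the pattern with `qr//` and use capture groups instead of `$&`. The perlvar documentation notes that, on Perls before 5.20, mentioning `$&` anywhere in a program slows down every regex match in it. The sketch below is only an illustration under assumptions: the helper name `extract_headlines` and the sample markup are made up, and the pattern is adapted from the jazztimes pattern in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile the pattern once; group 1 captures the URL, group 2 the title.
my $link_re = qr{<a\s+href="(http://jazztimes\.com/JazzNews/JazzNews\.asp\?cmd=view&articleid=\d+)">([^<]+)</a>}i;

# Hypothetical helper: return a list of [url, title] pairs found in $html.
sub extract_headlines {
    my ($html) = @_;
    my @found;
    while ($html =~ /$link_re/g) {
        push @found, [$1, $2];
    }
    return @found;
}

# Made-up sample markup standing in for the downloaded page.
my $sample = <<'HTML';
<p><a href="http://jazztimes.com/JazzNews/JazzNews.asp?cmd=view&articleid=101">First headline</a></p>
<p><a href="http://jazztimes.com/about.asp">About</a></p>
<p><A HREF="http://jazztimes.com/JazzNews/JazzNews.asp?cmd=view&articleid=102">Second headline</A></p>
HTML

for my $item (extract_headlines($sample)) {
    my ($url, $title) = @$item;
    print qq{<a href="$url">$title</a><br>\n};
}
```

Having the URL and the title as separate values also means the output link can be rebuilt explicitly rather than by gluing a prefix onto the matched text.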
Re: Parsing for Jazz
by Cody Pendant (Prior) on Jun 22, 2003 at 00:08 UTC
    Non-code related, but if you're planning to use other people's headlines widely, then you may run into copyright problems. I'm sure you know that but some people might overlook it.
    --
    “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
    M-J D
      Actually, first of all, this script is planned for internal use. But even if I print the headlines on some other music site, the original links will be kept so that people clicking on them will go to the related sites, and I think this prevents a problem about copyright (even though I'm not sure ;-).
        I think this prevents a problem about copyright (even though I'm not sure ;-).

        I think you're 100% wrong about that. It's hardly likely to be an issue in this case, but just because you link back to the site doesn't mean you're OK, definitely not.

        I don't want to be boring about it, it's just that I've recently been on a course about these issues which opened my eyes to stuff and everywhere I go now, I see Lawsuits Waiting To Happen... or at least Frosty Cease And Desist Letters Waiting To Be Sent.

Approved by TStanley