Dear Monks,
I'm trying to write a very primitive parser to grab the news from a couple of jazz websites. My aim is to keep this Perl script as simple and optimized as possible. I plan to use its output on other sites (also related to music). I'd be glad if you could comment on my code and tell me whether there are better ways to do it, so that I can stop and think about it before I go any further.
Here is my primitive jazz parser:
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
my %URL = ('jazztimes', 'http://jazztimes.com/JazzNews/JazzNews.asp',
'allaboutjazz', 'http://allaboutjazz.com');
my %pattern = ('jazztimes', '<a href="http://jazztimes\.com/JazzNews/JazzNews\.asp\?cmd=view&articleid=\d+">[^<]+</a>',
'allaboutjazz', '/news/ft/2003.*?</a>');
my $data = get($URL{'jazztimes'}) || '';   # get() returns undef on failure
print "Content-type: text/html\n\n";
print "JazzTimes.com:<br>";
while ($data =~ m!$pattern{'jazztimes'}!ig) {
print "$&<br>";
}
print "<br>";
print "allaboutjazz.com:<br>";
$data = get($URL{'allaboutjazz'}) || '';   # get() returns undef on failure
while ($data =~ m!$pattern{'allaboutjazz'}!ig) {
print "<a href=http://allaboutjazz.com" . "$&<br>";
}
Note: The working script can be viewed online at
http://ileriseviye.org/cgi-bin/jazzparse.pl
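
For comparison, here is a minimal offline sketch of the same matching loop that uses capture groups instead of $&, so the URL and the link text come back separately. The extract_links helper and the sample markup are made up for illustration; they are not part of the script above or taken from either site, and the real pages may be marked up differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return a list of
# [url, title] pairs for every link in $html whose href matches $href_re.
sub extract_links {
    my ($html, $href_re) = @_;
    return () unless defined $html;    # e.g. when a fetch failed
    my @links;
    while ($html =~ m{<a\s+href="($href_re)"[^>]*>([^<]+)</a>}ig) {
        push @links, [$1, $2];         # $1 = URL, $2 = link text
    }
    return @links;
}

# Made-up sample markup standing in for a fetched page.
my $sample = join '',
    '<a href="http://jazztimes.com/JazzNews/JazzNews.asp?cmd=view&articleid=12">First story</a>',
    '<a href="http://jazztimes.com/about.asp">About</a>',
    '<a href="http://jazztimes.com/JazzNews/JazzNews.asp?cmd=view&articleid=34">Second story</a>';

# Same href pattern as in the script, precompiled with qr//.
my $href_re = qr{http://jazztimes\.com/JazzNews/JazzNews\.asp\?cmd=view&articleid=\d+};

for my $link (extract_links($sample, $href_re)) {
    my ($url, $title) = @$link;
    print qq{<a href="$url">$title</a><br>\n};
}
```

With the URL and title captured separately, the output markup can be rebuilt however the receiving site needs it, rather than echoing the matched HTML as is.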