There are two main ways to do it:
1. Use some sort of HTML or DOM parsing module, have it pick out the element of the page containing your date, and then go up the chain of parents until you get something that contains all the info you want. Looking at the page in your link, the date would be in a <span> tag, which is inside a <p>, which is inside a <td>, which is inside a <tr>, which appears to contain the info you want. So you'd have to point to the correct span and then get its parent's parent's parent, and either parse the text out of that or plug that <tr> into a <table> of your own. Exactly how to do that will depend on which module you use. With something like Mojo::DOM, it could look something like this (I've barely used it, but it looks a lot like jQuery, which I'm familiar with, so I think this is close):
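# assumes: use Mojo::DOM; my $dom = Mojo::DOM->new($page);
for my $e ($dom->find('span')->each) {
    # \Q...\E keeps any regex metacharacters in the date literal
    if ($e->text =~ /\Q$mydate\E/) {
        # span -> p -> td -> tr; all_text includes the text of child elements,
        # whereas text only returns the element's own text
        my $myhtml = $e->parent->parent->parent->all_text;
        # do stuff with $myhtml (use ->to_string instead if you want the row's markup)
    }
}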
2. Parse the data from the raw HTML with your own regular expressions. See example below. Regexes like this tend to be tricky to create and brittle, because they're liable to break as soon as the page design changes at all. (So will a DOM/parser method if the nesting of the elements changes, but a regex may break just because they start capitalizing a tag.) But as a quick-and-dirty hack for your own use, it gets the job done.
#!/usr/bin/env perl
use Modern::Perl;
use LWP::Simple;
my $date = $ARGV[0] || '4/23/012';
my $page =
  get('http://staweb.sta.cathedral.org/departments/math/mhansen/public_html/1112hcal/1112hcal.htm');
die "Couldn't get page" unless $page;
my( $assignment ) = $page =~ m{ \Q$date\E .+? <span .+?>(.+?)</span>\s*</p> }sx;
say $assignment;
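If you want to try the regex approach without hitting the live page, here is a minimal sketch that runs against a made-up fragment. The sample HTML, the class names, and the find_assignment helper are all my own inventions for illustration; the fragment only imitates the <tr>/<td>/<p>/<span> nesting described above, and the real page's markup may well differ. It also adds the /i flag so that a capitalized tag doesn't break the match, and \Q...\E so the date is treated literally.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical fragment, loosely modeled on the nesting described above
# (<tr> > <td> > <p> > <span>); the real page's markup may differ.
my $html = <<'HTML';
<table>
<tr><td><p><span class="d">4/22/12</span></p></td>
    <td><p><span class="a">Read section 9.1</span></p></td></tr>
<tr><td><p><span class="d">4/23/12</span></p></td>
    <td><p><SPAN class="a">Problems 1-15 odd</SPAN></p></td></tr>
</table>
HTML

# Return the contents of the first <span> that follows the span holding
# the given date, or undef if nothing matches.
sub find_assignment {
    my ($html, $date) = @_;
    my ($assignment) = $html =~ m{
        \Q$date\E \s* </span>   # the date, then the end of its own span
        .+?                     # skip ahead as little as possible
        <span [^>]* >           # opening tag of the next span
        (.+?)                   # capture its contents
        </span>
    }six;
    return $assignment;
}

print find_assignment($html, '4/23/12'), "\n";   # prints "Problems 1-15 odd"
```

Returning undef on a miss lets the caller decide what to do when the date isn't on the page, instead of printing an empty line. It is still just as brittle as any other regex scrape: if the assignment ever moves out of the next span, this will quietly grab the wrong thing.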
Aaron B.
Available for small or large Perl jobs; see my home node.