You don't say much about how much variation might occur in
your data, but here's some code that handles two cases:
#!/usr/bin/perl
use strict;
use warnings;
# Dummy up some data
my $html_page = <<END_HTML;
<html>
<head><title>My test page</title></head>
<body>
Stuff stuff stuff
<a href="/f1/show?page=matchup&lid=206&week=8&mid1=2&mid2=4">
Grut garble glump
<a href="/f1/show?page=matchup&lid=206&week=13&mid1=11&mid2=8">
Anderanda manda ander
<a href="/f1/show?page=matchup&lid=206&week=4&mid1=7&mid2=7">
Bottom
</body>
</html>
END_HTML
# If you KNOW the bits will always appear in the order above
# (and joined by a literal "&", not "&amp;")...
while ( $html_page =~ /&week=(\d+)&mid1=(\d+)&mid2=(\d+)/g ) {
    my ($week, $mid1, $mid2) = ($1, $2, $3);
    print "week[$week] mid1[$mid1] mid2[$mid2]\n";
    # do stuff
}
# If the bits can occur in any order, grab each <a ...> tag first
# and then look for each bit inside it separately.
while ( $html_page =~ /(<a[^>]+>)/g ) {
    my $anchor_txt = $1;
    my ($week, $mid1, $mid2);
    my $found_all = 1;
    # A list assignment is false when the match fails (nothing captured).
    # \b keeps e.g. "lastweek=" from being mistaken for "week=".
    unless ( ($week) = $anchor_txt =~ /\bweek=(\d+)/ ) { $found_all = 0 }
    unless ( ($mid1) = $anchor_txt =~ /\bmid1=(\d+)/ ) { $found_all = 0 }
    unless ( ($mid2) = $anchor_txt =~ /\bmid2=(\d+)/ ) { $found_all = 0 }
    if ($found_all) {
        print "week[$week] mid1[$mid1] mid2[$mid2]\n";
        # do stuff
    }
    else {
        print "Oops! Missing bit or extraneous tag.\n";
    }
}
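If you want something more forgiving than either loop, one
option is to pull out each anchor's href value and split its
query string into a hash, so parameter order stops mattering
and extra parameters are simply ignored. This is a minimal
sketch under some assumptions: href values are double-quoted,
"&amp;" is the only entity to undo, percent-encoded values are
not decoded, and the sample data below is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample: parameters in varying order, one link escaped with
# &amp; as valid HTML would have it, and one link we don't care about.
my $html_page = <<'END_HTML';
<a href="/f1/show?page=matchup&lid=206&week=8&mid1=2&mid2=4">
<a href="/f1/show?mid2=8&page=matchup&mid1=11&week=13">
<a href="/f1/show?page=matchup&amp;lid=206&amp;week=4&amp;mid1=7&amp;mid2=7">
<a href="/f1/show?page=standings&lid=206">
END_HTML

my @matchups;    # collects [week, mid1, mid2] triples

# Assumes href values are double-quoted.
while ( $html_page =~ /<a\s[^>]*?\bhref="([^"]*)"/gi ) {
    my $href = $1;
    $href =~ s/&amp;/&/g;    # undo the one entity we expect to see
    my ($query) = $href =~ /\?(.*)/ or next;    # skip links with no query string

    # Split "k1=v1&k2=v2..." into a hash (no percent-decoding done here).
    my %param;
    for my $pair ( grep { length } split /&/, $query ) {
        my ($key, $value) = split /=/, $pair, 2;
        $param{$key} = $value;
    }

    # Skip links missing any of the fields we need.
    next if grep { !defined $param{$_} } qw(week mid1 mid2);

    push @matchups, [ @param{qw(week mid1 mid2)} ];
    print "week[$param{week}] mid1[$param{mid1}] mid2[$param{mid2}]\n";
}
```

Because the parameters land in a hash, order doesn't matter and
extras such as lid are ignored; an anchor missing one of the
three is skipped silently rather than reported.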
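The link extractor
$html_page =~ /(<a[^>]+>)/g
is not safe for all purposes: it also matches other tags that
start with "a" (such as <abbr> and <area>), it stops at the
first ">" even when that ">" sits inside a quoted attribute
value, and it will pick up links inside HTML comments.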
But it is a bit more robust than Chady's quick-and-dirty
grabber, which assumes that each line holds the complete text
of only one tag. Because [^>] also matches newlines, the
pattern above still finds a tag that is split across lines.
From the looks of it, it's likely that the
one-line proviso is fine for your data.
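To make those provisos concrete, here is a small sketch run
against made-up input. @loose uses the extractor from the code
above; @tighter is a variant suggested here (whitespace required
after "a", case-insensitive); @by_line is a hypothetical stand-in
for a line-at-a-time grabber, not Chady's actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample input exercising the edge cases discussed above.
my $html = <<'END_HTML';
<abbr title="Fantasy Football">FF</abbr>
<a href="/f1/show?page=matchup&week=8" title="a > b">
<!-- <a href="/f1/show?page=old&week=1"> -->
<A HREF="/f1/show?page=matchup&week=9">
<a
 href="/f1/show?page=matchup&week=10">
END_HTML

# The extractor from the code above.
my @loose = $html =~ /(<a[^>]+>)/g;

# A slightly tighter variant: require whitespace after "a" and ignore case.
my @tighter = $html =~ /(<a\s[^>]*>)/gi;

# Hypothetical line-at-a-time grabber (NOT Chady's actual code):
# it only sees a tag that starts and ends on the same line.
my @by_line;
for my $line ( split /\n/, $html ) {
    push @by_line, $1 if $line =~ /(<a[^>]+>)/;
}

for ( [ loose => \@loose ], [ tighter => \@tighter ], [ by_line => \@by_line ] ) {
    my ($name, $found) = @$_;
    print "$name:\n";
    print "  $_\n" for @$found;
}
```

Tightening the pattern drops the <abbr> false hit and picks up
the uppercase <A HREF>, but neither regex copes with the ">"
inside the title attribute or the commented-out link; for that,
a real parser such as HTML::Parser or HTML::LinkExtor from CPAN
is the safer bet. The line-at-a-time grabber misses the anchor
split across two lines, which both global matches catch.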
If the solutions proposed so far are not flexible enough
to handle your data, give another holler...
Hope this helps. David