in reply to Extract numbers.....
The link extractor $html_page =~ /(<a[^>]+>)/g is not safe for all purposes (a ">" inside a quoted attribute value, for instance, would cut the match short). But it is a bit more robust than Chady's quick-and-dirty grabber, which assumes that each tag sits complete on a single line, with no more than one tag per line. From the looks of your data, that one-line proviso is probably fine.

#!/usr/bin/perl -w
use strict;

# Dummy up some data
my $html_page = <<END_HTML;
<html>
<head><title>My test page</title></head>
<body>
Stuff stuff stuff
<a href="/f1/show?page=matchup&lid=206&week=8&mid1=2&mid2=4">
Grut garble glump
<a href="/f1/show?page=matchup&lid=206&week=13&mid1=11&mid2=8">
Anderanda manda ander
<a href="/f1/show?page=matchup&lid=206&week=4&mid1=7&mid2=7">
Bottom
</body>
</html>
END_HTML

# if you KNOW the bits will always be in the order above...
while ( $html_page =~ /&week=(\d+)&mid1=(\d+)&mid2=(\d+)/g ) {
    my ($week, $mid1, $mid2) = ($1, $2, $3);
    print "week[$week] mid1[$mid1] mid2[$mid2]\n";
    # do stuff
}

# if the bits can occur in any order...
while ( $html_page =~ /(<a[^>]+>)/g ) {
    my $anchor_txt = $1;
    my ($week, $mid1, $mid2);
    my $found_all = 1;
    # each list assignment is false when its pattern fails to match
    unless ( ($week) = $anchor_txt =~ /week=(\d+)/ ) {$found_all = 0}
    unless ( ($mid1) = $anchor_txt =~ /mid1=(\d+)/ ) {$found_all = 0}
    unless ( ($mid2) = $anchor_txt =~ /mid2=(\d+)/ ) {$found_all = 0}
    if ($found_all) {
        print "week[$week] mid1[$mid1] mid2[$mid2]\n";
        # do stuff
    }
    else {
        print "Oops! Missing bit or extraneous tag.\n";
    }
}
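If the list of parameters you care about grows, the any-order approach can be generalised by splitting each href's query string into a hash instead of writing one regex per parameter. What follows is only a rough sketch under a few assumptions of mine: the helper name query_params is made up, the sample anchors are dummy data (the second one is deliberately reordered), hrefs are assumed to be double-quoted, and tolerating "&amp;" as a separator is my addition. It has the same regex-on-HTML limitations as the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): turn one anchor tag
# into a hash of its query-string parameters.
sub query_params {
    my ($anchor_txt) = @_;

    # Grab the double-quoted href value; give up if there is none.
    my ($href) = $anchor_txt =~ /href\s*=\s*"([^"]*)"/i or return;

    # Keep only what follows the first '?'.
    my ($query) = $href =~ /\?(.*)/s or return;

    my %param;
    for my $pair ( split /&(?:amp;)?/, $query ) {
        my ($key, $value) = split /=/, $pair, 2;
        $param{$key} = $value if defined $value;   # skip bare keys
    }
    return %param;
}

# Dummy data: same shape as before, but the second link is reordered
# and the third lacks the wanted parameters.
my $html_page = <<'END_HTML';
<a href="/f1/show?page=matchup&lid=206&week=8&mid1=2&mid2=4">
<a href="/f1/show?mid2=8&week=13&page=matchup&mid1=11">
<a href="/f1/show?page=standings&lid=206">
END_HTML

my @wanted = qw(week mid1 mid2);
my @rows;
while ( $html_page =~ /(<a\s[^>]+>)/g ) {
    my $anchor_txt = $1;
    my %param = query_params($anchor_txt);

    # Skip anchors missing a parameter or carrying a non-numeric value.
    next if grep { !defined $param{$_} || $param{$_} !~ /^\d+$/ } @wanted;

    push @rows, join ' ', map { $_ . '[' . $param{$_} . ']' } @wanted;
}
print "$_\n" for @rows;
```

On the dummy data this prints one line for each of the first two links (week[8] mid1[2] mid2[4], then week[13] mid1[11] mid2[8]) and silently skips the third. Adding another parameter is then just a matter of extending @wanted.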
If the solutions proposed so far are not flexible enough to handle your data, give another holler...
Hope this helps. David