The link extractor $html_page =~ /(<a[^>]+>)/g is not safe for all purposes. But it is a bit more robust than Chady's quick-and-dirty grabber, which assumes that the complete text of only one tag will appear on any one line. From the looks of it, it's likely that the one-line proviso is fine for your data.

#!/usr/bin/perl -w
use strict;

# Dummy up some data
my $html_page = <<END_HTML;
<html>
<head><title>My test page</title></head>
<body>
Stuff stuff stuff
<a href="/f1/show?page=matchup&lid=206&week=8&mid1=2&mid2=4">
Grut garble glump
<a href="/f1/show?page=matchup&lid=206&week=13&mid1=11&mid2=8">
Anderanda manda ander
<a href="/f1/show?page=matchup&lid=206&week=4&mid1=7&mid2=7">
Bottom
</body>
</html>
END_HTML

# if you KNOW the bits will always be in the order above...
while ( $html_page =~ /&week=(\d+)&mid1=(\d+)&mid2=(\d+)/g ) {
    my ($week, $mid1, $mid2) = ($1, $2, $3);
    print "week[$week] mid1[$mid1] mid2[$mid2]\n";
    # do stuff
}

# if the bits can occur in any order...
while ( $html_page =~ /(<a[^>]+>)/g ) {
    my $anchor_txt = $1;
    my ($week, $mid1, $mid2);
    my $found_all = 1;
    unless ( ($week) = $anchor_txt =~ /week=(\d+)/ ) {$found_all = 0}
    unless ( ($mid1) = $anchor_txt =~ /mid1=(\d+)/ ) {$found_all = 0}
    unless ( ($mid2) = $anchor_txt =~ /mid2=(\d+)/ ) {$found_all = 0}
    if ($found_all) {
        print "week[$week] mid1[$mid1] mid2[$mid2]\n";
        # do stuff
    } else {
        print "Oops! Missing bit or extraneous tag.\n";
    }
}
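The any-order loop above could also be written with a single pattern that collects every name=number pair from an anchor into a hash. What follows is only a sketch under a few assumptions: the helper name extract_params and the sample string are made up for illustration, values are assumed to be all digits as in the dummy data, and the separator is assumed to be either a bare & or the escaped &amp; form.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not part of the code above): returns one hash ref
# per <a ...> tag, holding every name=digits pair found inside that tag.
sub extract_params {
    my ($html) = @_;
    my @links;
    while ( $html =~ /(<a[^>]+>)/g ) {
        my $anchor_txt = $1;
        # A pair is introduced by ?, & or &amp; and must have a numeric
        # value, so page=matchup is skipped
        my %param = $anchor_txt =~ /(?:[?&]|&amp;)(\w+)=(\d+)/g;
        push @links, \%param;
    }
    return @links;
}

# Same bits as the dummy data, shuffled into a different order
my $sample = '<a href="/f1/show?page=matchup&mid2=4&week=8&lid=206&mid1=2">';

for my $p ( extract_params($sample) ) {
    my @missing = grep { !defined $p->{$_} } qw(week mid1 mid2);
    if (@missing) {
        print "Oops! Missing: @missing\n";
    }
    else {
        print "week[$p->{week}] mid1[$p->{mid1}] mid2[$p->{mid2}]\n";
    }
}
```

For the shuffled sample this prints week[8] mid1[2] mid2[4]. The same caveat applies as before: [^>]+ will cut a tag short if an attribute value contains a literal >, so this is no safer than the original link extractor, just a little more compact.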
If the solutions proposed so far are not flexible enough to handle your data, give another holler...
Hope this helps. David
In reply to Re: Extract numbers..... by dvergin, in thread Extract numbers..... by Anonymous Monk