Re: Best way to search for specifics in a webpage?

Personally I would use HTML::TableExtract, which is a subclass of HTML::Parser, for everything. If you want a fragile regex solution you could do this (assuming the HTML page is in the scalar $html):

my ($location)   = $html =~ m/Location:\s*([^<]+)/i;
my ($time)       = $html =~ m/Time:\s*([^<]+)/i;
my ($days)       = $html =~ m/Days:\s*([^<]+)/i;
my ($instructor) = $html =~ m/Instructor:\s*([^<]+)/i;

# if you want plain text you will need to do this
$location   = unescapeHTML($location);
$time       = unescapeHTML($time);
$days       = unescapeHTML($days);
$instructor = unescapeHTML($instructor);

# this unescapes common cases, not all possible cases. For perfection -> CPAN
sub unescapeHTML {
    my( $unescape ) = @_;
    return undef unless defined($unescape);
    $unescape=~ s[&(.*?);]{
        local $_ = $1;
        /^amp$/i           ? '&' :
        /^quot$/i          ? '"' :
        /^gt$/i            ? '>' :
        /^lt$/i            ? '<' :
        /^nbsp$/i          ? ' ' :
        /^#(\d+)$/         ? chr($1) :
        /^#x([0-9a-f]+)$/i ? chr(hex($1)) :
        $_
    }gex;
    return $unescape;
}
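
One thing to watch with the matches above: [^<]+ also captures any trailing whitespace or newline that sits before the next tag, so the values usually want trimming. Here is a minimal sketch that folds the four matches into one loop and trims each value. The sample markup in $html and the %field hash are made up for illustration, and the real page may well be laid out differently:

```perl
use strict;
use warnings;

# Hypothetical sample page; the real markup may differ.
my $html = <<'HTML';
<tr><td>Location: Smith Hall 101
</td></tr>
<tr><td>Time: 9:00 - 10:15 </td></tr>
<tr><td>Days: MWF</td></tr>
<tr><td>Instructor: J. Doe</td></tr>
HTML

my %field;
for my $label (qw(Location Time Days Instructor)) {
    # \Q...\E quotes the label in case it ever contains regex metacharacters
    my ($value) = $html =~ m/\Q$label\E:\s*([^<]+)/i;
    next unless defined $value;    # label not found on the page
    $value =~ s/\s+\z//;           # drop trailing whitespace and newlines
    $field{ lc $label } = $value;
}

print "$_ = $field{$_}\n" for sort keys %field;
```

Note that this still breaks as soon as a tag appears between the label and its value (for example <b>Location:</b> Smith Hall), which is exactly why the regex route is fragile.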

If you use arrays rather than scalars for location et al and add a /g you will get all the locations on the page...

my @location   = $html =~ m/Location:\s*([^<]+)/gi;
# the first match will be in $location[0] and the last match (not surprisingly) in $location[-1]
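
To make the list-context behaviour concrete, here is a small self-contained sketch. The two-section page in $html is invented for the example, and pairing the arrays by index assumes every section on the page carries both fields:

```perl
use strict;
use warnings;

# Hypothetical page listing two class sections; the real markup may differ.
my $html = <<'HTML';
<td>Location: Smith Hall 101</td><td>Instructor: J. Doe</td>
<td>Location: Jones Lab 7</td><td>Instructor: A. Roe</td>
HTML

# In list context, m//g returns every capture on the page in document order.
my @location   = $html =~ m/Location:\s*([^<]+)/gi;
my @instructor = $html =~ m/Instructor:\s*([^<]+)/gi;

# Pair the arrays up by index; this assumes each section has both fields.
for my $i (0 .. $#location) {
    my $who = defined $instructor[$i] ? $instructor[$i] : '(unknown)';
    print "$location[$i]: $who\n";
}
```

If one section leaves a field out, the arrays drift out of step and the pairing silently goes wrong, which is another argument for a real parser.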

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: Best way to search for specifics in a webpage? by Anonymous Monk on Dec 27, 2002 at 11:10 UTC
tachyon, Thanks! Totally appreciate it :-D. Quick general question: how do you guys know so much about Perl? Is it experience, or a hobby, or ...? I would love to be able to solve problems and be comfortable with the language. Thanks again, Surya
Re: Re: Re: Best way to search for specifics in a webpage? by tachyon (Chancellor) on Dec 27, 2002 at 11:36 UTC
In terms of languages, Perl has a lot to offer people who need to get the job done. Besides the enormous power of the language itself, you have CPAN and the community. CPAN is the best resource of free, high quality library functions for any language, IMHO. The modules on CPAN are as eclectic as Perl itself and cover almost anything you can think of doing. Perlmonks is one of the best support forums for any language, and there are others like comp.lang.perl.misc if you like newsgroups and don't mind the odd flame.

By trade I am a doctor of medicine, but I have been running an IT company and doing systems admin for quite a while now. I have been programming for 25 years, almost exclusively in Perl for the last 3. As with anything, the more you do the better you get. The beauty of Perl is that (with modules) you can get amazing results very early on, with the community there to help you with problems.

BTW "New Monks" is probably worth a read (especially the bit about how to ask questions), as are the "CGI Help Guide", "A Guide to Installing Modules" and "Use strict, warnings and diagnostics or die".

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print