AngusScrimm has asked for the wisdom of the Perl Monks concerning the following question:
Newbie here. I'm trying to get the data from just the <title> tag of an HTML page.
I have some Perl code (cobbled together from some online examples) that can read the data from an HTML file, and I have an example code snippet that is supposed to read just the <title> tag.
My problem is figuring out how to make the two pieces of code work together. Or maybe I'm going down the wrong path. Any advice would be appreciated.
Here's the code that reads in all of the text from the HTML file:
```perl
#!/usr/bin/perl -w
use strict;

package Example;
require HTML::Parser;
@Example::ISA = qw(HTML::Parser);

my $parser = Example->new;
$parser->parse_file('index2.html');
print $parser->{TEXT};

sub text {
    my ($self, $text) = @_;
    $self->{TEXT} .= $text;
}
```
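One way to combine the two ideas while keeping this subclassing style is to override `start` and `end` as well as `text`, and only append text while the parser is inside `<title>`. This is a minimal sketch. It assumes the version-2-compatible method callbacks that HTML::Parser sets up when a subclass calls `new` with no arguments, as the script above does. Under that API, `start` and `end` receive the tag name as their second argument, and `text` receives the raw, undecoded text, so entities are decoded afterwards with HTML::Entities. The package name `TitleParser`, the `in_title`/`title` hash keys and the `title_from_string` helper are illustrative names, not part of HTML::Parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subclass: collects only the text inside <title>.
package TitleParser;
use parent 'HTML::Parser';            # loads HTML::Parser and sets @ISA

# Version 2 callback: ($self, $tagname, $attr, $attrseq, $origtext)
sub start {
    my ($self, $tagname) = @_;
    $self->{in_title} = 1 if $tagname eq 'title';
}

# Version 2 callback: ($self, $tagname, $origtext)
sub end {
    my ($self, $tagname) = @_;
    return unless $tagname eq 'title';
    $self->{in_title} = 0;
    $self->eof;                       # stop parsing once </title> is seen
}

# Version 2 callback: ($self, $text, $is_cdata); text may arrive in pieces,
# so it is appended rather than assigned.
sub text {
    my ($self, $text) = @_;
    $self->{title} .= $text if $self->{in_title};
}

package main;
use HTML::Entities qw(decode_entities);   # imported into main, where it is used

# Hypothetical helper: returns the decoded <title> text of an HTML string,
# or undef if the document has no <title>.
sub title_from_string {
    my ($html) = @_;
    my $parser = TitleParser->new;    # no arguments: version 2 callbacks
    $parser->parse($html);
    $parser->eof;                     # signal end of document
    return defined $parser->{title} ? decode_entities($parser->{title}) : undef;
}

# Usage: perl title.pl index2.html
if (@ARGV) {
    my $parser = TitleParser->new;
    $parser->parse_file($ARGV[0]) or die "Cannot read $ARGV[0]: $!";
    print decode_entities($parser->{title} // ''), "\n";
}
```

Calling `eof` from inside the `end` callback mirrors what the CPAN snippet does: the rest of the document is never parsed once the title has been read.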
And here's the code snippet, listed on the CPAN page for HTML::Parser, for extracting just the <title> tag data:
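```perl
#!/usr/bin/perl
use strict;
use HTML::Parser ();

# Called for every start tag; ignores everything except <title>.
sub start_handler
{
    return if shift ne "title";
    my $self = shift;
    # Inside <title>: print its text with entities decoded ("dtext") ...
    $self->handler(text => sub { print shift }, "dtext");
    # ... and stop parsing as soon as </title> is seen.
    $self->handler(end  => sub { shift->eof if shift eq "title"; },
                   "tagname,self");
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&start_handler, "tagname,self");
$p->parse_file(shift || die) || die $!;
print "\n";
```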
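The CPAN snippet is already a complete program: at file scope, `shift` takes the file name from `@ARGV`, so running it as `perl title.pl index2.html` prints the title, and the first script is not needed for this. If the title is wanted in a variable rather than printed, the same handlers can append to a lexical instead. This is a sketch assuming the HTML::Parser version 3 API used in the snippet. `title_of` is an illustrative name, and slurping the whole file into a string is a simplification chosen so the helper can also be called on literal HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Illustrative helper: returns the text of the first <title> element in
# $html, or undef if there is none. Same handlers as the CPAN snippet,
# but the text is appended to $title instead of being printed.
sub title_of {
    my ($html) = @_;
    my $title;

    my $p = HTML::Parser->new(api_version => 3);
    $p->handler(start => sub {
        my ($tagname, $self) = @_;
        return if $tagname ne 'title';
        # "dtext" asks for the text with entities decoded.
        $self->handler(text => sub { $title .= shift }, 'dtext');
        # Stop parsing at </title>.
        $self->handler(end => sub {
            my ($tag, $parser) = @_;
            $parser->eof if $tag eq 'title';
        }, 'tagname,self');
    }, 'tagname,self');

    $p->parse($html);
    $p->eof;    # end of document; parse_file() does the same after parse()
    return $title;
}

# Usage: perl title.pl index2.html
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $html = do { local $/; <$fh> };
    close $fh;
    my $title = title_of($html);
    print defined $title ? "$title\n" : "No <title> in $file\n";
}
```

Because the closures capture a fresh `$title` on every call, `title_of` can be called repeatedly (for example, once per file in a loop) without results leaking between documents.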
Re: read HTML <title> tag
by Corion (Patriarch) on May 31, 2005 at 13:50 UTC
Re: read HTML <title> tag
by dbwiz (Curate) on May 31, 2005 at 14:06 UTC
Re: read HTML <title> tag
by jeffa (Bishop) on May 31, 2005 at 14:19 UTC