Newbie here. I'm trying to get the data from just the <title> tag of an HTML page.

I have some Perl code (cobbled together from some online examples) that can read the data from an HTML file, and I have an example code snippet that is supposed to read just the <title> tag.

My problem is figuring out how to make the two pieces of code work together. Or maybe I'm going down the wrong path. Any advice would be appreciated.

Here's the code to read in all the data from the HTML file:

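#!/usr/bin/perl -w

use strict;
package Example;
require HTML::Parser;

# Make Example a subclass of HTML::Parser
@Example::ISA = qw(HTML::Parser);

my $parser = Example->new;
$parser->parse_file('index2.html');
print $parser->{TEXT};

# Called by the parser for every chunk of text between tags;
# appends each chunk to the object's TEXT field
sub text
{
  my ($self,$text) = @_;
  $self->{TEXT} .= $text;
}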

And here's the code snippet, listed on the CPAN page for HTML::Parser, for extracting just the <title> tag data:

use HTML::Parser;

# Runs on every start tag; does nothing until <title> is seen
sub start_handler
{
  return if shift ne "title";
  my $self = shift;
  # Inside <title>: print the decoded text, then stop parsing at </title>
  $self->handler(text => sub { print shift }, "dtext");
  $self->handler(end  => sub { shift->eof if shift eq "title"; },
                         "tagname,self");
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&start_handler, "tagname,self");
$p->parse_file(shift || die) || die $!;
print "\n";
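
For what it's worth, here is a rough sketch of how the two pieces might fit together. It is an attempt, not a verified solution. It uses the version 3 handler style from the CPAN snippet, but collects the title text into a variable (as the first script does with TEXT) instead of printing it. The name extract_title and the sample HTML string are made up for illustration; they are not part of HTML::Parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# extract_title is a hypothetical helper name (not part of HTML::Parser).
# It returns the text inside the first <title> element of an HTML string,
# or an empty string if no <title> is found.
sub extract_title {
    my ($html) = @_;
    my $title    = '';
    my $in_title = 0;

    my $p = HTML::Parser->new(
        api_version => 3,
        # Note when a <title> start tag is seen
        start_h => [ sub { $in_title = 1 if $_[0] eq 'title' }, 'tagname' ],
        # Collect decoded text only while inside <title>
        text_h  => [ sub { $title .= $_[0] if $in_title }, 'dtext' ],
        # Stop parsing once </title> is reached
        end_h   => [ sub {
                         my ($tag, $self) = @_;
                         if ($tag eq 'title') {
                             $in_title = 0;
                             $self->eof;
                         }
                     }, 'tagname,self' ],
    );

    $p->parse($html);
    $p->eof;
    return $title;
}

# Sample input, invented for this example
my $sample = '<html><head><title>Hello World</title></head>'
           . '<body><p>Body text</p></body></html>';
print extract_title($sample), "\n";
```

To read from a file such as index2.html instead of a string, the parse and eof calls could presumably be swapped for $p->parse_file('index2.html'), as in the first script.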

In reply to read HTML <title> tag by AngusScrimm
