To close out this thread and give a little back, here is the script, posted so that others in my position can find it by searching (and get the answer without bothering the murmuring monks):

#!/usr/bin/perl

use strict;
use warnings;
no warnings 'uninitialized';
use HTML::TokeParser::Simple;

my $p = HTML::TokeParser::Simple->new($ARGV[0]);
my $tag = '';

while (my $tkn = $p->get_token)
{
    if ($tkn->is_start_tag('title')) { $tag = "TITLE"; }

    elsif ($tkn->is_start_tag('h1')) { $tag = "H1"; }

    elsif ($tkn->is_start_tag('h2')) { $tag = "H2"; }

    elsif ($tkn->is_start_tag('h3')) { $tag = "H3"; }

    elsif ($tkn->is_start_tag('h4')) { $tag = "H4"; }

    elsif ($tkn->is_start_tag('h5')) { $tag = "H5"; }

    elsif ($tkn->is_start_tag('h6')) { $tag = "H6"; }

    elsif ($tkn->is_start_tag('b')) { $tag = "B"; }

    elsif ($tkn->is_start_tag('i')) { $tag = "I"; }

    elsif ($tkn->is_start_tag('u')) { $tag = "U"; }

    elsif ($tkn->is_start_tag('a')) { $tag = "A"; }

    elsif ($tkn->is_start_tag('img')) { $tag = "IMG"; }

    elsif ($tkn->is_start_tag('meta')) { $tag = "META"; }

    elsif (
        $tkn->is_end_tag('title') ||
        $tkn->is_end_tag('h1') ||
        $tkn->is_end_tag('h2') ||
        $tkn->is_end_tag('h3') ||
        $tkn->is_end_tag('h4') ||
        $tkn->is_end_tag('h5') ||
        $tkn->is_end_tag('h6') ||
        $tkn->is_end_tag('b')  ||
        $tkn->is_end_tag('i')  ||
        $tkn->is_end_tag('u')  ||
        $tkn->is_end_tag('a')  ||
        $tkn->is_end_tag('img') ||
        $tkn->is_end_tag('meta')
        ) { $tag = ''; }

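    # <img> has no end tag, so $tag stays "IMG" after one is seen;
    # skip text in that state rather than mislabel it as IMG content.
    elsif ($tkn->is_text() && $tag && $tag ne 'META' && $tag ne 'IMG')
    {
        print "TAG: $tag, VALUE: " . $tkn->as_is . "\n";
    }

    # Report the alt text only on the <img> token itself.
    if ($tkn->is_start_tag('img') && $tkn->get_attr('alt'))
    {
        print "TAG: IMG, ALT VALUE: " . $tkn->get_attr('alt') . "\n";
    }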

    if ($tag eq 'META' &&
        $tkn->get_attr('name') eq 'keywords' &&
        $tkn->get_attr('content'))
    {
        print "META-TAG: $tag, KEYWORDS VALUE: " . $tkn->get_attr('content') . "\n";
    }

    if ($tag eq 'META' &&
        $tkn->get_attr('name') eq 'description' &&
        $tkn->get_attr('content'))
    {
        print "META-TAG: $tag, DESCRIPTION VALUE: " . $tkn->get_attr('content') . "\n";
    }
}

In reply to Re^2: More efficient use of HTML::TokeParser::Simple by henka
in thread More efficient use of HTML::TokeParser::Simple by henka
