sulfericacid has asked for the wisdom of the Perl Monks concerning the following question:

Using HTML::TokeParser I am trying to rip all meta tag information. Problem is, I need to parse each one into its own scalar so I can later print it.

The example they gave was:
use HTML::TokeParser;
$p = HTML::TokeParser->new(shift||"index.html");
if ($p->get_tag("title")) {
    my $title = $p->get_trimmed_text;
    print "Title: $title\n";
}
I suppose I could write if ($p->get_tag("meta")) { and, if that is the proper call for meta tags, it would find the tags in the page. Can someone show me a way to pick out each tag separately and store it in its own string for later use?

I am sure you all know what meta tags look like, but if not here are two examples:
<meta name="copyright" content="Aaron Anderson">
<meta name="keywords" content="free, cheap, fun">


Thanks so much; the docs on HTML::TokeParser don't cover this.

sulfericacid

Re: HTML::TokeParser- Meta Tags
by broquaint (Abbot) on Nov 21, 2002 at 10:44 UTC
    Have you checked out Ovid's HTML::TokeParser::Simple? It sounds like it would be well suited for this job, e.g.
    use HTML::TokeParser::Simple;
    my $p = HTML::TokeParser::Simple->new(shift || "index.html");
    my @meta_tags;
    # collect every <meta> start tag token for later use
    while (my $t = $p->get_token()) {
        push @meta_tags, $t if $t->is_start_tag('meta');
    }
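    Once the meta tokens are in hand, the name and content can be pulled out of each one. This is only a rough sketch: it assumes a reasonably recent HTML::TokeParser::Simple (where the accessor is get_attr; older releases called it return_attr), and it parses the sample tags from the question out of a string rather than a file. The %meta hash and $html variable are names made up for the illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Sample markup taken from the question, parsed from a string
# (an assumption made here so the sketch is self-contained).
my $html = <<'HTML';
<meta name="copyright" content="Aaron Anderson">
<meta name="keywords" content="free, cheap, fun">
HTML

# A single scalar-ref argument is handed to HTML::TokeParser->new.
my $p = HTML::TokeParser::Simple->new(\$html);

my %meta;    # hypothetical name: maps each meta name to its content
while (my $t = $p->get_token) {
    next unless $t->is_start_tag('meta');
    my $name = $t->get_attr('name');
    next unless defined $name;    # skip meta tags without a name attribute
    $meta{$name} = $t->get_attr('content');
}

# Each value is now available separately, e.g. $meta{copyright}.
print "$_: $meta{$_}\n" for sort keys %meta;
```

    After the loop, $meta{copyright} and $meta{keywords} each hold one tag's content, so they can be printed wherever they are needed.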

    HTH

    _________
    broquaint

Re: HTML::TokeParser- Meta Tags
by LTjake (Prior) on Nov 21, 2002 at 12:20 UTC
    Here's a little example using HTML::TokeParser. It's not generic enough to grab ANY meta tag attributes, but it'll grab all name-content pairs like you have in your examples.
    use HTML::TokeParser;

    local $/;                       # slurp all of DATA in one read
    my $content = <DATA>;
    my $p = HTML::TokeParser->new(\$content);

    my %meta;
    while (my $token = $p->get_token) {
        # only start tags ('S') named 'meta' are of interest
        next unless $token->[0] eq 'S' && $token->[1] eq 'meta';
        $meta{$token->[2]{name}} = $token->[2]{content};
    }
    print "$_: $meta{$_}\n" foreach (keys %meta);

    __DATA__
    <meta name="copyright" content="Aaron Anderson">
    <meta name="keywords" content="free, cheap, fun">
    gives me:
    keywords: free, cheap, fun
    copyright: Aaron Anderson
    Update: Took out lc() from around $token->[1], since all tags are ensured to be lowercase -- so says PodMaster! =)
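    For what it's worth, the get_tag("meta") idea from the original question can be made to work too, by calling it in a loop instead of a single if. The following is a minimal sketch, not a definitive version: the http-equiv line in the sample data is invented for illustration, and falling back to http-equiv as the hash key is just one possible choice.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# The first line is a made-up example; the other two are from the question.
my $html = <<'HTML';
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="copyright" content="Aaron Anderson">
<meta name="keywords" content="free, cheap, fun">
HTML

my $p = HTML::TokeParser->new(\$html);

my %meta;
# get_tag("meta") skips ahead to the next <meta> start tag and returns
# [$tag, \%attr, \@attrseq, $text], or undef once the document is exhausted.
while (my $tag = $p->get_tag('meta')) {
    my $attr = $tag->[1];
    # Key on name if present, otherwise on http-equiv (an assumption
    # made for this sketch).
    my $key = defined $attr->{name} ? $attr->{name} : $attr->{'http-equiv'};
    next unless defined $key;
    $meta{lc $key} = $attr->{content};
}

print "$_: $meta{$_}\n" for sort keys %meta;
```

    Because get_tag does the skipping, there is no need to inspect token types by hand, at the cost of not seeing any of the text in between.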

    --
    Rock is dead. Long live paper and scissors!