Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Ideally, I want to collect all meta tags and store each of them in a hash with the meta name as the key. I'm already using TokeParser for scraping, so please don't suggest I also use TokeParser::Simple. I read through the docs and can't find any information on what I am looking for.

    my %meta;
    my $htm2 = HTML::TokeParser->new( \$src );
    while (my $token = $htm2->get_token) {
        next if $token->[1] ne 'meta' && $token->[0] ne 'S';
        $meta{$token->[2]{name}} = $token->[2]{content};
    }

Also, if a modified version of the code above works, can you explain line for line what it's doing? I'm having trouble piecing things together.
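For what it's worth, the loop as posted only skips a token when it is both not named 'meta' and not a start tag, so every other start tag (html, head, title, ...) falls through and gets stored under an undefined name. Below is a minimal sketch of a modified version, commented line by line. The sample document in `$src` is made up for illustration, and lower-casing the key and skipping nameless meta tags (such as http-equiv ones) are my assumptions about what is wanted, not something the question states.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document standing in for the real $src.
my $src = <<'HTML';
<html><head>
<title>Example page</title>
<meta name="description" content="A sample page">
<meta name="Keywords" content="perl, html">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body><p>Hello</p></body></html>
HTML

my %meta;

# Passing a reference to a scalar makes the parser read from that string.
my $parser = HTML::TokeParser->new( \$src );

# get_token returns one array ref per token and undef at end of input.
while ( my $token = $parser->get_token ) {

    # $token->[0] is the token type ('S' = start tag), and for start
    # tags $token->[1] is the lower-cased tag name. Skip anything that
    # is not a <meta> start tag.
    next unless $token->[0] eq 'S' && $token->[1] eq 'meta';

    # For start tags, $token->[2] is a hash ref of the tag's attributes.
    my $attr = $token->[2];

    # Assumption: meta tags without a name attribute are not wanted.
    next unless defined $attr->{name};

    # Store the content under the (lower-cased) meta name.
    $meta{ lc $attr->{name} } = $attr->{content};
}

print "$_ => $meta{$_}\n" for sort keys %meta;
```

The only logical change from the posted loop is the skip condition: a token is kept only when it is a start tag and its name is 'meta'.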
My last question is this. Can I extract different parts of an HTML document with TokeParser in one run? Or must I run them all separately?
I can extract the title tag just fine, but only when I make a new reference to TokeParser. It seems like a waste of resources to call the module AGAIN when the html dump is still in memory, right? Or does the data change after each time you loop over tokens?
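On the one-run question: the HTML in `$src` does not change, but a parser object hands out each token only once, so a loop that has run to the end of the document leaves nothing for a second loop on the same object. One way around creating a second parser is to look for everything of interest in a single loop. This is only a sketch under that assumption; the sample `$src` and the variable names are hypothetical.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document standing in for the real $src.
my $src = <<'HTML';
<html><head>
<title>  Example page </title>
<meta name="description" content="A sample page">
<meta name="keywords" content="perl, html">
</head><body><p>Hello</p></body></html>
HTML

my $parser = HTML::TokeParser->new( \$src );
my ( %meta, $title );

while ( my $token = $parser->get_token ) {

    # Only start tags are of interest here.
    next unless $token->[0] eq 'S';

    my ( $tag, $attr ) = @{$token}[ 1, 2 ];

    if ( $tag eq 'title' ) {
        # Read the text up to the closing </title>, with leading and
        # trailing whitespace removed.
        $title = $parser->get_trimmed_text('/title');
    }
    elsif ( $tag eq 'meta' && defined $attr->{name} ) {
        $meta{ lc $attr->{name} } = $attr->{content};
    }
}

print "title: $title\n" if defined $title;
print "$_ => $meta{$_}\n" for sort keys %meta;
```

Further tags can be handled by adding more branches to the same `if` chain, so the document is still tokenized only once.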

Replies are listed 'Best First'.

What's wrong with HTML::TokeParser::Simple?
by Ovid (Cardinal) on Mar 20, 2006 at 23:34 UTC

Re: meta tag extraction with TokeParser
by Thelonius (Priest) on Mar 21, 2006 at 00:18 UTC

Re: meta tag extraction with TokeParser
by saintmike (Vicar) on Mar 20, 2006 at 20:59 UTC
by Anonymous Monk on Mar 20, 2006 at 21:44 UTC
by saintmike (Vicar) on Mar 20, 2006 at 21:47 UTC
by Anonymous Monk on Mar 20, 2006 at 21:51 UTC
by Anonymous Monk on Mar 20, 2006 at 21:54 UTC