$p->get_token

This method will return the next token found in the HTML document, or "undef" at the end of the document. The token is returned as an array reference. The first element of the array will be a string denoting the type of this token: "S" for start tag, "E" for end tag, "T" for text, "C" for comment, "D" for declaration, and "PI" for process instructions. The rest of the token array depend on the type like this:

    ["S", $tag, $attr, $attrseq, $text]
    ["E", $tag, $text]
    ["T", $text, $is_data]
    ["C", $text]
    ["D", $text]
    ["PI", $token0, $text]
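To make the layouts above concrete, here is a minimal sketch that only looks for attributes on start tags. The helpers token_attrs and is_pdf_link are hypothetical names, and the tokens are hand-built to mimic the documented shapes rather than produced by a real parser. Anchoring the pattern at the end of the href and ignoring case is my own choice, not something from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the attribute hash for start tags and
# undef for every other token type, since those carry no attributes.
sub token_attrs {
    my ($token) = @_;
    return $token->[0] eq 'S' ? $token->[2] : undef;
}

# Hypothetical helper: true only for a start tag whose href ends in .pdf.
sub is_pdf_link {
    my ($token) = @_;
    my $attr = token_attrs($token) or return 0;
    return 0 unless defined $attr->{href};
    return $attr->{href} =~ m/\.pdf$/i ? 1 : 0;
}

# Hand-built tokens that mimic the documented layouts (not parser output).
my @tokens = (
    ['S', 'a', { href => 'manual.pdf' }, ['href'], '<a href="manual.pdf">'],
    ['T', 'Manual', ''],
    ['E', 'a', '</a>'],
);

for my $token (@tokens) {
    printf "%-2s %s\n", $token->[0],
        is_pdf_link($token) ? 'PDF link' : 'not a link to a PDF';
}
```

Checking the token type first means the code never dereferences the $text string that sits at index 2 of an end tag.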
It appears you're trying to read the href attribute from a token that has no attributes (such as an end tag or a text token). Also, not every HTML tag has an href attribute, so matching against a missing one should give warnings if you've enabled them.
How about (untested):
    while (my $token = $stream->get_token()) {
        if ($token->[0] eq 'S') {                  # start tag
            if (exists $token->[2]->{href}) {      # tag has href attribute
                print "PDF link!\n" if $token->[2]->{href} =~ m/\.pdf/;
            }
        }
    }
updated:
Your code "works" without strict because using $something->{href}, where $something is a string, will reference a global hash named by the contents of $something, creating it if it doesn't exist yet (i.e. if $something eq 'blah', a global hash %blah will be created if it doesn't already exist). This can cause all kinds of mayhem and is a good reason to always use strict (see strict 'refs' in the strict documentation and symbolic references in perlref).
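A small sketch of that behaviour, using only core Perl. The sub names poke_without_strict and poke_with_strict are made up for illustration, and the string 'blah' stands in for whatever text happens to sit where the attribute hash was expected.

```perl
use strict;
use warnings;

# Hypothetical helper: treat a plain string as a hash reference with
# strict refs switched off. This silently writes to the global hash
# named by the string (here %main::blah).
sub poke_without_strict {
    my ($name) = @_;
    no strict 'refs';
    $name->{href} = 'http://example.com/a.pdf';
    return $name->{href};
}

# Hypothetical helper: the same dereference with strict refs on.
# Returns the error message Perl raises, or 'no error'.
sub poke_with_strict {
    my ($name) = @_;
    my $ok = eval { my $href = $name->{href}; 1 };
    return $ok ? 'no error' : $@;
}

print poke_without_strict('blah'), "\n";

{
    no strict 'refs';
    # The global hash now exists and holds the key we wrote.
    print "keys of %blah: ", join(', ', keys %{'main::blah'}), "\n";
}

print poke_with_strict('blah');
```

With strict in effect the mistake surfaces as a fatal "strict refs" error at the point of the bad dereference, instead of a stray global hash.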
updated: moved doc links to cpan.org, perldoc.com is messing up again.
In reply to Re: use strict and TokeParser
by Joost
in thread use strict and TokeParser
by young_stu