young_stu has asked for the wisdom of the Perl Monks concerning the following question:

I'm having problems with HTML::TokeParser when I use strict. When I try to access the value of an attribute of a start tag, as in:
    use strict;
    use HTML::TokeParser
    my $filepath = "c:/folder/file.html";
    my $stream = HTML::TokeParser -> new($filepath);
    while (my $token = $stream -> get_token()){
        print "PDF link!\n" if $token -> [2] -> {'href'} =~ m/\.pdf/;
    }
I get the following error message: "Can't use string ("") as a HASH ref while "strict refs" in use at line 6" The problem goes away when I don't use strict. But I want to use strict.
Thanks!

Replies are listed 'Best First'.
Re: use strict and TokeParser
by Joost (Canon) on Nov 23, 2004 at 23:52 UTC
    From the docs:
    $p->get_token
      This method will return the next token found in the HTML document, or "undef" at the end of the document. The token is returned as an array reference. The first element of the array will be a string denoting the type of this token: "S" for start tag, "E" for end tag, "T" for text, "C" for comment, "D" for declaration, and "PI" for process instructions. The rest of the token array depend on the type like this:

      ["S",  $tag, $attr, $attrseq, $text]
      ["E",  $tag, $text]
      ["T",  $text, $is_data]
      ["C",  $text]
      ["D",  $text]
      ["PI", $token0, $text]

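    It appears you're trying to read the href attribute from a token that has no attributes, such as an end tag or a text token. For those token types, element 2 is not an attribute hash but a plain scalar ($text or $is_data in the list above), so dereferencing it as a hash fails under strict. Also, not all HTML tags actually have an href attribute, which should give warnings if you've enabled them.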
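To see which token types actually carry an attribute hash, a small sketch like the following may help. It assumes HTML::TokeParser is installed; the sample HTML string and the @seen array are made up for illustration, and the document is passed as a scalar reference so no file is needed.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document, inlined so the sketch needs no file on disk.
my $html = '<p>See <a href="report.pdf">the report</a>.</p>';

my $stream = HTML::TokeParser->new(\$html)
    or die "Can't create parser: $!";

my @seen;
while (my $token = $stream->get_token) {
    my $type = $token->[0];
    # ref() is 'HASH' only when element 2 is the attribute hash ("S" tokens).
    my $kind = ref($token->[2]) || 'plain scalar';
    push @seen, "$type:$kind";
    print "token type $type: element 2 is a $kind\n";
}
```

Only the "S" tokens should report a HASH here, which is why the type check has to come before any attribute lookup.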

    How about (untested):

    while (my $token = $stream -> get_token()){
        if ($token->[0] eq 'S') {                 # start tag
            if (exists $token->[2]->{href}) {     # tag has href attribute
                print "PDF link!\n" if $token -> [2] -> {'href'} =~ m/\.pdf/;
            }
        }
    }

    updated:

    Your code "works" without strict because $something->{href}, where $something holds a string, is treated as a symbolic reference: Perl uses the global hash whose name is that string, creating it if it doesn't exist yet (i.e. if $something eq 'blah', a global hash %blah is created if it doesn't already exist). This can cause all kinds of mayhem and is a good reason to always use strict (see strict 'refs' in the strict documentation and symbolic references in perlref).
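The symbolic-reference behaviour can be reproduced with core Perl alone. The sketch below is only an illustration: the variable names and the 'file.pdf' value are invented, and the `no strict 'refs'` block stands in for a script that never enabled strict.

```perl
use strict;
use warnings;

our %blah;              # the global hash that the symbolic reference will hit
my $name = 'blah';      # a plain string, not a hash reference

{
    no strict 'refs';               # mimic a script without "use strict"
    $name->{href} = 'file.pdf';     # silently writes to %main::blah
}
print "\$blah{href} is now '$blah{href}'\n";

# Back under strict refs, the same dereference is a fatal run-time error.
my $value = eval { $name->{href} };
print "strict says: $@" if $@;
```

The second half traps the error with eval so the script can print it instead of dying; in the original question the same kind of error surfaced because a non-start token supplied a string where a hash reference was expected.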

    updated: moved doc links to cpan.org, perldoc.com is messing up again.

Re: use strict and TokeParser
by dave_the_m (Monsignor) on Nov 23, 2004 at 23:50 UTC
    use HTML::TokeParser
    There isn't a semicolon at the end of this line. I don't know whether that's causing your problem or not.

    Dave.

Re: use strict and TokeParser
by Ovid (Cardinal) on Nov 24, 2004 at 03:02 UTC

    You can make this easier and avoid those types of errors by switching to HTML::TokeParser::Simple and some defensive programming (it's easier to read, too.)

    use strict;
    use HTML::TokeParser::Simple;
    my $filepath = "c:/folder/file.html";
    my $stream = HTML::TokeParser::Simple->new($filepath) or die $!;
    while (my $token = $stream->get_token){
        next unless $token->is_start_tag('a');
        print "PDF link!\n" if $token->get_attr('href') =~ m/\.pdf/;
    }
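One further defensive step, sketched here as an assumption rather than as part of the code above: anchors such as <a name="top"> have no href, so get_attr('href') returns undef and the pattern match would warn under warnings. The sample HTML, the @pdf_links array, and passing the document as a scalar reference (as HTML::TokeParser allows) are illustrative choices.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample document with one anchor that has no href at all.
my $html = '<a name="top"></a>'
         . '<a href="report.pdf">report</a>'
         . '<a href="index.html">home</a>';

my $stream = HTML::TokeParser::Simple->new(\$html)
    or die "Can't create parser: $!";

my @pdf_links;
while (my $token = $stream->get_token) {
    next unless $token->is_start_tag('a');
    my $href = $token->get_attr('href');
    next unless defined $href;                  # skip anchors without href
    push @pdf_links, $href if $href =~ m/\.pdf/;
}
print "PDF link: $_\n" for @pdf_links;
```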

    Cheers,
    Ovid

    New address of my CGI Course.

      I would strongly recommend against HTML::TokeParser::Simple. While it initially looks very useful, it is very unstable and it breaks too frequently for my taste, for example the straw that broke the camel's back. Instead of trying to figure out what was broken in HTML::TokeParser::Simple, I simply went back to using HTML::TokeParser. Nothing to debug, nothing to worry about anymore.

        So you send me a private message telling me there are, in fact, problems with my module but you refuse to tell me what the problems are. If you can give me some clear, specific issues with HTML::TokeParser::Simple that need to be resolved, I'd be happy to deal with those issues. The truth is, the only bug report I can ever remember getting about this module was from you and that was a couple of years ago. Even then it was not a bug but a bad design decision on my part and I fixed it rather promptly.

        So please tell me what the problems are with this module. If you can't think of any, don't go spreading FUD. If you can think of some, why didn't you tell me what they are before you started trashing my module? I didn't write the thing just to say I have something on the CPAN. I want it to actually be useful.

        Update: After digging around, I find a cryptic entry in your changes file for HTML::LinkExtractor. All it says is that you stopped using HTML::TokeParser::Simple back in September of this year. There's no mention in the associated bug report that my module was at fault and had you even done something as simple as forward me the link, I would have happily fixed it. Now I'm going on vacation and (if the code is in fact buggy) I have to leave buggy code on the CPAN until I get back.

        Cheers,
        Ovid
