Amoe has asked for the wisdom of the Perl Monks concerning the following question:

Someone here must be familiar with the Diaryland system. It's one of those weblog things. Well, I was writing a script to show my diary to me when I start up, and thought that, despite my being a novice, it should be fairly easy. I've run into a hitch, though.
use strict;
use warnings;
use diagnostics;
use HTML::TokeParser;
use LWP::Simple;

print &parse_diaryland('username', 'password');

sub parse_diaryland
{
    my($username, $password);
    my($diary_url);
    my($parsee_man);
    my($html, $tag);
    my($body);

    $username = shift();
    $password = shift();

    $diary_url = "http://$username:$password\@$username.diaryland.com";
    # urls are in the form http://username:password@username.diaryland.com
    $html = get($diary_url);
    print $html;
    $parsee_man = HTML::TokeParser->new(\$html);

    foreach $tag ($parsee_man->get_tag('TD'))
    {
        if ($tag->[1]{'align'} eq 'left' && $tag->[1]{'vAlign'} eq 'top') # fails here
        {
            my($secondtag);
            $secondtag = $parsee_man->get_tag();
            if ($secondtag->[0] eq 'FONT')
            {
                $body = get_text('/FONT');
                last(); # got the text body so quit loop
            }
        }
    }
    return($body);
}
and we're matching this:
<TD align=left vAlign=top><FONT face="Verdana, Arial, Helvetica, sans-serif" size=2>I This is the text body we wanna grab. foo, bar and angst. </FONT>
That fails with "Use of uninitialized value in string eq" at line 30. I can vaguely make sense of this: the page we're getting has some TD tags earlier, and they lack the align and valign attributes, so those values would be undefined. How can I fix this, though?

Replies are listed 'Best First'.
Re: Diaryland parsing
by japhy (Canon) on Jul 07, 2001 at 15:37 UTC
    Honestly? Just check for a non-empty value as well:
    if ($tag->[1]{align} and $tag->[1]{align} eq 'left' and ...) { # do stuff }
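
A minimal sketch of that guard, run against stand-in data so it needs no network access or CPAN modules. The @tags list and the attr_is helper are hypothetical names made up for illustration; the array refs only mimic the [tag name, attribute hash] shape that get_tag returns. Note that it tests with defined rather than plain truth, so an attribute whose value is '0' would still be compared. The key is written as valign because, as far as I know, HTML::TokeParser hands back tag and attribute names in lowercase.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for what get_tag() hands back: [ $tagname, \%attributes ].
my @tags = (
    [ 'td', {} ],                                      # no attributes at all
    [ 'td', { align => 'left' } ],                     # only one of the two
    [ 'td', { align => 'left', valign => 'top' } ],    # the cell we want
);

# True only when the attribute is defined and equals $want.
# Checking defined first means eq never sees undef, so no warning is raised.
sub attr_is {
    my ($tag, $name, $want) = @_;
    my $value = $tag->[1]{$name};
    return defined $value && $value eq $want;
}

my @matches = grep { attr_is($_, 'align', 'left') && attr_is($_, 'valign', 'top') } @tags;
print scalar(@matches), " matching cell(s)\n";
```

With the three stand-in tags above this prints "1 matching cell(s)", and the two attribute-less cells are skipped without any warning.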


    japhy -- Perl and Regex Hacker
      That's what I thought. Oh, and does it matter if you put align in quotes or not?
        I compiled a brief summary of what's allowed under even 'strict' at Re: using quotes in hash keys, if you're curious. It gives you an idea of how the rules work with respect to hash keys, and hash list definitions using the arrow ('=>') operator.
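
For a quick feel of the quoting question, here is a small sketch of the cases I'm sure of; it is not a reproduction of the linked node. The %attr hash and its keys are made up for illustration.

```perl
use strict;
use warnings;

# A simple identifier needs no quotes to the left of =>, quoted or not is the same key.
my %attr = (align => 'left', 'valign' => 'top');

my $bare   = $attr{align};      # bareword key inside the braces
my $quoted = $attr{'align'};    # same key, quoted

# Keys that are not plain identifiers (a hyphen, for example) do need quotes.
$attr{'http-equiv'} = 'refresh';

print "same value\n" if $bare eq $quoted;
print join(', ', sort keys %attr), "\n";
```

Both forms fetch the same element, so the script prints "same value" followed by "align, http-equiv, valign".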
      Thanks ;-) Notice any other flaws in my logic, btw?