Popcorn Dave has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

I'm working on an application that parses information from a web page, specifically the tables on that page. All is working just fine until I push the data on to my HoA. The data appears correct in the data structure; I've watched it running under the Tk Debugger. As far as I can tell from what I've read, Perl thinks that some of the data in my anon hash isn't defined. I realize I can "ignore" this by turning off warnings, but I'd rather get to the bottom of the problem before I do that.

As my code shows, I'm setting values to a physical space if there's an HTML space in the table cell.

My code is as follows:
sub process_info{
  my ($temp, $year, $info);
  do {($token = $stream->get_token)} until $token->[0] eq "C" and $token->[1] =~ /1958/;
  while($token = $stream->get_token) {
    if ($token->[0] eq "S" and ${$token->[2]}{year}){
      $year = ${$token->[2]}{year}
    }
      if ($token->[0] eq "S"){
        if (${$token->[2]}{id}){
          $temp = &get_token("td");
          my @names = split(" ",$temp);
          $info->{last} = pop(@names);
          $info->{first} = join( " ", @names );
          $temp = &get_token("td");
          if ($temp eq "&nbsp;"){
            $info->{addr} = '';
          }
          else{
            $info->{addr} = $temp;
          }
          $temp = &get_token("br");
          if ($temp eq "&nbsp;"){
            ($info->{city}, $info->{state}, $info->{zip}) = ('', '', '');
          }
          else{
            my $t2;
            ($info->{city}, $t2) = split(', ',$temp);
            ($info->{state},$t2) = split(" ",$t2);
            $t2 =~ s/\s+//;
            $info->{zip} = $t2;
          }
          $temp = &get_token("td");
          if ($temp eq "&nbsp;"){
            $info->{phone} = '';
          }
          else{
            $info->{phone} = $temp;
          }
          $temp = &get_token("a");
          if ($temp eq "&nbsp;"){
            $info->{email} = '';
          }
          else{
            $info->{email} = $temp;
          }
          push( @{$bros->{$year}}, $info);
          $info = {}; # reset $info hash
        } # end ${$token->[2]}{id}
      } # end $token->[0] eq "S"
  } # end while
} # end sub

And a sample of the HTML I'm parsing:

<!-- Start 1958 -->
<TR id="start">
  <TD>Rusty Bartel</TD>
  <TD>&nbsp;<BR>&nbsp;</TD>
  <TD>&nbsp;</TD>
  <TD><A HREF="">&nbsp;</A></TD>
</TR>
<TR id="start">
  <TD>Charles Brown</TD>
  <TD>123 Main Ave<BR>Sebastopol, CA 95472</TD>
  <TD>&nbsp;</TD>
  <TD><A HREF="">&nbsp;</A></TD>
</TR>
<TR id="start">
  <TD>Ardon Milkes</TD>
  <TD>345 George Dr<BR>Springfield, VA 22152</TD>
  <TD>(503) 555-1212</TD>
  <TD><A HREF="mailto:me@nobody.com">me@nobody.com</A></TD>
</TR>

I've even tried putting in a '?' so I wouldn't have a physical space if that was what was throwing the warning, but that didn't work either. As far as I can tell, all the data in the anon hash is filled before I push it.

Can anybody shed any light on this?

TIA!

There is no emoticon for what I'm feeling now.

Replies are listed 'Best First'.
Re: Pushing anon hash on to HoA structure giving Use of uninitialized value in hash element error using warnings (style issues)
by grinder (Bishop) on Jul 05, 2004 at 06:48 UTC
    Can anybody shed any light on this?

    Not directly, but there are a couple of issues that leap out at me.

    if ($token->[0] eq "S" and ${$token->[2]}{year}){
      $year = ${$token->[2]}{year}
    }
      if ($token->[0] eq "S"){

    Those two statements are actually at the same level. Despite what the indentation says, the latter is not subordinate to the former. Bring the second back to the same indentation level as the first, or, better yet, rearrange the code to say $token->[0] eq "S" only once.

    The variable $temp raises an immediate Red Flag. Name it $token instead, because that is what it appears to be.

    Declare your variables as late as possible. If you pull $temp/$token, and above all $info, into the inner loop, you no longer need the make-work $info = {}; the variable is reset automatically each time through the loop.
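A minimal sketch of the late-declaration point, using made-up rows rather than the real token stream. Because `my $info` sits inside the loop body, each iteration gets a fresh hash reference and no manual reset is needed. The names `@rows` and `%bros` are illustrative assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the parsed table rows.
my @rows = ( [ 1958, 'Rusty Bartel' ], [ 1958, 'Charles Brown' ] );

my %bros;
for my $row (@rows) {
    my ( $year, $name ) = @$row;

    # Declared inside the loop: a new anonymous hash on every iteration,
    # so no explicit "$info = {}" reset is required.
    my $info  = {};
    my @names = split ' ', $name;
    $info->{last}  = pop @names;
    $info->{first} = join ' ', @names;

    push @{ $bros{$year} }, $info;
}

# Two distinct records, not two references to the same hash.
print scalar( @{ $bros{1958} } ), " records\n";
print "$_->{first} $_->{last}\n" for @{ $bros{1958} };
```

Had `$info` been declared once outside the loop and never reset, every element pushed would point at the same hash, and the last row's values would overwrite the earlier ones.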

    Drop the & on the function calls.

    The following code:

    $temp = &get_token("a");
    if ($temp eq "&nbsp;"){
      $info->{email} = '';
    }
    else{
      $info->{email} = $temp;
    }

    ... could greatly benefit from the ternary ? : operator:

    $token = get_token("a");
    $info->{email} = $token eq "&nbsp;" ? '' : $token;
    # update: had a mish-mash of temp/token in this snippet

    ... not because using the ternary operator is cool, or for some other bogus style reason, but because it means that you only say $info->{email} once, which means one less chance of typing or editing a key incorrectly and thereby creating two keys in the hash.

    The following code, um, needs attention:

    my $t2;
    ($info->{city}, $t2) = split(', ',$temp);
    ($info->{state},$t2) = split(" ",$t2);
    $t2 =~ s/\s+//;
    $info->{zip} = $t2;

    It looks like an abuse of split (if all you have is a hammer...). A regex can do the same in a single statement, with no need for the intermediate variable (which has a horribly unhelpful name anyway). I haven't tested the following, but something like this should work:

    @{$info}{qw{ city state zip }} =
        ($token =~ /^([^,]+),\s+(\S+)\s+(\d+)$/)
            ? ( $1, $2, $3 )
            : ( '', '', '' );

    Now that's definitely fancy stuff, but it's just taking the previous points into consideration and throwing in a hash-slice assignment. And with a bit of creative whitespace we can make things line up and the code documents itself quite nicely.
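As a hedged illustration of the hash-slice idea, here is a small self-contained sketch run against two cells from the sample HTML. The helper name `parse_city_line` is hypothetical, and the `+` quantifier is placed inside the first capture group so that the whole city name is captured rather than only its last character.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original code): split "City, ST 12345"
# into its three parts, or return empty strings when the cell is blank.
sub parse_city_line {
    my ($text) = @_;
    my %info;

    # Hash-slice assignment: three keys filled in one statement.
    @info{qw( city state zip )} =
        ( $text =~ /^([^,]+),\s+(\S+)\s+(\d+)$/ )
        ? ( $1, $2, $3 )
        : ( '', '', '' );

    return \%info;
}

for my $cell ( 'Sebastopol, CA 95472', '&nbsp;' ) {
    my $info = parse_city_line($cell);
    print "city=[$info->{city}] state=[$info->{state}] zip=[$info->{zip}]\n";
}
```

The fallback branch means all three keys always exist, so nothing downstream sees an undefined city, state, or zip.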

    - another intruder with the mooring of the heat of the Perl

      Thanks for the tips on the ternary. I had tried that but obviously I had done something wrong and went with the if-else instead. I prefer the ternary as well when I can as I think it's easier to read.

      I see your point as far as $token->[0] eq "S". I'm trying to see if the S token has an id="year" in it. If it didn't have the id="year" in it, I wanted it to drop through, but I see now how I can rewrite that.

      As far as the abuse of split, yes that was going to get reworked. :) This is really the first time I've dealt with a complex data structure so that was my primary objective.

Re: Pushing anon hash on to HoA structure giving Use of uninitialized value in hash element error using warnings
by mrpeabody (Friar) on Jul 05, 2004 at 05:25 UTC
    I'm assuming this statement is where the error is occurring, since it's the only push() in your code.

    push( @{$bros->{$year}}, $info);

    There are three variables there, and Perl is telling you that one of them is undefined. You should be able to tell which one by looking at the code, but if not some judiciously placed print() statements will help. (Hint: it's the one that only gets initialized in one branch of a conditional.)
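To make the hint concrete, here is a minimal sketch that reproduces the situation with stand-in data: `$year` is declared but never assigned, so using it as a hash key triggers the warning. The `$SIG{__WARN__}` handler and the `defined` guard are illustrative choices, not taken from the original code.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them scroll past, so they can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $bros = {};
my $year;    # never assigned, as when no year token has been seen yet
my $info = { first => 'Rusty', last => 'Bartel' };

# undef used as a hash key: Perl warns and files the record under ''.
push @{ $bros->{$year} }, $info;

print "warning: $_" for @warnings;

# One possible guard: only store a record once a year is known.
my $known_year = 1958;
push @{ $bros->{$known_year} }, $info if defined $known_year;
```

The push itself still succeeds, which is why the data looks correct in a debugger; the record simply ends up under the empty-string key.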

Re: Pushing anon hash on to HoA structure giving Use of uninitialized value in hash element error using warnings
by ysth (Canon) on Jul 05, 2004 at 04:54 UTC
    Does:
    use Data::Dumper;
    print Dumper $bros;
    look right?
      Thanks! That did the trick. I simply was not setting the $year key in my original hash, because I was starting to read too far into the file. I had set my entry point lower than where the year gets set because I got sick of having to step through a bunch of HTML until I reached the information I wanted.
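A short sketch of how a dump can expose this kind of problem, assuming a hand-built `$bros` in which one record was pushed while the year was still undefined. The data and the four-digit-year check are illustrative assumptions.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order for readable output

# Hypothetical data: one record filed correctly, one filed under an
# empty-string key because the year was undefined at push time.
my $bros = {
    1958 => [ { first => 'Charles', last => 'Brown' } ],
    ''   => [ { first => 'Rusty',   last => 'Bartel' } ],
};

print Dumper($bros);

# Any key that does not look like a four-digit year is suspect.
my @bad_keys = grep { !/^\d{4}$/ } keys %$bros;
printf "%d suspicious key(s)\n", scalar @bad_keys;
```

In the dump, the stray records show up under a `''` key alongside the real years, which points straight at the variable used as the hash key.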

Re: Pushing anon hash on to HoA structure giving Use of uninitialized value in hash element error using warnings
by neeraj (Scribe) on Jul 05, 2004 at 05:34 UTC
    I don't see any problem in this subroutine. Why don't you check sub get_token? Maybe the problem lies there. Also, use Data::Dumper to check whether the hash of arrays is being created properly.
Re: Pushing anon hash on to HoA structure giving Use of uninitialized value in hash element error using warnings
by water (Deacon) on Jul 05, 2004 at 20:06 UTC
    Also, you might look at HTML::TableExtract to pull the tables from the HTML before transforming the tables into your data structure.