Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks! How's it hanging? I'm trying to write a spider that scans user profiles and yields specific data based on their numeric user IDs. However, when I create an array with HTML::TokeParser and the corresponding tokens, I only receive the data from the first one, even though the arrays are incrementing. Any ideas?
    while ($count<=21151){
        #create output file
        if ($append<1) {
            open(MYOUTFILE, ">newtest2.out"); #open for write, overwrite
        } else {
            open(MYOUTFILE, ">>newtest2.out"); #open for write, append
        }
        print MYOUTFILE "\n"; #write newline
        #url is my page
        $url2[$c2] = "http://www.mysite.com/index.cfm?fuseaction.showme&user=$count";
        #get the data
        $data2[$c2] = get($url2[$c2]) or die $!;
        #parse var with data
        $p2[$c2] = HTML::TokeParser->new(#$data2[$c2]);
        #get title information from main table
        while ($token2[$c2] = $p2[$c2]->get_tag("table")) {
            next unless defined($token2[$c2]->[1]{width});
            next unless $token2[$c2]->[1]{width} == "435";
            $p2[$c2]->get_tag("td");
            $p2[$c2]->get_tag("span"); $p2[$c2]->get_tag("span"); $p2[$c2]->get_tag("\span");
            $title2[$c2] = $p2[$c2]->get_trimmed_text;
            $title[$c1]=$title2[$c2];
            $c1++;
            ...
        $c2++
        $count++;
etc.

2006-03-09 Retitled by planetscape, as per Monastery guidelines
Original title: 'HTML:Parser'

Re: Accessing elements in array returned by HTML::TokeParser
by davidrw (Prior) on Mar 08, 2006 at 20:38 UTC
    a few notes/questions/comments...
    • why are you opening the out file inside the while loop? why not open once at beginning?
    • Something like open MYOUTFILE, ($append?'>>':'>'), "newtest2.out"; prevents duplicate code and folds those 8 lines into 1.
    • what is $c2? why isn't it the same thing as $count?
    • This line shouldn't even compile: $p2[$c2] = HTML::TokeParser->new(#$data2[$c2]); (the '#' in there is a comment)
    • are you using use strict; and use warnings; ?
    • where are the relevant variables declared/how are they initialized?
    • what's $c1 used for (versus say $c2)?
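Combined, the first few suggestions look roughly like the sketch below. It is not the original program: it runs under strict and warnings, opens the output file once before the loop with a lexical filehandle and three-argument open, picks the mode with the ternary from the second bullet, and uses a single loop variable for the user id instead of keeping $count and $c2 in step. The helper name write_profile_urls is hypothetical, the fetch/parse step is left out, and the URL is the placeholder from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: writes one URL line per user id to $out_file.
# $append chooses '>>' (append) or '>' (overwrite), as in the bullet above.
sub write_profile_urls {
    my ($out_file, $append, $first_id, $last_id) = @_;

    # Open once, before the loop, with a lexical filehandle and 3-arg open.
    open my $out, ($append ? '>>' : '>'), $out_file
        or die "Cannot open $out_file: $!";

    # One loop variable serves as both the user id and the iteration,
    # so there is no separate $c2 to keep in step with $count.
    for my $id ($first_id .. $last_id) {
        my $url = "http://www.mysite.com/index.cfm?fuseaction.showme&user=$id";
        # The fetch/parse step from the question would go here.
        print {$out} "$url\n";
    }

    close $out or die "Cannot close $out_file: $!";
    return;
}
```

Something like write_profile_urls('newtest2.out', $append, 21101, 21151) would cover the same id range as the question's loop.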
      -I attempted the while loop inside with the same results
      -thanks for the shorthand
      -I used $count to set a range, e.g. 2100 to 2110
      -the # was a typo, not in the program
      -yes
      -the variables are declared above the while loop and are declared as follows:
          my @title2 = "";
          my @price;
          my @title;
          my @url2;
          my @data2;
          my @token2;
          my @p2;
          my @release;
          my $c1=0;
          my $c2=0;
          my $c3=0;
          my $count=21101;
          my $count2=0;
          my $append = 0;
      -$c1 is used for multiple results from the HTML search for text data. For example, when parsing with the tags set above, $c1 may yield several (in my case 4) different text outputs; I want the 2nd of the 4, and that part is working.
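Keeping the 2nd of several matches per profile can be sketched without a counter shared across profiles: collect each page's matches into its own list first, then take index 1. This is a minimal sketch under assumptions: second_of_each and the sample data are hypothetical, and each inner list stands in for the get_trimmed_text results from one profile page.

```perl
use strict;
use warnings;

# Hypothetical helper: given one array ref of matched texts per profile,
# return the 2nd match from each (undef when a page had fewer than two).
sub second_of_each {
    my @pages = @_;
    my @second;
    for my $texts (@pages) {
        # Index 1 is the 2nd element of this page's own list, regardless
        # of how many matches earlier pages produced.
        push @second, (@$texts >= 2 ? $texts->[1] : undef);
    }
    return @second;
}

# Hypothetical sample: two pages with 4 matches, one page with only 1.
my @pages  = ( [qw(a1 a2 a3 a4)], [qw(b1 b2 b3 b4)], [qw(c1)] );
my @picked = second_of_each(@pages);   # ('a2', 'b2', undef)
```

Because the index restarts for every page, a profile that yields 3 or 5 matches instead of 4 does not shift where the next profile's 2nd match is found.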
        Thanks for the help. I figured it out. Instead of creating a new instance of itself each time, the title array just piled the data on top of one another. When I changed the index to $title[$c2*2], everything came out all right. Thanks again.

        Edited by planetscape - added code tags so square brackets would not linkify
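On the fix itself: computing the slot from a counter, as $title[$c2*2] does, relies on each profile occupying a predictable number of slots in one flat @title array. The sketch below is a hypothetical illustration, not the OP's data or code: it fills a flat array with a single running counter (like $title[$c1]), shows a fixed-stride lookup landing on the wrong profile once one page yields an extra match, and contrasts that with an array of arrays (one inner array per $c2) that needs no index arithmetic.

```perl
use strict;
use warnings;

# Hypothetical per-profile results: profile 1 yields three matches
# instead of two, as a page with an extra matching element might.
my @pages = ( ['t0a', 't0b'], ['t1a', 't1b', 't1c'], ['t2a', 't2b'] );

# Flat layout: one running counter across all profiles, like $title[$c1++].
my @flat;
my $c1 = 0;
for my $texts (@pages) {
    $flat[$c1++] = $_ for @$texts;
}

# Fixed-stride lookup: assumes every profile contributed exactly 2 entries.
sub first_by_stride {
    my ($flat, $c2) = @_;
    return $flat->[$c2 * 2];
}

# Nested layout: one inner array per profile, so no stride is needed.
my @title;
for my $c2 (0 .. $#pages) {
    push @{ $title[$c2] }, $_ for @{ $pages[$c2] };
}
```

With this sample data, first_by_stride(\@flat, 2) returns 't1c' (profile 1's extra match) instead of profile 2's first match, while $title[2][0] is 't2a'.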