
Can you tell me what's wrong with this code?

    my $start = 1;
    my $end = 10;
    my $cnt = 0;
    for ($start..$end) {
        $cnt++;
        print "<b>Current page count: $cnt</b><p><p>";
        my $funky = "http://www.allpoetry.com/chat/page=$cnt;";
        my $content = get($funky);
        my $tree = HTML::Tree->new();
        $tree->parse($content);

        # retrieve the text and split into lines
        my @lines = split "<br>", $tree->as_text;
        local $/;
        my @good_lines;
        my $good_lines;
        for my $lines (@lines) {
            $lines =~ s/\)/\)<br>/g;
            while($lines =~ m/Next Chatter \>(.*?)\< Previous Chatter/gs){
                $good_lines = $1;
                push @good_lines,$good_lines;
            }
            foreach (@good_lines){
                my @lines = split /<br>/;
                foreach (@lines){
                    next unless $_;
                    #m/^(.*?):(.*?) \((.*) (?:seconds|minute|minutes|hour|hours|day|days) ago\)$/;
                    #m/([^:]+): (.+)\((.*)\)/;
                    m/([^:]+): (.+)(\((.*)\))/;
                    my( $name, $text, $delay ) = ( $1, $2, $3 );
                    #print "NAME:$name\nText:$text\nDelay:$delay\n\n";
                    #if ($name =~ m/Elisabeth/) {
                    print "$name $text $delay<br>"; # $text $delay<br>"
                    #}
                }
            }
        }
    }

I'm doing what you showed as an example and I get:
    sprkls926 hello (17 minutes ago)
    sprkls926 i don't feel good (17 minutes ago)
    kinkygoddess hiya ppls (17 minutes ago)
    sprkls926 *faints and slids off the roof* (17 minutes ago)
    sprkls926 *watches the blood follow outta her* (18 minutes ago)
    Shinigami why sad ? (18 minutes ago)
    Shinigami why sad ? (18 minutes ago)
    sprkls926 sad (19 minutes ago)
    sprkls926 *crys and cuts her neck* (19 minutes ago)
    Shinigami so howz everyone ? (19 minutes ago)
    FiXato sure sw...just comment on mine and I will get back to one of yours right away (20 minutes ago)
    sidewinder hmmm.....not that I know of... (20 minutes ago)
    sprkls926 sidewinder is there anyway to revires this vampire thing* (21 minutes ago)
    Shinigami hello everyone (22 minutes ago)
    sidewinder No one wants to trade? (22 minutes ago)
    vampira1665 morning sidewinder, I will read some of ur stuff if there is any I haven't read (22 minutes ago)
    sidewinder no wants to trade??? (22 minutes ago)
    sprkls926 *starts to claw at her neck* (24 minutes ago)
    sprkls926 *backs away from everyone* (25 minutes ago)
    sidewinder anyone (25 minutes ago)
    ForgottenAngel666 ::stays in my corner alone:: (25 minutes ago)
    sidewinder anywanna trade comments? (26 minutes ago)
    sidewinder anywanna trade comments? (26 minutes ago)
    Yume *sighs* I must leave but I will return! (26 minutes ago)
    Yume *sighs* I must leave but I will return! (26 minutes ago)
    Yume *sighs* I must leave but I will return! (27 minutes ago)
    sidewinder joins sprkls (29 minutes ago)
    sprkls926 *sits on the roof with her head cast downward* (30 minutes ago)
    sprkls926 *bites sidewinder then leaves the room* (30 minutes ago)
    sidewinder *smiles* (31 minutes ago)
    Yume noooooooooo! not monkey!! (32 minutes ago)
    sprkls926 munkey come back i am going outside u r safe no one will hurt u (32 minutes ago)
    sprkls926 i will be back soon (32 minutes ago)
    sathethert with that, i have to go to trauma. see yaaaaaaaaaaaaaa (32 minutes ago)
    ForgottenAngel666 ..munkey come back.! (not that u even like me no more )
    ForgottenAngel666 ..munkey come back.! (not that u even like me no more )
    ForgottenAngel666 ..munkey come back.! (not that u even like me no more )
    Current page count: 2
    Current page count: 3
    Current page count: 4
    Current page count: 5
    Current page count: 6
    Current page count: 7
    Current page count: 8
    Current page count: 9
    Current page count: 10

Re: Re: Re: Which loop should I use?
by tedrek (Pilgrim) on Jul 31, 2003 at 19:16 UTC

    There's nothing wrong with your loop, but the HTML on the first page is different from the rest, and you're just not parsing it correctly. The first page doesn't have a - between the < and 'Next Chatter', which throws off your regex.

    Well, I got interested in this problem, so here's a replacement :). I dropped HTML::Tree because I thought it would be nice to have the full text of the messages, and it was much simpler to grab it straight from the HTML. It should be trivial to run the message text back through HTML::Tree to strip the HTML. I had originally tried tackling this using the parse tree, but the code was twice as long and uglier, not to mention it didn't work :(. I created a get function that reads from local disk so I didn't have to hit the website whenever I wanted to test.

    And on to the code!:

    Update: Everything that isn't struck out :)

    Update2: Added a few linebreaks so one line of code wouldn't wrap

      Thank you for the code rewrite and the interest you have in this problem. When I ran your code I got a 500 Internal Server Error:
      syntax error at newparse.pl line 63, near "+}"
      Can't use global $1 in "my" at newparse.pl line 72, near "= $1"
      Can't use global $! in "my" at newparse.pl line 73, near "$number: $!"
      Can't use global $! in "my" at newparse.pl line 76, near "$number: $!"
      I also can't use your script for anything but reference, because there are many things in your version that I don't understand (most of your script, actually). With HTML::Tree the code may not have been perfect, but I understood everything I was trying to do.

      I will definitely keep this script and I will try to work out the bugs. Thanks!

        The error you got was because one line got wrapped by PM; I've added a few line breaks so nothing wraps now. In your original code there were three things that jumped out at me. The first was the 'local $/;': you aren't doing any IO, so it has no effect. The second is your use of 'split "<br>",...'. You are using that on the text-only version, which will not have any HTML in it, so it just assigns the whole text to $lines[0]. You also split on "<br>" later, which won't do anything because the original split would have gotten all of the <br>'s anyway. The third is your regex for getting $good_lines: it will only match against a string that hasn't been broken up into lines, which means your split has to fail for that regex to work. Anyway, I hope you can get it to work, and if you have any questions feel free to ask.
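The point about splitting tag-free text can be checked with a small sketch. The sample string and the message pattern below are made up for illustration (they are not taken from the real page), and the code only uses core Perl:

```perl
use strict;
use warnings;

# Hypothetical sample of a text-only rendering: the tags are gone,
# so the messages run together in one string.
my $text = 'alice: hello (1 minute ago)bob: hi there (2 minutes ago)';

# Splitting on a tag that is no longer present returns the whole
# string as a single element.
my @lines = split /<br>/, $text;
print scalar(@lines), " element(s) after split on <br>\n";

# Matching repeatedly (/g) against the unsplit text still finds
# every message. The pattern is an assumption about the format.
my @messages;
while ($text =~ /([^:()]+): (.+?) \(([^)]*ago)\)/g) {
    push @messages, { name => $1, text => $2, delay => $3 };
}
printf "%s | %s | %s\n", @{$_}{qw(name text delay)} for @messages;
```

Messages that themselves contain parentheses or colons would need a stricter pattern than this one.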

        Tedrek
Re: Which loop should I use?
by tadman (Prior) on Jul 31, 2003 at 19:13 UTC
    I'd really recommend using something more standard. For example:
    for my $cnt ($start .. $end) {
        # ...
    }
    There's no need to have an independent $cnt when you can use that name as the looping variable.

    Also, what's wrong with that output? Maybe you could check that the page is loading properly, for example by testing defined($content) && length($content).
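Both suggestions might be combined roughly as follows. This is only a sketch: fetch_page is a hypothetical stand-in for the real download call (the original get is presumably LWP::Simple's, which returns undef on failure), so the example runs without network access.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real fetch. It returns undef for
# page 3 to imitate a failed download.
sub fetch_page {
    my ($page) = @_;
    return if $page == 3;
    return "<html>page $page</html>";
}

my ($start, $end) = (1, 5);
my @failed;

# $cnt is the loop variable itself, so no separate counter is needed.
for my $cnt ($start .. $end) {
    my $content = fetch_page($cnt);

    # Skip pages that did not load or came back empty.
    unless (defined($content) && length($content)) {
        warn "page $cnt: nothing fetched, skipping\n";
        push @failed, $cnt;
        next;
    }
    print "page $cnt: ", length($content), " bytes\n";
}
```

Collecting the failed page numbers in @failed is just one option; it makes it easy to retry or report them after the loop.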