
This script has proven to be more difficult than originally planned and more difficult than it's worth :(

I get "NAME: Text: Delay:" printed over and over, with nothing filled in after any of the labels. I think I misused part of your script, because I'm not getting errors or anything. Did I mess up anywhere with this?

use strict;
use CGI qw/:standard/;

use HTML::Tree;
use LWP::Simple;

print header, start_html('test printing');

#my $count;
#until ($count eq "5") {
#$count++;
my $funky = "http://www.allpoetry.com/chat//page=1";


my $content = get($funky);

my $tree = HTML::Tree->new();

$tree->parse($content);


# retrieve the text and split into lines
my @lines = split "<br>", $tree->as_text;


local $/;
my @good_lines;
my $good_lines;



for my $lines (@lines) {
    $lines =~ s/\)/\)<br>/g;

    while ($lines =~ m/Next Chatter \>(.*?)\< Previous Chatter/gs) {
        $good_lines = $1;
        push @good_lines, $good_lines;
    }
    foreach (@good_lines) {

        my @lines = split /<br>/;
        foreach (@lines) {
            next unless $_;
            /([^:]+): (.+) \((\d+) minutes ago\)/;
            my ( $name, $text, $delay ) = ( $1, $2, $3 );
            print "NAME:$name\nText:$text\nDelay:$delay\n\n";
        }
    }
}
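
One possible cause of the blank fields, offered as a guess rather than a diagnosis: the name/text/delay match is never checked, so a line that does not match still gets printed, and with use strict but no use warnings Perl says nothing about the undefined values. Also, as_text returns the text with the tags already stripped, so there may be no <br> left to split on. The sketch below only uses the captures when the match succeeded. The parse_chat_line sub and the sample lines are made up for illustration; the real page text may look quite different.

```perl
use strict;
use warnings;    # reports "uninitialized value" instead of printing blanks silently

# Hypothetical sample input, not taken from the real chat page.
my @samples = (
    'alice: hello there (3 minutes ago)',
    'a line that does not match the pattern',
    'bob: anyone around? (12 minutes ago)',
);

# Parse one line into name/text/delay; return nothing if it does not match.
sub parse_chat_line {
    my ($line) = @_;

    # Only trust $1, $2, $3 when the match itself succeeded.
    if ( $line =~ /([^:]+): (.+) \((\d+) minutes ago\)/ ) {
        return { name => $1, text => $2, delay => $3 };
    }
    return;
}

for my $line (@samples) {
    my $entry = parse_chat_line($line);
    if ($entry) {
        print "NAME:$entry->{name}\nText:$entry->{text}\nDelay:$entry->{delay}\n\n";
    }
    else {
        # Showing the rejected line makes it easier to see why the regex fails.
        print "NO MATCH: $line\n\n";
    }
}
```

Printing the lines that fail to match is the useful part here: if every line shows up as NO MATCH, the problem is in what reaches the regex rather than in the print statement.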

Sorry I keep bugging you; I promise this'll be the last time. (I think I'll give up for a while if nothing else works. Note to self: this is why you stopped using HTML:: modules in the first place.) Thanks for your help!

In reply to Re: Re: html parsing/regex by coldfingertips
in thread html parsing/regex by coldfingertips
