isync has asked for the wisdom of the Perl Monks concerning the following question:
    $p->handler( start => \&a_start_handler, "tagname,self,attr" );
    $p->unbroken_text( 1 );
    $p->parse( $content ) || die $!;

    foreach my $link ( @linklist ){
        print $link->[0]; # link
        print $link->[1]; # text
    }

    sub a_start_handler {
        my( $tag, $self, $attr ) = @_;

        # we only act on <a tags
        return if $tag ne "a";

        if( defined( $href = $attr->{href} ) ){
            $self->handler( text => sub { $text = shift; $text =~ s/\n//g; }, "dtext" );
            $self->handler( end => \&a_end_handler, "tagname,self" );
        }

        foreach my $key ( keys %$attr ){
            # print ">$key=$attr->{$key}\n";
        }
    }

    sub a_end_handler {
        return if shift ne "a";
        my $self = shift;
        push @linklist, [ $href, $text ] if defined $text && $text !~ /^\s*$/;
        $self->handler( end => undef );
        $self->handler( text => undef );
    }
Changing

    $self->handler( text => sub { $text = shift; $text =~ s/\n//g; }, "dtext" );

to

    $self->handler( text => sub { $text = shift; $text =~ s/\n//g; }, "text" );

does not give the expected result (switch from getting dtext to getting text).
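For comparison, here is a minimal, self-contained sketch of the difference between the two argspecs. It is not the code from the question and does not attempt to diagnose it: instead of installing and removing handlers on the fly, it registers all three handlers once and uses closures to track state. The helper name `extract_links` and the sample HTML are made up for illustration. According to the HTML::Parser documentation, `text` passes the source text unchanged, while `dtext` passes it with general entities decoded.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: collect [ href, text ] pairs for each <a href=...>.
# $argspec is "dtext" (entities decoded) or "text" (raw source text).
sub extract_links {
    my ( $html, $argspec ) = @_;
    my ( @links, $href, $text );

    my $p = HTML::Parser->new( api_version => 3 );
    $p->unbroken_text(1);

    # Remember the href and start a fresh text buffer on <a href=...>.
    $p->handler( start => sub {
        my ( $tag, $attr ) = @_;
        return unless $tag eq 'a' && defined $attr->{href};
        ( $href, $text ) = ( $attr->{href}, '' );
    }, 'tagname,attr' );

    # Accumulate text only while inside a link.
    $p->handler( text => sub {
        $text .= shift if defined $href;
    }, $argspec );

    # On </a>, store the pair unless the text is empty or whitespace.
    $p->handler( end => sub {
        my ($tag) = @_;
        return unless $tag eq 'a' && defined $href;
        $text =~ s/\n//g;
        push @links, [ $href, $text ] if $text =~ /\S/;
        undef $href;
    }, 'tagname' );

    $p->parse($html);
    $p->eof;
    return @links;
}

my $html = '<p><a href="/faq">Q &amp; A</a> and <a href="/home">Home</a></p>';
for my $argspec (qw( dtext text )) {
    for my $link ( extract_links( $html, $argspec ) ) {
        print "$argspec: $link->[0] => $link->[1]\n";
    }
}
```

With this input the only visible difference should be in the first link, where `dtext` yields a literal ampersand and `text` leaves `&amp;` as written in the source. For link text that contains no entities, the two argspecs return the same string.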
Re: HTML::Parser to extract link text?
by Juerd (Abbot) on Jun 19, 2007 at 18:13 UTC
by isync (Hermit) on Jun 19, 2007 at 21:15 UTC
by Juerd (Abbot) on Jun 19, 2007 at 21:23 UTC
by isync (Hermit) on Jun 19, 2007 at 21:32 UTC
by Juerd (Abbot) on Jun 19, 2007 at 21:50 UTC