
The approach has two steps. HTML::Parser gives you callbacks for three events: the start of a tag (start_h), the end of a tag (end_h) and any text encountered (text_h). First, set up a text handler that sees all text. Second, modify your start handler so that it increments a counter whenever it enters an <a ...> tag, and your end handler so that it decrements that counter.

Your text handler then knows that any text it sees while the counter is above zero comes from within an anchor.
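The start_h, end_h and text_h names above are HTML::Parser's version 3 constructor options. As a rough sketch, the counter can be wired directly into those handlers instead of subclassing. The helper name link_texts and the sample HTML are made up for illustration; anchors without an href are dropped at the end.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns [href, text] pairs for every <a href=...>
# in $html, tracking anchor nesting with a counter.
sub link_texts {
    my ($html) = @_;
    my @links;        # [href, text] pairs in document order
    my $depth = 0;    # number of <a> elements we are currently inside

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tagname, $attr) = @_;
            return unless $tagname eq 'a';
            $depth++;
            push @links, [ $attr->{href}, '' ];   # href is undef for <a name=...>
        }, 'tagname, attr' ],
        end_h => [ sub {
            my ($tagname) = @_;
            $depth-- if $tagname eq 'a' && $depth > 0;
        }, 'tagname' ],
        text_h => [ sub {
            my ($dtext) = @_;    # text with entities decoded
            $links[-1][1] .= $dtext if $depth > 0;
        }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;    # flush any buffered text

    return grep { defined $_->[0] } @links;    # drop name-only anchors
}

my $html = '<p><a href="/x">Ex <b>ample</b></a> and <a href="/y">why</a></p>';
printf "%s => %s\n", @$_ for link_texts($html);
```

Text inside nested markup such as <b> still counts, because only the <a> counter decides whether text is kept.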

Here is some untested code, written as an HTML::Parser subclass with method callbacks, that should implement what I described:

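# start of ParseLink
  {
    package ParseLink;
    use HTML::Parser ();
    our @ISA = qw(HTML::Parser);

    # Called by parse() for every start tag. A subclass created with a
    # plain new() gets version 2 style callbacks: ($this, $tag, $attr, ...)
    sub start
      {
        my ($this, $tag, $attr) = @_;

        if ($tag eq "a")
        {
          $this->{nesting_a}++;
          # Anchors without href (e.g. <a name="...">) are not links;
          # you might want to treat them differently here ...
          return unless defined $attr->{href};
          $this->{links}{$attr->{href}} = "(no text given)";
          $this->{curr_link} = $attr->{href};
          $this->{curr_text} = "";    # start collecting text afresh
        }
      }

    # Called for every end tag as ($this, $tag, $origtext); no attributes.
    sub end
      {
        my ($this, $tag) = @_;

        if ($tag eq "a" and $this->{nesting_a})
        {
          $this->{nesting_a}--;
          if (defined $this->{curr_link})
          {
            $this->{links}{$this->{curr_link}} = $this->{curr_text}
              if length $this->{curr_text};
            delete $this->{curr_link};    # this link is finished
          }
        }
      }

    # Called for text between tags; v2 callbacks pass the raw text,
    # so entities such as &amp; are not decoded.
    sub text {
      my ($this, $text) = @_;
      $this->{curr_text} .= $text
        if $this->{nesting_a} and defined $this->{curr_link};
    }

    # Returns the hrefs found; the text for each is in $this->{links}{$href}.
    sub get_links
      {
        my $this = shift;
        return keys %{$this->{links}};
      }
  }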
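If you don't need callbacks at all, HTML::TokeParser (shipped in the same HTML-Parser distribution) offers a pull-style alternative. This is a different technique from the counter approach: get_trimmed_text('/a') gathers the text up to the next </a>, so no nesting counter is needed. A minimal sketch, with an inline sample document standing in for a fetched page:

```perl
use strict;
use warnings;
use HTML::TokeParser;

my $html = '<p><a href="/x">Ex <b>ample</b></a> and <a href="/y">why</a></p>';
my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";

my %text_for;    # href => link text
while (my $tag = $p->get_tag('a')) {    # next <a ...> start tag
    my $href = $tag->[1]{href};          # [1] is the attribute hash
    next unless defined $href;           # skip name-only anchors
    # text up to the closing </a>, whitespace collapsed and trimmed
    $text_for{$href} = $p->get_trimmed_text('/a');
}
print "$_ => $text_for{$_}\n" for sort keys %text_for;
```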

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ;    # The  
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new   #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' #  web

In reply to Re: Getting the Linking Text from a page by Corion
in thread Getting the Linking Text from a page by jonjacobmoon
