skx has asked for the wisdom of the Perl Monks concerning the following question:

I'm having a problem using the URI::Find module to process a mixture of plain text and HTML code.

The perldoc suggests using the following code:

    use CGI qw(escapeHTML);
    use URI::Find;

    # Escape the plain text first, then wrap each URI found in an anchor.
    $text = "<pre>\n" . escapeHTML($text) . "</pre>\n";
    my $finder = URI::Find->new(sub {
        my ($uri, $orig_uri) = @_;
        return qq|<a href="$uri">$orig_uri</a>|;
    });
    $finder->find(\$text);

This works beautifully when I'm processing plain text input. However, the callback has no context, so it cannot avoid mangling input that is already a link, such as the following:

<a href="http://foo.com/">foo</a>

This is transformed even though I don't need it to be.

Short of trying to heuristically detect whether I'm processing HTML or plain text, is there another module I could use to insert hyperlinks around URIs which are not already linked?

Steve
--

Re: Using URI::Find with HTML
by merlyn (Sage) on Dec 06, 2005 at 15:12 UTC
    You could use this technique with HTML::Parser, which by default passes the tags, attributes, and comments through untouched, but performs the substitution above on the text portions. Be sure to set unbroken_text so you don't get two callbacks for a single text run.

    Adapting one of the examples there, it'd be something like:

        use HTML::Parser;

        HTML::Parser->new(
            unbroken_text => 1,
            # Tags, comments, and declarations are printed unchanged.
            default_h => [sub { print shift }, 'text'],
            # Text runs are rewritten before being printed.
            text_h    => [sub {
                my $text = shift;
                # (URI::Find here)
                print $text;
            }, 'text'],
        )->parse_file(shift || die) || die $!;
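For reference, here is a fuller sketch of the approach described above, with the URI::Find call filled in. It is not part of the original reply. The function name linkify_html, the $in_anchor counter, and the example URLs are assumptions made for illustration. The counter is an extra guard so that text already inside an <a> element is left alone. It also collects output in a string rather than printing from the handlers.

```perl
use strict;
use warnings;
use HTML::Parser;
use URI::Find;

# Wrap each URI found in a text run in an anchor tag.
my $finder = URI::Find->new(sub {
    my ($uri, $orig_uri) = @_;
    return qq|<a href="$uri">$orig_uri</a>|;
});

# Hypothetical helper: returns a copy of $html in which URIs in text
# runs are linked, while tags and existing anchors are left unchanged.
sub linkify_html {
    my ($html) = @_;
    my $out       = '';
    my $in_anchor = 0;    # number of <a> elements currently open

    my $parser = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,    # deliver each text run in one callback
        # Comments, declarations, etc. are copied through as-is.
        default_h => [ sub { $out .= shift }, 'text' ],
        start_h   => [ sub {
            my ($tag, $text) = @_;
            $in_anchor++ if $tag eq 'a';
            $out .= $text;
        }, 'tagname, text' ],
        end_h     => [ sub {
            my ($tag, $text) = @_;
            $in_anchor-- if $tag eq 'a' && $in_anchor > 0;
            $out .= $text;
        }, 'tagname, text' ],
        text_h    => [ sub {
            my ($text) = @_;
            # Only rewrite text that is not already inside a link.
            $finder->find(\$text) unless $in_anchor;
            $out .= $text;
        }, 'text' ],
    );
    $parser->parse($html);
    $parser->eof;    # flush any buffered text
    return $out;
}

print linkify_html(
    qq|<p>See http://example.com/ or <a href="http://foo.com/">foo</a></p>\n|
);
```

Because the handlers receive the raw source text ('text' rather than 'dtext'), entities such as &amp;amp; inside a URL are passed to URI::Find undecoded. Depending on the input, that may need extra handling.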

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Thanks a lot for your help. That pointed me in the right direction.

      Steve
      --