Rodster001 has asked for the wisdom of the Perl Monks concerning the following question:

Hello!

Using the following text:

http://www.google.com
<a href="http://www.google.com">fail a match here</a>
I want to produce this:
<a href="http://www.google.com">http://www.google.com</a>
<a href="http://www.google.com">fail a match here</a>
But, I am struggling a bit. This is what I have so far:
my $text = qq~
http://www.google.com
<a href="http://www.google.com">fail a match here</a>
~;
$text =~ s#(http://[^\s]*)(?!["|<])#<a href="$1">$1</a>#gsi;
print $text;
But it is not quite doing what I want: it is still matching the second line, even with the lookahead of (?!["|<]). I am guessing I can't use | to mean "or" here.

I want my regex to "find any instance of a url not followed by a quote or a carrot", so that all urls are replaced with clickable links (except those that already have anchor tags around them). Different approaches to this (instead of the lookahead) are fine with me.
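
To see why the second line is still touched, it can help to print what the pattern captures at each match. The sketch below assumes the sample text spans two lines, as the post describes. Because [^\s] also accepts quotes and angle brackets, the second capture runs past the end of the href value until the next whitespace, and the lookahead is then only tested against that whitespace.

```perl
use strict;
use warnings;

# Sample text from the question, assumed to be two lines.
my $text = qq~http://www.google.com\n<a href="http://www.google.com">fail a match here</a>\n~;

# In list context with /g, a match returns every capture of group 1.
my @captures = $text =~ m#(http://[^\s]*)(?!["|<])#gi;

print "captured: [$_]\n" for @captures;
```

The second capture includes the closing quote, the > and the word after it, which is why the substitution mangles the existing anchor tag.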

Thanks!

Update

This does what I need:

s#[^">](http://[\S]*)# <a href="$1">$1</a>#gsi;
I gave up on the lookahead and just used a character class before the match. I hate giving up on something like that, but oh well :) Thanks for all your input (Tokenize will come in handy, ikegami, thanks!)
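
One side effect of the character class in the update is that it consumes the character before the URL (the replacement puts a space back in its place), and it cannot match a URL at the very start of the string, where there is no preceding character. A negative lookbehind is one possible alternative. This is a sketch under the assumption that "not preceded by a quote or >" is a good enough test for this input, not a general HTML solution.

```perl
use strict;
use warnings;

# Sample text from the question, assumed to be two lines.
my $text = qq~http://www.google.com\n<a href="http://www.google.com">fail a match here</a>\n~;

# (?<![">]) asserts that the previous character is neither a quote nor ">"
# without consuming it, so nothing has to be restored in the replacement.
# It also succeeds at the start of the string.
(my $linked = $text) =~ s#(?<![">])(http://\S*)#<a href="$1">$1</a>#gi;

print $linked;
```

A URL preceded by a space inside existing link text would still be wrapped, so this remains a heuristic.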

Replies are listed 'Best First'.
Re: Regex: Lookahead
by ikegami (Patriarch) on May 11, 2009 at 22:54 UTC

    Usually you'd want a parser, but a tokenizer will do the trick here.

    • When you encounter a <a>, set $in_link and emit the token.
    • When you encounter a </a>, clear $in_link and emit the token.
    • When you encounter text and $in_link is set, emit the token.
    • When you encounter text and $in_link isn't set, linkify the urls within and emit the modified text.
    • When you encounter anything else, emit the token.
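
The steps above can be sketched in core Perl as follows. This is a minimal illustration, not ikegami's code: the names linkify_text and linkify_html are hypothetical, and the tag pattern <[^>]*> is a simplifying assumption that will misread comments or a > inside an attribute value.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap bare URLs found in a plain-text token.
sub linkify_text {
    my ($text) = @_;
    $text =~ s#(http://[^\s<>"]+)#<a href="$1">$1</a>#gi;
    return $text;
}

# Hypothetical tokenizer loop following the five rules in the list.
sub linkify_html {
    my ($html) = @_;
    my $in_link = 0;
    my $out     = '';

    # Capturing parentheses make split return the tags as tokens too,
    # so the list alternates between text and tags.
    for my $token (split /(<[^>]*>)/, $html) {
        if ($token =~ m{^<a\b}i) {
            $in_link = 1;                    # opening <a ...>
        }
        elsif ($token =~ m{^</a\s*>}i) {
            $in_link = 0;                    # closing </a>
        }
        elsif ($token !~ /^</ && !$in_link) {
            $token = linkify_text($token);   # text outside a link
        }
        $out .= $token;                      # every token is emitted
    }
    return $out;
}

my $text = qq~http://www.google.com\n<a href="http://www.google.com">fail a match here</a>\n~;
print linkify_html($text);
```

Because the URL pattern only ever runs on text outside an anchor, it needs no lookahead or lookbehind at all.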

    It's much more robust than using regexp patterns, and it's very clear and simple.

Re: Regex: Lookahead
by Your Mother (Archbishop) on May 11, 2009 at 23:00 UTC

    You should probably take a look at URI::Find and consider mixing it with HTML::TokeParser (or friends). It will be more robust and you'll have fun, one hopes, exploring what those packages can do.

Re: Regex: Lookahead
by codeacrobat (Chaplain) on May 11, 2009 at 22:54 UTC
    why so complicated?
    $_ =~ http://www.google.com
    <a href="http://www.google.com">fail a match here</a> ~;
    s#^(http://\S+)#<a href="$1">$1</a>#gmsi;
    print;
    Whatever multiline matching you are trying to do, you should use the /m flag.

    print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
      Except that it doesn't work, because your example is doing the replacement on the second line (like mine). I need it to fail the replacement on the second line (because it is already an html link).
        Did you run it? (Change "$_ =~" to "$_ = q~" first.) It may not do what you want, but contrary to what you claim, it doesn't touch the second line. (Note the /^/m)
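
For anyone who wants to check this, here is a runnable sketch of the anchored substitution with the assignment typo corrected. The two-line sample and the extra mid-line input are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sample text from the question, assumed to be two lines.
my $text = qq~http://www.google.com\n<a href="http://www.google.com">fail a match here</a>\n~;

# With /m, ^ matches at the start of every line, so only a URL that
# begins a line is wrapped. The second line starts with "<a" and is skipped.
(my $anchored = $text) =~ s#^(http://\S+)#<a href="$1">$1</a>#gmi;

# Assumed extra input: a URL in the middle of a line is not at ^,
# so this pattern leaves it alone as well.
(my $mid = "see http://www.google.com\n") =~ s#^(http://\S+)#<a href="$1">$1</a>#gmi;

print $anchored, $mid;
```

So the second line is indeed untouched, but the approach only covers URLs that start a line.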
Re: Regex: Lookahead
by JavaFan (Canon) on May 12, 2009 at 07:29 UTC
    I want my regex to "find any instance of a url not followed by a quote or a carrot"
    Oh, but it does. Except that you failed to realize that http://www.google.co is also a URL not followed by a quote or a carrot.

    And you don't need the bar inside a character class. Unless you want to match a bar as well.
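
The backtracking described here can be reproduced with a small sketch. It uses an assumed variant of the pattern whose character class stops at quotes and angle brackets (with the original [^\s], the match instead runs past the quote to the next whitespace). The possessive quantifier *+, available since Perl 5.10, is one way to forbid giving characters back. Note also that the class is written ["<], without the bar.

```perl
use strict;
use warnings;
use 5.010;

my $attr = 'href="http://www.google.com"';

# Greedy: the lookahead fails after ".com", so the engine backs up one
# character and succeeds with ".co" followed by "m".
my ($greedy) = $attr =~ m#(http://[^\s"<]*)(?!["<])#;

# Possessive: the class may not give characters back, so the match fails.
my ($possessive) = $attr =~ m#(http://[^\s"<]*+)(?!["<])#;

say "greedy:     ", $greedy     // 'no match';
say "possessive: ", $possessive // 'no match';

# Applied to the two-line sample (assumed layout), only the bare URL is wrapped.
my $text = qq~http://www.google.com\n<a href="http://www.google.com">fail a match here</a>\n~;
(my $linked = $text) =~ s#(http://[^\s"<]*+)(?!["<])#<a href="$1">$1</a>#gi;
print $linked;
```

An atomic group, (?>...), would have the same effect as the possessive quantifier here.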

      What did I fail to realize? The first line does not have a carrot or a quote, so I want it to match there. The second line does have a quote following it, so I want it to fail on that one (do nothing).

        The second line does have a quote following it,

        "it"? There are many urls in the second line. Yes, one of them is followed by a quote. But you asked to find one that isn't followed by a quote, and perl found one. "http://www.google.co" is followed by a "m", not a quote.

Re: Regex: Lookahead
by ikegami (Patriarch) on May 12, 2009 at 15:56 UTC

    Since the error is being repeated,

    While both are pointy, carrot is a veggie, and "^" is a caret.

      Which is why I only eat karats. Not much vitamin A but a significant increase in resale value of post-ingestion exports. (Oh, dear... It was the caffeine that made me do it.)

      Doh! :)