Massyn has asked for the wisdom of the Perl Monks concerning the following question:

#!/fellow/monks.pl I've been working on somewhat of an HTML generator from raw text files. One of the things I'd like to do is to translate any http://xxx comments in my text to a proper hyperlink... For example http://www.massyn.net should become <a href="http://www.massyn.net">http://www.massyn.net</a>. I was hoping to do it in the attached code, but it's not working entirely as I hoped. I fell in love with regular expressions, but this one is tricky... I can't get the (.+) to stop at the end... Any assistance is much appreciated.

$text =~ s/http:\/\/(.+)\s/<a href=\"http:\/\/$1\">http:\/\/$1<\/a>/;

Thanks!

     |\/| _. _ _  ._
www. |  |(_|_>_>\/| | .net
                /
The more I learn the more I realise I don't know.
- Albert Einstein

Re: regex to identify http:// in html
by merlyn (Sage) on Nov 26, 2005 at 14:38 UTC
Re: regex to identify http:// in html
by polypompholyx (Chaplain) on Nov 26, 2005 at 14:03 UTC
    For a quick hack, s{(http://\S+?)(\s+)}{<a href="$1">$1</a>$2} will do what you ask. The important thing to note is the \S+?, which makes the quantifier non-greedy, i.e. it'll match the minimum amount required for the regex to succeed, rather than the maximum amount, which is what \S+ or .* would do. I've also used \S (any non-space character), as it's best to avoid . where you can: see death to dot star.
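
To see the difference between the greedy original and the quick hack above, here is a minimal sketch. The sample sentence and the variable names `$greedy` and `$lazy` are only assumptions for illustration; the two substitutions are the ones quoted in this thread.

```perl
use strict;
use warnings;

# Sample input (an assumption for illustration): one URL followed by more words.
my $text = "see http://www.massyn.net for more info";

# The original attempt: greedy .+ runs to the end of the line, then backtracks
# only as far as the LAST whitespace character, which it also consumes.
(my $greedy = $text) =~ s/http:\/\/(.+)\s/<a href="http:\/\/$1">http:\/\/$1<\/a>/;

# The quick hack: \S cannot cross whitespace, so the match stops at the
# first space after the URL, and $2 puts that whitespace back.
(my $lazy = $text) =~ s{(http://\S+?)(\s+)}{<a href="$1">$1</a>$2};

print "$greedy\n";
# see <a href="http://www.massyn.net for more">http://www.massyn.net for more</a>info
print "$lazy\n";
# see <a href="http://www.massyn.net">http://www.massyn.net</a> for more info
```

Note that both forms need whitespace after the URL, so a URL at the very end of the string would be left untouched.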

      Your use of a non-greedy quantifier isn't best here. You are already specifying \S and, since you are being specific, the non-greediness isn't really buying you anything. (In fact, it's somewhat less efficient.) You can also skip the capturing of space at the end. You are just re-adding it anyway, so just leave it alone to begin with. Your regex would be better written as:

      s!(http://\S+)!<a href="$1">$1</a>!g;
      And you might as well catch https too:
      s!(https?://\S+)!<a href="$1">$1</a>!g;
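
One thing to watch with a bare \S+ is that it also swallows punctuation that happens to follow the URL in running text. The sketch below first runs the substitution above on a sample sentence, then tries a hypothetical refinement (not from this thread) that uses a lookahead to leave trailing punctuation outside the link. The sample text, the example.org URL, and the names `$plain` and `$trimmed` are assumptions for illustration.

```perl
use strict;
use warnings;

# Sample input (an assumption): two URLs, each followed by punctuation.
my $text = "Docs at http://www.massyn.net, mirror at https://example.org/a?b=1.";

# The substitution from the post above: \S+ takes everything up to whitespace,
# including the trailing comma and full stop.
(my $plain = $text) =~ s!(https?://\S+)!<a href="$1">$1</a>!g;

# Hypothetical refinement: a lazy \S+? that stops as soon as only punctuation
# remains before whitespace or the end of the string.
(my $trimmed = $text) =~
    s{(https?://\S+?)(?=[.,;:!?)]*(?:\s|\z))}{<a href="$1">$1</a>}g;

print "$plain\n";
print "$trimmed\n";
```

Here the non-greedy quantifier does earn its keep, because the lookahead, not the character class, decides where the URL ends.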

      -sauoq
      "My two cents aren't worth a dime.";
      
Re: regex to identify http:// in html
by Samy_rio (Vicar) on Nov 26, 2005 at 14:18 UTC

    Hi Massyn, Try this,

    my $str="McGlaughlin http://www.karayiannis.com and http://www.samy.com";
    $str =~ s/\b((?:http\:\/\/)|(?:www\.))([^ ]+)/<a href=\"$&\">$&<\/a>/sgi;
    print $str;
    __END__
    McGlaughlin <a href="http://www.karayiannis.com">http://www.karayiannis.com</a> and <a href="http://www.samy.com">http://www.samy.com</a>
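
The pattern above also accepts addresses that start with www. and have no scheme. With $& as the href, such a match would produce href="www...", which browsers treat as a relative link. A minimal sketch of one way around that follows, assuming a hypothetical helper named `linkify` and the /e modifier to build the replacement. It uses a capture group rather than $&, and it also stops at <, > and double quotes, which is an extra assumption beyond the original [^ ]+.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): link http://, https:// and bare
# www. addresses, adding a scheme to the href when the text has none.
sub linkify {
    my ($text) = @_;
    $text =~ s{\b((?:https?://|www\.)[^\s<>"]+)}{
        my $url  = $1;   # save the capture before running another match
        my $href = $url =~ m{\Ahttps?://}i ? $url : "http://$url";
        qq{<a href="$href">$url</a>};
    }gei;
    return $text;
}

my $out = linkify("McGlaughlin http://www.karayiannis.com and www.samy.com");
print "$out\n";
# McGlaughlin <a href="http://www.karayiannis.com">http://www.karayiannis.com</a> and <a href="http://www.samy.com">www.samy.com</a>
```

The visible link text is left exactly as typed; only the href gains the http:// prefix.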

    Regards,
    Velusamy R.


    eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';