Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, can anyone help with this little problem I have?

I'm trying to automatically hyperlink URLs in some plain text, and I have a simple (but not perfect) regular expression that does this.

However, when I try to use it with a URL that wraps onto a new line, the URL itself contains a newline, which stops the browser from going to the right page.

Allow me to demonstrate with this bit of code:

#!/usr/bin/perl
use strict;
my $output = "";
open(FILE, "file.txt") || die "$!";
while(<FILE>) {
    $output .= $_;
}
close FILE;
$output =~ s#(http://[^\!\"\£\$\^\*\(\)\{\}\[\]\;\:\'\@\,\<\> ]+)#\<a href="$1" target="_blank"\>$1\</a\>#gis;
print $output;
If file.txt has the following:
The quick brown fox at http://fish.org would like to go home and http://see
through.com/pants/foo/bar what there is to eat
then my output is:
The quick brown fox at <a href="http://fish.org" target="_blank">http://fish.org</a> would like to go home and <a href="http://see
through.com" target="_blank">http://see
through.com</a> what there is to eat
So my question is: is there any way I can remove the \n that ends up in the middle of the href?

Many thanks!

Replies are listed 'Best First'.
Re: Automatically hyperlinking text fails with newlines
by suaveant (Parson) on Sep 20, 2001 at 21:05 UTC
    try...
    while(<FILE>) { chomp; $output .= $_; }
    which will remove newlines from $_

                    - Ant
                    - Some of my best work - Fish Dinner

      Hi, thanks for this.

      However I only want to remove the \n from inside the hyperlink whilst still preserving the newlines elsewhere. In other words the output should be:

      The quick brown fox at <a href="http://fish.org" target="_blank">http://fish.org</a> would like to go home and <a href="http://seethrough.com" target="_blank">http://see
      through.com</a> what there is to eat
      (http://seethrough.com is preserved intact within the href attribute, but the displayed link text still has the newline)
        Well then, you could pass $1 to a function, since you are matching the hyperlink with the \n in it, similar to:
        s/regex/make_link($1)/eg;
        sub make_link {
            my $url = $_[0];
            $url =~ s/\s+//g;   # strip all whitespace, including the embedded \n
            qq`<A HREF="$url">$url</A>`;
        }

                        - Ant
                        - Some of my best work - Fish Dinner
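As a rough sketch of the suggestion above, adapted to what the original poster asked for: the whitespace is stripped from the href only, and the visible link text keeps its line break. The sample text stands in for file.txt, and the character class is a simplified assumption rather than the regex from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an anchor tag: whitespace is stripped from the href only,
# while the visible link text keeps its original line break.
sub make_link {
    my ($text) = @_;                 # copy $1 before running another regex
    (my $url = $text) =~ s/\s+//g;
    return qq{<a href="$url" target="_blank">$text</a>};
}

# Sample input with a URL wrapped across two lines (assumed layout of file.txt).
my $output = "The quick brown fox at http://fish.org would like to go home and http://see\n"
           . "through.com/pants/foo/bar what there is to eat\n";

# Simplified class (an assumption): a URL runs until a space, tab, quote
# or angle bracket. A newline is NOT excluded, so wrapped URLs still match.
$output =~ s{(http://[^ \t"'<>]+)}{make_link($1)}ge;

print $output;
```

Using /e lets the replacement side call make_link, so the cleaned and the uncleaned forms of the match can both be used in the same substitution.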

        Greetings.

        This being the case, I would (still) slurp the entire file, not replace \n, and put \n among the acceptable characters in the regexp (and I *think* you may have to use the /s modifier). I would then replace the \n still embedded in the href.

        Not perfect, because this:

        Hey, check this out on http://cnn.com\n
        dude!

        would yield http://cnn.comdude. In other words, if you want ignorable newlines within URIs, then URIs must always be separated from the surrounding text by real whitespace (a space or tab, not just a newline).

        Cheers,
        alf
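The caveat above can be reproduced with a small sketch. The helper names link_for and link_urls are hypothetical, and the character class is a simplified assumption. One note on the /s question: a negated character class matches \n with or without /s, because /s only changes what . matches, so the modifier should not be needed here.

```perl
use strict;
use warnings;

# Hypothetical helper: build the anchor, dropping newlines from the href only.
sub link_for {
    my ($shown) = @_;
    (my $href = $shown) =~ s/\n//g;
    return qq{<a href="$href">$shown</a>};
}

# Hypothetical helper: link every http:// URL found in the slurped text.
# The class is a simplified assumption; it excludes spaces but not "\n".
sub link_urls {
    my ($text) = @_;
    $text =~ s{(http://[^ \t"'<>!]+)}{ link_for($1) }ge;
    return $text;
}

my $wrapped  = link_urls("see http://see\nthrough.com/pants for details\n");
my $trailing = link_urls("Hey, check this out on http://cnn.com\ndude!\n");

print $wrapped;   # href is http://seethrough.com/pants
print $trailing;  # href is http://cnn.comdude, the caveat described above
```

The second call shows why a URL at the end of a line is a problem: nothing but the newline separates it from the next word, so that word is absorbed into the link.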

Re: Automatically hyperlinking text fails with newlines
by alien_life_form (Pilgrim) on Sep 20, 2001 at 21:20 UTC
    What about:

    ...
    #error checking and stuff left as an exercise.
    $/=undef; #slurp files.
    $lines=<FILE>;
    close(FILE);
    
    $lines =~ s/\n//g;
    
    #now do your thing.
    

    Note that this joins lines, so if you count on newlines as whitespace, you're out of luck with this.

    Cheers,
    alf
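A self-contained sketch of the slurp-and-join idea, assuming an in-memory string in place of file.txt so that it runs on its own. It uses local $/ so the change to the input record separator does not leak into the rest of the program.

```perl
use strict;
use warnings;

# In-memory stand-in for file.txt so the sketch runs on its own (assumption).
my $content = "fox at http://fish.org\nwould like http://see\nthrough.com/pants\n";
open(my $fh, '<', \$content) or die "Cannot open in-memory file: $!";

# Slurp: with $/ undefined inside the block, one read returns the whole file.
my $lines = do { local $/; <$fh> };
close($fh);

# Removing every newline repairs the wrapped URL, but it also glues
# "fish.org" to the first word of the next line.
(my $joined = $lines) =~ s/\n//g;

print "$joined\n";
```

The output contains http://seethrough.com/pants as intended, but also http://fish.orgwould, which is the "out of luck" case mentioned above.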