in reply to Re: Regex to detect and remove a linebreak in an URL
in thread Regex to detect and remove a linebreak in an URL

[^\w\s] matches all characters that are not both a word character and a whitespace character. That is, it matches *any* character. Which means that from the string <URL:http://www.example.com/> your first regex is going to match http://www.example.com/>. Your second regex is only going to match if the string contains http: followed by a sequence of non-whitespace characters, followed by a whitespace character, followed by a newline. Given the string http://www.example.com/\npath/here, your solution will not set $http to 1.

Abigail

Re: Re: Regex to detect and remove a linebreak in an URL
by Roy Johnson (Monsignor) on May 19, 2004 at 20:02 UTC
    Astonishingly, you are wrong about the character class. [^\w\s] matches characters that are neither word characters nor whitespace; i.e., punctuation:
    $_='o.ne<a>tw/o<b>'; @punks = /[^\w\s]/g; print "<@punks>";
    yields
    <. < > / < >>
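To spell out why the negated class behaves this way: [^\w\s] means "not (word character or whitespace)", which is the same as "not a word character and not whitespace". Here is a small sketch comparing the two spellings. The sub names are made up for illustration, and the lookahead form is just one way to write the second version.

```perl
use strict;
use warnings;

# Hypothetical helper: return every character matched by [^\w\s].
sub punct_chars {
    my ($str) = @_;
    return $str =~ /[^\w\s]/g;    # list context + /g returns all matches
}

# Hypothetical helper: the same set written out explicitly as
# "next character is a non-word character, and it is non-whitespace".
sub punct_chars_explicit {
    my ($str) = @_;
    return $str =~ /(?=\W)\S/g;
}

my $sample = 'o.ne<a>tw/o<b>';
print join(' ', punct_chars($sample)),          "\n";
print join(' ', punct_chars_explicit($sample)), "\n";
```

Both lines of output should list the same punctuation characters, and no letters.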
    The problem with my example was that I left the /g modifier off the pattern match. I've updated it, and tested it:
    while (<DATA>) {
        /http:\S*[^\w\s]/g and s/\G\n//;
        print;
    }
    __DATA__
    there is an http://whatever.com/address/
    crossing/line/boundaries.html right in the middle of this nice string.
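For anyone who wants to try the same idea without a DATA section, here is a rough sketch that wraps it in a sub. The name join_broken_urls is hypothetical, and it keeps the same assumption as the loop above: a line whose http: text ends in punctuation right before the newline is treated as a broken URL. The scalar-context /g match leaves pos() at the end of the match, and \G in the substitution anchors there, so the newline is only removed when it immediately follows the matched text.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of newline-terminated lines and
# returns them as one string, with URLs split across lines rejoined.
sub join_broken_urls {
    my (@lines) = @_;
    my $out = '';
    for my $line (@lines) {
        # Scalar-context /g sets pos($line) to the end of the match;
        # \G then anchors the substitution at exactly that position.
        $line =~ /http:\S*[^\w\s]/g and $line =~ s/\G\n//;
        $out .= $line;
    }
    return $out;
}

print join_broken_urls(
    "there is an http://whatever.com/address/\n",
    "crossing/line/boundaries.html right in the middle of this nice string.\n",
);
```

A URL followed by more text on the same line is left alone, because the newline does not sit directly after the matched punctuation.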

    The PerlMonk tr/// Advocate

      Hello Roy and others,

      What is the X in:

      s/\G\n/X/

      ?

        Oh, sorry, that was leftover from some testing. I'm just not having a great brain day today.

        The PerlMonk tr/// Advocate