Re: Regex to detect and remove a linebreak in an URL

Maybe

/http:\S*[^\w\s]/g and s/\G\n//;

would be good enough? If they cross multiple line boundaries, you'd need to keep a flag, like:

/http:\S*[^\w\s]/g and s/\G\n// and $http=1;
$http and /^\S*[\w\s]/g and s/\G\n// or $http=0;

But a lot of URLs end in slashes, and it would be perfectly reasonable to see (in mail):

Check out
http://homestarrunner.com/sbemail/
and let me know what you think

and the processor would join the "and" line onto the end of the URL.
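That false positive is easy to reproduce. What follows is only a sketch, assuming the message is handled one line at a time as in the one-liner; `join_broken_urls` is a hypothetical helper name, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the single-line heuristic to each line of
# a message and return the result as one string.
sub join_broken_urls {
    my @lines = @_;
    my $out = '';
    for my $line (@lines) {
        # Scalar-context //g leaves pos($line) just past the match, so
        # \G\n only succeeds when the newline directly follows the
        # punctuation character that ended the match.
        $line =~ /http:\S*[^\w\s]/g and $line =~ s/\G\n//;
        $out .= $line;
    }
    return $out;
}

my @mail = (
    "Check out\n",
    "http://homestarrunner.com/sbemail/\n",
    "and let me know what you think\n",
);
print join_broken_urls(@mail);
```

The second and third lines come out glued together as `http://homestarrunner.com/sbemail/and let me know what you think`, because the pattern alone cannot tell a complete URL ending in a slash from one that was wrapped.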

Update: Oops, forgot the /g modifier on the pattern matches, so the \G didn't work.
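For anyone wondering why the /g matters: in scalar context a successful match with /g records where it ended in pos(), and \G anchors the next match at that offset. Without /g, pos() stays undefined and \G falls back to the start of the string. A small sketch (the sample string and variable names are mine, for illustration only):

```perl
use strict;
use warnings;

my $with_g    = "see http://whatever.com/address/\n";
my $without_g = $with_g;

# With /g, the match leaves pos() just after the final "/".
$with_g =~ /http:\S*[^\w\s]/g;
my $pos_with = pos($with_g);           # defined: offset of the "\n"
my $joined   = ($with_g =~ s/\G\n//);  # succeeds, newline removed

# Without /g, pos() is not set, so \G means "start of string".
$without_g =~ /http:\S*[^\w\s]/;
my $pos_without = pos($without_g);             # undef
my $untouched   = !($without_g =~ s/\G\n//);   # no newline at offset 0

printf "with /g:    pos=%s, newline removed=%s\n",
    $pos_with, $joined ? 'yes' : 'no';
printf "without /g: pos=%s, newline removed=%s\n",
    defined $pos_without ? $pos_without : 'undef',
    $untouched ? 'no' : 'yes';
```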

The PerlMonk tr/// Advocate


Re: Regex to detect and remove a linebreak in an URL by Abigail-II (Bishop) on May 19, 2004 at 19:43 UTC
`[^\w\s]` matches all characters that are not both a word character and a whitespace character. That is, it matches any character. Which means that from the string `<URL:http://www.example.com/>` your first regex is going to match `http://www.example.com/>`. Your second regex is only going to match if the string contains `http:` followed by a sequence of non-whitespace characters, followed by a whitespace character, followed by a newline. Given the string `http://www.example.com/\npath/here`, your solution will not set `$http` to 1.

Abigail
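The observation about the `<URL:...>` string can be checked directly. A minimal sketch, with the capture group and variable names added purely for illustration:

```perl
use strict;
use warnings;

my $text = '<URL:http://www.example.com/>';

# \S* is greedy and runs to the end of the string, then gives back one
# character so that [^\w\s] can match the closing ">".
my ($match) = $text =~ /(http:\S*[^\w\s])/;

print "$match\n";
```

The closing `>` is swallowed along with the URL, since it is a non-word, non-whitespace character at the end of the run.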
Re: Re: Regex to detect and remove a linebreak in an URL by Roy Johnson (Monsignor) on May 19, 2004 at 20:02 UTC
Astonishingly, you are wrong about the character class. `[^\w\s]` matches characters that are neither word characters nor whitespace; i.e., punctuation:

$_='o.ne<a>tw/o<b>';
@punks = /[^\w\s]/g;
print "<@punks>";

yields

<. < > / < >>

The problem with my example was that I left the /g modifier off the pattern match. I've updated it, and tested it:

while(<DATA>){
  /http:\S*[^\w\s]/g and s/\G\n//;
  print;
}
__DATA__
there is an http://whatever.com/address/
crossing/line/boundaries.html right in the middle of this nice string.

The PerlMonk `tr///` Advocate
Re: Re: Re: Regex to detect and remove a linebreak in an URL by Anonymous Monk on May 20, 2004 at 00:49 UTC
Hello Roy and others, what is the X in: s/\G\n/X/ ?
Re^4: Regex to detect and remove a linebreak in an URL by Roy Johnson (Monsignor) on May 20, 2004 at 01:50 UTC