in reply to Regex to detect and remove a linebreak in an URL

Thanks everyone -- including the sceptics -- for your help with this one.

I think the problem may not be amenable to any real solution. Although we would be happy with a solution that repaired most broken URLs most of the time (or even some of the time), we can't accept the risk of creating new broken URLs -- and unfortunately, that could easily happen.

Take this simple example:

Here is a valid link http://mydomain.org/
which is not broken

Given only the text, I can't see any way of detecting whether http://mydomain.org/ is already correct or whether it should really be http://mydomain.org/which (the URL with the first word of the next line appended).
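
To make the risk concrete, here is a rough Python sketch of the obvious approach. The pattern and the name naive_repair are made up for illustration and are not anything proposed in this thread; the sketch also assumes URLs start with http:// or https:// and contain no whitespace.

```python
import re

# Hypothetical naive repair: a URL that runs up to a newline gets the
# first whitespace-delimited token of the next line glued onto it.
NAIVE_JOIN = re.compile(r'(https?://\S+)\n(\S+)')

def naive_repair(text):
    """Join every end-of-line URL with the first token on the next line."""
    return NAIVE_JOIN.sub(r'\1\2', text)

sample = "Here is a valid link http://mydomain.org/\nwhich is not broken"
print(naive_repair(sample))
# Here is a valid link http://mydomain.org/which is not broken
```

The link was fine before the "repair" and is broken after it, which is exactly the outcome we can't accept.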

It would work if we could be certain that the first token on the next line -- the one to be appended to the possibly-broken URL -- could only exist as an end-fragment of a syntactically-correct URL. I suspect that even if theoretically possible, it would be impractical.
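
A sketch of what such a "certainty" test might look like, in Python. Everything here is an assumption for illustration: the idea is to join only when the next token contains characters that are unusual at the start of ordinary prose, and the character set and file extensions in TAIL_HINT are arbitrary guesses, not a tested rule.

```python
import re

# URL running up to a newline, followed by the next line's first token.
URL_AT_EOL = re.compile(r'(https?://\S+)\n(\S+)')

# Hypothetical hint that a token is a URL tail: it contains URL-ish
# punctuation or ends in a common file extension. Purely a guess.
TAIL_HINT = re.compile(r'[/?=&%#]|\.(?:html?|php|pdf|png|jpg)$', re.IGNORECASE)

def looks_like_url_tail(token):
    return bool(TAIL_HINT.search(token))

def cautious_repair(text):
    """Join a URL with the next line's first token only if the hint fires."""
    def join(match):
        head, tail = match.group(1), match.group(2)
        if looks_like_url_tail(tail):
            return head + tail
        return match.group(0)  # leave the text untouched
    return URL_AT_EOL.sub(join, text)

# Leaves the valid link alone:
print(cautious_repair("Here is a valid link http://mydomain.org/\nwhich is not broken"))
# Repairs an obvious break:
print(cautious_repair("See http://mydomain.org/docs/\nindex.html?id=3 for details"))
# Still creates a new broken URL, because "and/or" contains a slash:
print(cautious_repair("Visit http://mydomain.org/\nand/or phone us"))
```

The last case is the problem in miniature: any hint loose enough to catch real breaks will sooner or later fire on ordinary text, and a hint strict enough never to misfire would miss a tail such as a plain word in a path.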
