in reply to Regex to detect and remove a linebreak in an URL
Thanks everyone -- including the sceptics -- for your help with this one.
I think the problem may not be amenable to any real solution. Although we would be happy with a solution that matched most broken URLs most of the time (or even some of the time), we can't accept the risk of creating new broken URLs -- and unfortunately, that could easily happen.
Take this simple example:
Here is a valid link http://mydomain.org/
which is not broken
I can't see any way of detecting whether http://mydomain.org/ is correct or whether it should really be http://mydomain.org/which
It would work if we could be certain that the first token on the next line -- the one to be appended to the possibly-broken URL -- could only exist as an end-fragment of a syntactically-correct URL. I suspect that even if theoretically possible, it would be impractical.
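To make the risk concrete, here is a minimal sketch in Python of the naive approach, assuming a rule that treats any URL at the end of a line as broken and appends the first token of the next line. The pattern `NAIVE_JOIN`, the function `naive_rejoin` and the genuinely broken sample URL are all hypothetical and exist only for illustration; this is not a proposed fix.

```python
import re

# Hypothetical naive rule: a URL that runs up to a line break is assumed to
# be broken, so the first whitespace-delimited token on the next line is
# appended to it.
NAIVE_JOIN = re.compile(r"(https?://\S+)\n(\S+)")

def naive_rejoin(text: str) -> str:
    """Join a line-final URL with the first token of the following line."""
    return NAIVE_JOIN.sub(r"\1\2", text)

# A made-up URL that really was split by a line break.
broken = "See http://mydomain.org/some/long/pa\nth.html for details"

# The example from this post: the URL is complete and the next line is prose.
intact = "Here is a valid link http://mydomain.org/\nwhich is not broken"

print(naive_rejoin(broken))
print(naive_rejoin(intact))
```

The first call repairs the split URL, but the second turns a correct link into http://mydomain.org/which, and nothing in the text itself tells the two cases apart.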