Thanks everyone -- including the sceptics -- for your help with this one.
I think the problem may not be amenable to any real solution. Although we would be happy with a solution that matched most broken URLs most of the time (or even some of the time), we can't accept the risk of creating new broken URLs -- and unfortunately, that could easily happen.
Take this simple example:
Here is a valid link http://mydomain.org/
which is not broken
I can't see any way of detecting whether http://mydomain.org/ is correct or whether it should really be http://mydomain.org/which
It would work if we could be certain that the first token on the next line -- the one to be appended to the possibly-broken URL -- could only exist as an end-fragment of a syntactically-correct URL. I suspect that even if theoretically possible, it would be impractical.
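To make the risk concrete, here is a rough Python sketch of the kind of naive repair I have in mind. The function name `naive_join` and the pattern are my own illustration, not anything proposed in the thread, and the sample strings are invented. It simply glues the first token of the next line onto any URL that ends a line:

```python
import re

# Hypothetical naive repair: a URL running up to a line break, followed by
# the first whitespace-delimited token of the next line.
URL_AT_EOL = re.compile(r'(https?://\S+)\n(\S+)')

def naive_join(text: str) -> str:
    """Remove the line break after a URL and append the next token to it."""
    return URL_AT_EOL.sub(r'\1\2', text)

# A URL that really was broken across two lines (invented sample).
broken = "See http://mydomain.org/some/pa\nge.html for details"

# A URL that was complete before the line break.
valid = "Here is a valid link http://mydomain.org/\nwhich is not broken"

print(naive_join(broken))  # the broken URL is repaired
print(naive_join(valid))   # the valid URL becomes http://mydomain.org/which
```

The first call does what we want, but the second turns a working link into a dead one, and nothing in the text itself tells the two cases apart.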
In reply to Re: Regex to detect and remove a linebreak in an URL
by Anonymous Monk
in thread Regex to detect and remove a linebreak in an URL
by Anonymous Monk