Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Well Monks, by a remarkable coincidence my question is closely related to another posted here today. I'm not a Perl programmer or any other kind of programmer, but I can usually figure out the simple things. However, this one has me stumped.
I have a short Perl script that filters incoming email in preparation for archiving. Some email clients wrap email when sending it -- either by default or because they have been set up that way. So recipients sometimes get broken URLs and there is nothing they can do about it.
What I need to do is match a broken URL that looks like this:
http://www012.upp.so-net.ne.jp/sculpture/gallery/backnumber/g_s_maeda/ g_maeda_sakuhin2.html
The linebreak may appear anywhere, but the URL is always split on a boundary such as a slash or dot.
Does anyone here have any idea how to construct a regular expression to match an URL broken in this way, with a linebreak at an arbitrary position?
I guess it would be something like this:
The difficult bit (for me) is detecting an URL fragment.
I'm grateful already, as I have found a load of other useful stuff on this miraculous website.
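For readers landing on this thread, here is a minimal sketch of one way to approach the question. It is not taken from any of the replies. It is written in Python so it can be run standalone, and it rests on several assumptions: the gap shown in the sample URL is a real newline, the break always directly follows a slash or dot (as the question states), and a "URL fragment" is taken to be a run of non-space characters that contains a slash or a dot followed by a word character. The names `BROKEN_URL` and `rejoin_broken_urls` are made up for illustration.

```python
import re

# Heuristic pattern (an assumption, not a full URL grammar).
BROKEN_URL = re.compile(
    r"""
    (https?://\S*[/.])          # 1: URL head, ending on a slash or dot
    [ \t]*\r?\n[ \t]*           # the stray line break, with optional padding
    (                           # 2: continuation that looks like a URL fragment
        [^\s/.]+                #    first segment: no whitespace, slash or dot
        (?: /\S* | \.\w+\S* )   #    then a slash, or a dot plus word characters
    )
    """,
    re.VERBOSE,
)


def rejoin_broken_urls(text: str) -> str:
    """Remove a line break that splits a URL right after a slash or dot."""
    previous = None
    # Repeat until nothing changes, in case one URL was wrapped more than once.
    while previous != text:
        previous = text
        text = BROKEN_URL.sub(r"\1\2", text)
    return text


if __name__ == "__main__":
    sample = (
        "http://www012.upp.so-net.ne.jp/sculpture/gallery/backnumber/g_s_maeda/\n"
        "g_maeda_sakuhin2.html"
    )
    print(rejoin_broken_urls(sample))
```

The requirement that the continuation contain a slash or a dot is what keeps an ordinary word on the next line from being glued onto a URL that legitimately ends in a slash. The cost is that a break falling just before a bare extension (a line ending in `page.` followed by `html`) is left alone. The pattern uses only constructs that Perl's regex engine also supports, so it should port to an `s///x` substitution in an existing filter script with little change.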
Replies are listed 'Best First'.
Re: Regex to detect and remove a linebreak in an URL
by saintmike (Vicar) on May 19, 2004 at 15:45 UTC

Re: Regex to detect and remove a linebreak in an URL
by Abigail-II (Bishop) on May 19, 2004 at 19:48 UTC

Re: Regex to detect and remove a linebreak in an URL
by Roy Johnson (Monsignor) on May 19, 2004 at 18:22 UTC
by Abigail-II (Bishop) on May 19, 2004 at 19:43 UTC
by Roy Johnson (Monsignor) on May 19, 2004 at 20:02 UTC
by Anonymous Monk on May 20, 2004 at 00:49 UTC
by Roy Johnson (Monsignor) on May 20, 2004 at 01:50 UTC

Re: Regex to detect and remove a linebreak in an URL
by Hagbone (Monk) on May 20, 2004 at 01:03 UTC

Re: Regex to detect and remove a linebreak in an URL
by Anonymous Monk on May 20, 2004 at 11:05 UTC