I've taken this code from two parts of the perl cookbook and I'm having problems making them work together. My goal is to take URLs in a block of text and re-write them according to the following rules:
That's probably not entirely clear. How about a code snippet:
$server = "http://www.foo.com";
$path = "/absolute/path/";
$html = '
<a href="/absolute/no/dns">absolute with no dns</a>
<a href="http://absolute.with/dns.html">http://absolute.with/dns.html</a>
<a href="relative/without/dns.html">relative/without/dns.html</a>
<a href="relative2/without/dns.html">relative2 without dns.html</a>
';

$html =~ s/
    (<\s* (?:a|img|area) [^>]+?(?:href|src) \s*=\s* ["']? )
    ( [^'"\/>] [^'" >]+? )
    ([ '"]?>)
/
    $1.sprintf("%s%s", $path, $2).$3
/sigex;
This bit works okay for the first one and the last two, but the second case (http://) fails because (clearly) I don't have anything that tells it to avoid a leading protocol string (something like http://, ftp://, gopher://, news://, etc.). So I looked at the urlify program in chapter 6 of the Cookbook and tried this:
$html =~ s/
    (<\s* (?:a|img|area) [^>]+?(?:href|src) \s*=\s* ["']? )
    ( [^'"\/>(http|telnet|gopher|file|wais|ftp)] [^'" >]+? )
    ([ '"]?>)
/
    $1.sprintf("%s%s", $path, $2).$3
/sigex;
Which is _far_ worse, since now none of the cases match. (Well, not entirely true: if I remove the non-match for a leading '/' I can get the first case to match, but that's exactly what I don't want.)
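A likely reason none of the cases match: a bracketed class is a set of single characters, so the parentheses, the pipes, and each letter of "http", "telnet", "gopher" and so on are excluded one by one rather than as whole words. The sketch below checks that reading by applying the same class to the first character of a few URLs; `zebra.html` is a made-up URL added only for contrast.

```perl
use strict;
use warnings;

# Same bracketed class as in the second attempt, anchored to the first character.
my $first_char = qr{^[^'"/>(http|telnet|gopher|file|wais|ftp)]};

my %ok;
for my $url (qw(relative/without/dns.html http://absolute.with/dns.html zebra.html)) {
    # "r" and "h" are both letters listed inside the class, so both are rejected.
    $ok{$url} = $url =~ $first_char ? 1 : 0;
    printf "%-32s %s\n", $url, $ok{$url} ? "passes" : "rejected";
}
```

Only the URL starting with "z" passes, because "z" happens not to appear in any of the listed protocol names.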
I guess this is my question: How can I do a non-match on a string? I want to prevent the http:// links from matching, but I can't seem to get it to play nice. Has anyone else done this?
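One common way to say "not followed by this string" in Perl is a negative lookahead, `(?!...)`. What follows is a minimal sketch rather than a definitive fix. It assumes a protocol looks like a letter followed by letters, digits, `+`, `.` or `-` and then a colon, and it adds a `\b` after the tag name that the original pattern does not have.

```perl
use strict;
use warnings;

my $path = "/absolute/path/";
my $html = <<'HTML';
<a href="/absolute/no/dns">absolute with no dns</a>
<a href="http://absolute.with/dns.html">http://absolute.with/dns.html</a>
<a href="relative/without/dns.html">relative/without/dns.html</a>
<a href="relative2/without/dns.html">relative2 without dns.html</a>
HTML

$html =~ s{
    ( <\s* (?:a|img|area) \b [^>]+? (?:href|src) \s*=\s* ["']? )  # group 1: tag up to the URL
    (?! [a-z][a-z0-9+.-]* : )   # negative lookahead: refuse anything that starts with a scheme
    ( [^'"/>\s] [^'"\s>]* )     # group 2: URL that does not start with a slash
    ( [\s'"]? > )               # group 3: optional closing quote, then the bracket
}{$1$path$2$3}gisx;

print $html;
```

The lookahead consumes nothing, so group 2 still captures the whole URL. With the sample input, only the two relative links should pick up the path prefix.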
Oh, and don't worry about the full DNS pre-pending; it's the same problem, so when I fix one, the other comes for free. But if anyone has a suggestion on how I could do this all in one pass, I'd love to hear it. As it is, I'm planning on doing two passes: the first with the path, the second with the DNS and protocol info.
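A single pass might be done by capturing every URL and letting a small function decide what to prepend. This is only a sketch under assumptions: `absolutize` is a hypothetical helper name, and the three-way rule (leave URLs with a protocol alone, add the server to leading-slash URLs, add server plus path to relative ones) is a guess at the intended behaviour.

```perl
use strict;
use warnings;

my $server = "http://www.foo.com";
my $path   = "/absolute/path/";

# Hypothetical helper: pick the prefix from the way the URL starts.
sub absolutize {
    my ($url) = @_;
    return $url           if $url =~ m{^[a-z][a-z0-9+.-]*:}i;  # already has a protocol
    return $server . $url if $url =~ m{^/};                    # absolute path, no host
    return $server . $path . $url;                             # relative path
}

my $html = <<'HTML';
<a href="/absolute/no/dns">absolute with no dns</a>
<a href="http://absolute.with/dns.html">http://absolute.with/dns.html</a>
<a href="relative/without/dns.html">relative/without/dns.html</a>
HTML

# One substitution: the pattern matches any URL, the /e code does the choosing.
$html =~ s{
    ( <\s* (?:a|img|area) \b [^>]+? (?:href|src) \s*=\s* ["']? )
    ( [^'"\s>]+ )
    ( [\s'"]? > )
}{ $1 . absolutize($2) . $3 }gisex;

print $html;
```

Keeping the decision in ordinary code rather than in the pattern makes it easier to add cases later. The URI module's `new_abs` method is another option worth a look for the resolution step.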
-J.
In reply to matching the non-presence of a string by Joey The Saint