in reply to matching the non-presence of a string

I'd try something like HTML::LinkExtor to extract the links and then work on them using ordinary conditionals.

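As far as matching strings that don't contain certain patterns goes, you have a number of options. Within a regex, (?!pattern) is a zero-width negative lookahead: it succeeds at any position where pattern does not come next, so to reject strings that contain pattern anywhere you have to anchor it and let it scan, as in /^(?!.*pattern)/s. Or you could put the negation outside the regex ($string !~ /pattern/).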
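A small sketch of the difference, using "http" as a stand-in pattern; the helper names are hypothetical. The anchored lookahead with .* and the outside negation agree. The unanchored lookahead matches almost everything, and /^(?!http)/ only tests how the string begins, which is often what a URL-scheme check actually wants.

```perl
use strict;
use warnings;

# Anchored lookahead with .*: succeeds only if "http" occurs nowhere in $s.
sub lacks_http_lookahead { my ($s) = @_; return $s =~ /^(?!.*http)/s ? 1 : 0 }

# Same test with the negation moved outside the regex.
sub lacks_http_negated   { my ($s) = @_; return $s !~ /http/ ? 1 : 0 }

# Anchored lookahead without .*: only checks that $s does not *begin* with "http".
sub not_starting_http    { my ($s) = @_; return $s =~ /^(?!http)/ ? 1 : 0 }

# Unanchored lookahead: succeeds at any position not followed by "http",
# so it matches nearly every string and excludes nothing useful.
sub unanchored           { my ($s) = @_; return $s =~ /(?!http)/ ? 1 : 0 }

for my $s ('page.html', 'http://example.com/', 'docs/http-notes.html') {
    printf "%-22s lookahead=%d negated=%d not_starting=%d unanchored=%d\n",
        $s, lacks_http_lookahead($s), lacks_http_negated($s),
        not_starting_http($s), unanchored($s);
}
```

Note that 'docs/http-notes.html' is rejected by the "contains" tests but accepted by the "begins with" test.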

For real robustness without tearing your hair out, though, I really do recommend moving your logic outside of the regular expression and just doing an explicit series of matches against the href contents.
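A minimal sketch of that kind of explicit check, using a hypothetical should_rewrite helper. The rules are assumptions for illustration: anything with a URL scheme (http:, telnet:, gopher:, and so on, matched generically rather than listed) or a leading slash is left alone, and everything else is treated as a relative link to rewrite.

```perl
use strict;
use warnings;

# Hypothetical helper: should this href be rewritten as a local link?
# Each rule is its own plain test, so adding or changing one is easy.
sub should_rewrite {
    my ($href) = @_;
    return 0 if !defined $href || $href eq '';
    return 0 if $href =~ /^[a-z][a-z0-9+.-]*:/i;  # has a scheme (http:, telnet:, gopher:, ...)
    return 0 if $href =~ m{^/};                   # absolute path on the same server
    return 1;                                     # otherwise assume it is relative
}

for my $href ('page.html', 'sub/dir.html', '/top.html',
              'http://example.com/', 'telnet://host') {
    printf "%-20s %s\n", $href, should_rewrite($href) ? 'rewrite' : 'leave';
}
```

The href values themselves could come from HTML::LinkExtor or from a regex; the point is that the decision logic lives in ordinary Perl rather than in one large alternation inside a lookahead.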

HTH

Philosophy can be made out of anything. Or less -- Jerry A. Fodor

Re: (arturo) Re: matching the non-presence of a string
by Joey The Saint (Novice) on Mar 20, 2001 at 22:22 UTC

    Ah, now I feel silly. You're right, of course. (?!) is what I wanted. My regex has gotten a bit uglier, but it works if I do it this way:

        $nomatch="(?!http|telnet|gopher|...|\"|'|\/| )";
        . . .
        ( $nomatch [^'" >]+? )
        . . .

    I don't quite understand why interpolating $nomatch into the regex works, but it's probably something simple too, and this is a better solution anyway: I have just one place to update if things need to change.
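Why the interpolation works: Perl expands variables in a pattern much as it does in a double-quoted string, and only then compiles the result, so the lookahead stored in $nomatch becomes part of the regex exactly as if it had been typed inline. A small sketch with a shortened, hypothetical exclusion list (the full list in the post is elided), showing both a plain string and a qr// object. Wrapping the variable in (?:...) keeps the following character class from being read as an array subscript.

```perl
use strict;
use warnings;

# Hypothetical, shortened exclusion list kept in a plain string.
my $nomatch_str = '(?!http|telnet|gopher)';

# The same fragment as a precompiled regex object.
my $nomatch_re = qr/(?!http|telnet|gopher)/;

my $html = q{<a href="page.html"> <a href="http://example.com/">}
         . q{ <a href="gopher://x"> <a href="docs/help.html">};

# $nomatch_str is interpolated into the pattern text before compilation,
# so the lookahead behaves exactly as if it were written inline.
# (?:...) stops "$var[" from being parsed as an array element.
my @from_str = $html =~ /href="((?:$nomatch_str)[^'" >]+?)"/g;
my @from_re  = $html =~ /href="((?:$nomatch_re)[^'" >]+?)"/g;

print "@from_str\n";
print "@from_re\n";
```

Both forms capture only page.html and docs/help.html. qr// has the small advantage that a syntax error in the fragment is reported where the fragment is defined.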

    As an aside, since my goal is to re-write the urls in place, I don't see how HTML::LinkExtor would help. I can get the links just fine, I'm just having problems doing the re-writing inline. Am I missing something there?

    -J.

      As an aside, since my goal is to re-write the urls in place, I don't see how HTML::LinkExtor would help. I can get the links just fine, I'm just having problems doing the re-writing inline. Am I missing something there?

      Well, not really, I guess. You could try going through the file line by line with a regex that extracts the links and then mangles them appropriately in place. The LinkExtor approach would be to use it to grab all the links, then, for each of those links, do an s///g on the text (lumped together as a single string) to do the URL mangling. This way *might be* slower (or impractical for other reasons), but given a choice between speed and correctness, my impulse usually lies with correctness. YMMV, of course!
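A minimal sketch of that second pass, assuming the link list has already been extracted (hard-coded here in place of HTML::LinkExtor output) and using a hypothetical mangle rule. Two details matter: \Q...\E makes "." and "?" in a URL match literally, and tying each substitution to an href="..." attribute leaves the same text alone when it appears elsewhere or as a prefix of a longer URL.

```perl
use strict;
use warnings;

# Hypothetical URL-mangling rule: prefix local links with a directory.
sub mangle { my ($url) = @_; return "/mirror/$url" }

my $html = <<'HTML';
<a href="page.html">Page</a>
<a href="page.html?x=1">Query</a>
<a href="http://example.com/">External</a>
HTML

# Stand-in for the links HTML::LinkExtor would return, already
# filtered down to the local ones.
my @links = ('page.html', 'page.html?x=1');

for my $link (@links) {
    my $new = mangle($link);
    # $2 is the opening quote; \2 requires the same quote to close the value.
    $html =~ s/(href=(["']))\Q$link\E\2/$1$new$2/g;
}

print $html;
```

Without \Q, "page.html?x=1" would be read as a pattern in which "?" makes the "l" optional, and the second link would never be rewritten.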
