GregHurrell has asked for the wisdom of the Perl Monks concerning the following question:
I'm working on some Wiki-like auto-linking code that scans text for known words or strings and when found replaces them with HTML hyperlinks. For example, if "MySQL" is in the list of known strings then the code turns that word into a hyperlink when it is found in a sentence like "using MySQL or another database".
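I am trying to come up with a regex that will perform this substitution, but only when the string is:
- not already inside an existing anchor (between <a ...> and </a>)
- not inside an HTML tag itself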
Without these special provisions, if someone manually wraps the word MySQL (or a sentence containing it) inside anchor tags, I end up with nested anchor tags, which are invalid HTML.
I've seen various regexps for matching anchors or other tags, but I can't figure out how to match something that's not inside an anchor or a tag... I've tried all sorts of nasty look-behind/look-ahead stuff but nothing that works yet. Sometimes it gets so ugly that I start wondering if I have to write some kind of recursive HTML tokenizer (ugh)... Any ideas?
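One common way to approach this kind of problem, sketched here under stated assumptions rather than as a definitive answer, is to avoid look-behind altogether: let a single alternation consume whole anchor elements and whole tags first and put them back unchanged, so that a known term is only ever matched in the plain text left over. The helper name autolink and the term-to-URL hash are hypothetical, introduced only for illustration. The sketch assumes reasonably well-formed HTML (no ">" inside attribute values, no nested anchors, no comments or script blocks to protect) and terms that begin and end with word characters.

```perl
use strict;
use warnings;

# Hypothetical helper: link each known term in $html unless it sits inside
# an existing <a>...</a> element or inside the markup of a tag.
sub autolink {
    my ($html, $links) = @_;    # $links: hashref of term => URL (assumed shape)

    # Longest terms first, so a longer term wins over a shorter prefix of it.
    my $terms = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } keys %$links;
    return $html unless length $terms;

    $html =~ s{
        ( (?i: <a\b [^>]* > .*? </a\s*> )   # group 1: an existing anchor element
          | < [^>]* >                       #          or any other tag
        )
      | \b ($terms) \b                      # group 2: a known term in plain text
    }{
        defined $2 ? qq{<a href="$links->{$2}">$2</a>} : $1
    }gsex;

    return $html;
}

my $text = 'using MySQL or <a href="/db">the MySQL database</a> and <img alt="MySQL logo">';
print autolink($text, { MySQL => '/wiki/MySQL' }), "\n";
# prints: using <a href="/wiki/MySQL">MySQL</a> or <a href="/db">the MySQL database</a> and <img alt="MySQL logo">
```

Because the regex engine always takes the leftmost match, a term inside an anchor or a tag is swallowed by the first alternative before the term alternative gets a chance to see it. For input that does not meet the assumptions above, a real HTML tokenizer such as the CPAN module HTML::Parser is likely to be more robust than any single regex.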
Re: regex to match content not inside an HTML anchor or other tags
by Ido (Hermit) on Jun 27, 2005 at 10:45 UTC
Re: regex to match content not inside an HTML anchor or other tags
by ww (Archbishop) on Jun 27, 2005 at 12:41 UTC
by GregHurrell (Initiate) on Jun 27, 2005 at 14:44 UTC