I'm working on some Wiki-like auto-linking code that scans text for known words or strings and when found replaces them with HTML hyperlinks. For example, if "MySQL" is in the list of known strings then the code turns that word into a hyperlink when it is found in a sentence like "using MySQL or another database".

I am trying to come up with a regex that will perform this substitution, but only when the string is not already inside an anchor element and not inside any other tag.

Without these special provisions, if someone ever manually wraps the word MySQL (or a sentence containing it) inside anchor tags, I end up with nested anchor tags, which is invalid HTML.
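To make the failure concrete, here is a minimal sketch of the naive substitution, written in Python for illustration (the same pattern works in Perl). The `KNOWN` table, its `/wiki/MySQL` target, and the function name `naive_autolink` are hypothetical and not taken from my actual code.

```python
import re

# Hypothetical table of known strings and their link targets.
KNOWN = {"MySQL": "/wiki/MySQL"}

# One alternation of all known strings, matched on word boundaries.
WORD = re.compile(r"\b(" + "|".join(map(re.escape, KNOWN)) + r")\b")


def naive_autolink(html):
    """Link every known string, paying no attention to surrounding markup."""
    def link(match):
        word = match.group(1)
        return '<a href="%s">%s</a>' % (KNOWN[word], word)
    return WORD.sub(link, html)


# Plain text is handled as intended.
print(naive_autolink("using MySQL or another database"))

# Text that is already linked ends up with an anchor nested inside an anchor.
print(naive_autolink('see <a href="/docs">the MySQL docs</a>'))
```

The second call is the case I need to avoid: the word is wrapped again even though it already sits inside an anchor.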

I've seen various regexps for matching anchors or other tags, but I can't figure out how to match something that's not inside an anchor or a tag... I've tried all sorts of nasty look-behind/look-ahead stuff but nothing that works yet. Sometimes it gets so ugly that I start wondering if I have to write some kind of recursive HTML tokenizer (ugh)... Any ideas?
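One direction that avoids look-behind entirely is a very small tokenizer: split the input on tags, keep a counter of open anchors, and run the substitution only on text pieces seen while that counter is zero. The sketch below is in Python for illustration and is only a rough approximation, not a real HTML parser. It assumes tags never contain a literal `>` inside attribute values, and it ignores comments and `script`/`style` content. The `KNOWN` table, its link target, and the name `autolink` are hypothetical.

```python
import re

# Hypothetical table of known strings and their link targets.
KNOWN = {"MySQL": "/wiki/MySQL"}

# Capturing group so that re.split keeps the tags in the result list.
TAG = re.compile(r"(<[^>]*>)")
WORD = re.compile(r"\b(" + "|".join(map(re.escape, KNOWN)) + r")\b")
ANCHOR_OPEN = re.compile(r"<a\b", re.IGNORECASE)      # <a ...> but not <abbr>
ANCHOR_CLOSE = re.compile(r"</a\s*>", re.IGNORECASE)


def _link(match):
    word = match.group(1)
    return '<a href="%s">%s</a>' % (KNOWN[word], word)


def autolink(html):
    """Link known strings only in text that is outside tags and outside anchors."""
    out = []
    depth = 0  # number of currently open <a> elements
    for i, piece in enumerate(TAG.split(html)):
        if i % 2 == 1:
            # Odd indices are the captured tags: copy them through untouched.
            if ANCHOR_OPEN.match(piece):
                depth += 1
            elif ANCHOR_CLOSE.match(piece):
                depth = max(0, depth - 1)
            out.append(piece)
        elif depth == 0:
            # Text outside any anchor: safe to substitute.
            out.append(WORD.sub(_link, piece))
        else:
            # Text inside an existing anchor: leave as-is.
            out.append(piece)
    return "".join(out)


print(autolink("using MySQL or another database"))
print(autolink('see <a href="/docs">the MySQL docs</a>'))
print(autolink('<img alt="MySQL logo"> MySQL'))
```

Because tags are copied through untouched, a known string inside an attribute value (the `alt` text in the last call) is never rewritten, and the anchor counter keeps already-linked text from being wrapped a second time.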


In reply to regex to match content not inside an HTML anchor or other tags by GregHurrell
