in reply to regex help or pointer to module needed

One example is hardly enough to guess what it is about that sample that makes it definitively spam.

If the determining pattern is that an open tag is followed by text and then by a non-matching close tag, then something like this might work:

m[ < ( [^>]+ ) > .*? </ (?! \1> ) ]x

There are probably many ways that this could be improved, but it would require more samples to decide how.
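
A quick way to sanity-check the pattern is to run it against a couple of hand-made fragments. This is only a sketch: the sample strings and the helper name `looks_mismatched` are made up for illustration and are not taken from the original mail. It also assumes single-line input, since `.` does not match a newline without the `/s` modifier.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if an open tag is followed, after any text,
# by a close tag that does not repeat the captured open tag, else 0.
sub looks_mismatched {
    my ($html) = @_;
    return $html =~ m[ < ( [^>]+ ) > .*? </ (?! \1> ) ]x ? 1 : 0;
}

# Made-up samples, not from the original message.
for my $sample ( '<b>buy now</i>', '<b>buy now</b>' ) {
    printf "%-20s %s\n", $sample,
        looks_mismatched($sample) ? 'flagged' : 'clean';
}
```

The first sample should come out as flagged and the second as clean, which is the behaviour the pattern is aiming for on simple tags without attributes.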


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Re^2: regex help or pointer to module needed
by Xxaxx (Monk) on Jun 08, 2004 at 08:01 UTC
    Good suggestion.

    Unfortunately, I believe it will also match an ordinary link such as

    <a href="page.html">Link Text</a>
    because the capture includes the attributes, so the close tag never repeats it. Before seeking help here I tried tightening it with something like:
    m[ < ( [^>\s]+ ) > .*? </ (?! \1> ) ]x
    I hoped the no-space condition would solve things. Unfortunately, legitimate emails from eBay and Amazon were still caught.

    Still, all in all, I think this expression along with a whitelist may be the direction I take, for speed.

    Good suggestion.
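
The over-matching described in the reply above can be reproduced with a small script. This is a hedged sketch, not a diagnosis of the actual eBay or Amazon mail: the sample strings and the helper name `flagged` are invented here, and nested tags are only one plausible reason the no-space version still catches legitimate HTML.

```perl
use strict;
use warnings;

# The two patterns quoted in this thread, precompiled.
my $any_tag  = qr[ < ( [^>]+ )   > .*? </ (?! \1> ) ]x;
my $no_space = qr[ < ( [^>\s]+ ) > .*? </ (?! \1> ) ]x;

# Hypothetical helper: 1 if $html matches $pattern, else 0.
sub flagged {
    my ( $pattern, $html ) = @_;
    return $html =~ $pattern ? 1 : 0;
}

# Made-up samples: a link with an attribute, and well-formed nested tags.
my $link   = '<a href="page.html">Link Text</a>';
my $nested = '<b>See <i>this</i> offer</b>';

print "original, link:   ", flagged( $any_tag,  $link ),   "\n";
print "no-space, link:   ", flagged( $no_space, $link ),   "\n";
print "no-space, nested: ", flagged( $no_space, $nested ), "\n";
```

The original pattern flags the link because the captured text is the whole `a href="page.html"`, which `</a>` cannot repeat. The no-space version skips that link, but in the nested sample it captures `b` and then stops at `</i>`, so properly nested markup is still reported.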