While it isn't particularly relevant in your situation, keep in mind that this approach has limitations. It doesn't scale that well because of the way the regex engine works, and the simple conversion of the banned list to a regex will have problems with the various regex reserved characters: 'SH|T' would blow it, for instance.
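
To make the 'SH|T' problem concrete, here is a minimal sketch. The banned list and the sample text are made up for illustration, and escaping with quotemeta is just one way to keep the words literal, not necessarily what the original code should do:

```perl
use strict;
use warnings;

my @banned = ('SH|T', 'darn');   # hypothetical banned list

# Naive join: the | inside 'SH|T' becomes an extra alternation,
# so the pattern really means SH, or T, or darn.
my $naive    = join '|', @banned;
my $naive_re = qr/$naive/i;

# Escaping each word first keeps every character literal.
my $safe    = join '|', map { quotemeta } @banned;
my $safe_re = qr/$safe/i;

my $text      = 'That is fine';
my $naive_hit = $text =~ $naive_re ? 1 : 0;   # a lone 'T' is enough to match
my $safe_hit  = $text =~ $safe_re  ? 1 : 0;   # needs the literal string SH|T
print "naive: $naive_hit, safe: $safe_hit\n";
```

The naive pattern flags perfectly innocent text because any T matches, while the escaped version only fires on the literal word.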

A more sophisticated approach might be to keep a hash of banned words with associated hand-written regexes to match them. On the fly you could either match against each in turn, maximizing the optimizations available to the regex engine, or more simply concatenate them all together as you are doing here. Either way you would at least have the certainty of knowing that each regex fragment used is as correct as you can make it.
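
Something along these lines is what I have in mind. It is only a sketch: the words, the patterns, and the find_banned helper are all invented for the example, so treat them as assumptions rather than a finished filter:

```perl
use strict;
use warnings;

# Hypothetical banned words, each paired with a hand-written pattern.
my %banned = (
    'SH|T' => qr/SH\|T/i,            # the | is escaped by hand
    'darn' => qr/\bd+a+r+n+\b/i,     # also catches stretched spellings
);

# Option 1: match against each pattern in turn and report which words hit.
sub find_banned {
    my ($text) = @_;
    return grep { $text =~ $banned{$_} } sort keys %banned;
}

# Option 2: join the vetted fragments into a single pattern.
# Each qr// object keeps its own flags when interpolated.
my $all    = join '|', map { "(?:$_)" } values %banned;
my $all_re = qr/$all/;

my @hits = find_banned('Well daaarn, SH|T happens');
print "matched: @hits\n";
print "combined pattern matched\n" if 'no daRn way' =~ $all_re;
```

Matching in turn tells you which word was hit, while the combined pattern only says that something matched. Which one you want depends on what you do with the result.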

Again, I realise this might be too much for this particular situation, but it's worth considering; you'd be surprised where bugs from this type of approach show up. The other day I was playing with HTML::TableExtract, which uses a very similar mechanism to scan for table column headers. It failed very oddly when a parenthesis or | was in the header name. Oddly enough that it took me a while to track down... ;-)

Yves
--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)


In reply to Re: Re: Re: test if a string contains a list member by demerphq
in thread test if a string contains a list member by mull
