Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

OK, here's a toughie. I have one scalar and two arrays: the scalar $topic (containing 2 to 40 words; I also have the same data in @topic), @badwords (which contains words we want to censor because they're potty mouth), and @exceptions (which contains words that are allowed to be used).

I have some code right now that censors $topic like so:

for (@badwords) { $topic =~ s/$_/<censored>/g }
It turns any string in $topic that matches a bad word into <censored>, and it works GREEEEEEEEEEEAT, as Tony the Tiger says.

A week ago, someone told me that harmless words were getting drop-kicked: because of the bad word "ass", the word "class" would come out as cl<censored>.

So, is there any way I can keep looking for matches in $topic against every word in @badwords, but when a whole word (separated by spaces) matches an entry in @exceptions, leave that word alone entirely and move on to checking the rest of the words? Does that make sense?

Here's a better example of what happens now.

For the example, let's say @badwords consists of "BARNEY GOOB CHICKEN".

If I set the topic to: "I LOVE BARNEY AND CHICKEN AND GOOBERS"

the code I have now changes it to:

"I LOVE <censored> AND <censored> AND <censored>ERS"

Now let's say that @exceptions contains "GOOBERS". I would *LIKE* $topic to turn into this instead: "I LOVE <censored> AND <censored> AND GOOBERS"

Is that possible? I'd sure appreciate the help, thanks! (By the way, everything is in lowercase; $topic has been tr/[A-Z]/[a-z]/'ed.)
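One way to express exactly what is being asked, sketched under the assumption (taken from the question) that words are separated by spaces only: split $topic into words, skip any word found in a hash built from @exceptions, and apply the existing substring censoring to everything else. The sub name censor_topic is made up for illustration, and punctuation stuck to a word (e.g. "goobers!") would keep it from matching its exception.

```perl
use strict;
use warnings;

# Hypothetical helper (name made up for this sketch): censor bad-word
# substrings inside each space-separated word, but leave any word that
# appears in @$exceptions completely untouched.
sub censor_topic {
    my ($topic, $badwords, $exceptions) = @_;

    # Hash for quick "is this whole word an exception?" lookups.
    my %is_exception = map { $_ => 1 } @$exceptions;

    # Split on runs of spaces; the capturing group keeps the spaces as
    # separate list elements, so join '' rebuilds the original spacing.
    my @pieces = split /( +)/, $topic;

    for my $piece (@pieces) {           # $piece aliases each element
        next if $piece =~ /^ +$/;       # a run of spaces: keep as-is
        next if $is_exception{$piece};  # whole word is allowed
        for my $bad (@$badwords) {
            # \Q...\E makes the bad word match literally.
            $piece =~ s/\Q$bad\E/<censored>/g;
        }
    }
    return join '', @pieces;
}

my @badwords   = qw(barney goob chicken);
my @exceptions = qw(goobers);
print censor_topic('i love barney and chicken and goobers',
                   \@badwords, \@exceptions), "\n";
# prints: i love <censored> and <censored> and goobers
```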

UPDATE (Masem): escaped the angle brackets (&lt; and &gt;) and added CODE and P tags for readability.

Re: Defenetly a complicated perl recipie with 2 arrays, 1 scaler and some special matching
by Dominus (Parson) on Jul 15, 2001 at 06:28 UTC
    Use
    $topic =~ s/\b$_\b//g;
    and it will not turn class into cl.

    The \b's tell Perl you only want the pattern to match at a 'word boundary'.
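A minimal, self-contained sketch of the \b version, with <censored> put back as the replacement (the angle-bracketed text appears to have been lost from the question when this reply was written). The sub name is hypothetical, and the sketch assumes the bad words consist of word characters. Note that the \b boundaries alone already give the desired output for the GOOBERS example, because "goob" inside "goobers" is followed by the word character "e", so there is no boundary there.

```perl
use strict;
use warnings;

# Hypothetical helper: censor whole-word matches only. \b is a zero-width
# assertion that matches between a word character (\w) and a non-word
# character (or the start/end of the string), so "ass" will not match
# inside "class". Assumes the bad words are made of \w characters.
sub censor_whole_words {
    my ($topic, @badwords) = @_;
    for my $bad (@badwords) {
        # \Q...\E quotes any regex metacharacters in the bad word.
        $topic =~ s/\b\Q$bad\E\b/<censored>/g;
    }
    return $topic;
}

print censor_whole_words('i love barney and chicken and goobers',
                         qw(barney goob chicken)), "\n";
# prints: i love <censored> and <censored> and goobers
print censor_whole_words('a class act, you ass', 'ass'), "\n";
# prints: a class act, you <censored>
```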

    Hope this helps.

    --
    Mark Dominus
    Perl Paraphernalia

      After that post, I noticed that some of my text was getting drop-kicked. I'll put spaces in it so you can see the word: < c e n s o r e d >. That's ALL over the submitted question; how much does that change?
        Well, I can't type it out at all; it's the word censored with pointy brackets on both sides.
Re: Defenetly a complicated perl recipie with 2 arrays, 1 scaler and some special matching
by Masem (Monsignor) on Jul 15, 2001 at 06:41 UTC
    I believe this does what you want, with the side effect of possibly leaving extra spaces in your topic:
    for (@badwords) { $topic =~ s/(^|\W)$_($|\W)/$1$2/g; }

    Update: yeah, yeah, my original code was badly fscked up. That's what you get for working out what edits the question needed at the same time...

    That is, look for the word between either the start of the string or a non-word character, and either the end of the string or a non-word character. You might end up with consecutive spaces in the topic this way; you may want to consider using a placeholder to indicate that you removed something (something as simple as BLEEP), but that's up to your project specifications.
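A sketch contrasting the corrected (^|\W)...(\W|$) pattern with a variant that only looks ahead at the trailing delimiter and leaves a BLEEP placeholder; both sub names and the sample sentence are made up. Because /g resumes matching after the consumed trailing character, an immediately repeated bad word has no leading delimiter left to match, so the consuming version lets the second copy through.

```perl
use strict;
use warnings;

# Hypothetical helper using the corrected pattern: the delimiter is
# consumed on both sides, so with /g the trailing delimiter of one match
# is used up and an immediately repeated bad word cannot match its (^|\W).
sub strip_consuming {
    my ($topic, @badwords) = @_;
    for my $bad (@badwords) {
        $topic =~ s/(^|\W)\Q$bad\E(\W|$)/$1$2/g;
    }
    return $topic;
}

# Hypothetical variant: check the trailing delimiter with a zero-width
# lookahead (?=...) instead of consuming it, and leave a placeholder.
sub bleep_lookahead {
    my ($topic, @badwords) = @_;
    for my $bad (@badwords) {
        $topic =~ s/(^|\W)\Q$bad\E(?=\W|$)/${1}BLEEP/g;
    }
    return $topic;
}

print strip_consuming('goob goob and class', qw(goob ass)), "\n";
# prints: " goob and class" (the second goob survives)
print bleep_lookahead('goob goob and class', qw(goob ass)), "\n";
# prints: "BLEEP BLEEP and class"
```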


    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
      Says Masem:
      { $topic =~ s/([^/W])$_([$/W])/$1$2/g; }
      There are several errors here. You have /W where you seem to have meant \W. You have ^ inside a character class, where it means to negate the class, but you seem to have wanted it outside, so that it means beginning-of-string. You have $ inside the other character class, so you are interpolating $/ into the regex, and you seem to have wanted the $ in a place where it would mean end-of-string. I think you might have meant:
      s/(^|\W)$_(\W|$)/$1$2/g
      But the thing I suggested with \b is simpler anyway.

      --
      Mark Dominus
      Perl Paraphernalia

        Both of those work great and do exactly what I need! Thanks!
Re: Defenetly a complicated perl recipie with 2 arrays, 1 scaler and some special matching
by Anonymous Monk on Jul 15, 2001 at 22:27 UTC
    You came into #perl on EFNET and didn't wait too long for a response. Here's something that may get you started. I'm sure it can be fine-tuned, as this is something I whipped up in about 5-10 minutes.
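    use strict;
    use warnings;

    my @badwords   = ( '31337', 'hacker' );
    my @exceptions = ( 'perl_31337_hacker', );
    my $string = 'I am a 31337 hacker. A perl_31337_hacker, that is, not a c_31337_hacker.';

    # Collect every whitespace-delimited token that contains a bad word.
    my @occurences = ();
    my ($word, $oldword, $badword);
    foreach $word (@badwords) {
        push(@occurences, $string =~ /\b\S*\Q$word\E\S*\b/g);
    }

    # Hash slice: lets us test "is this token an exception?" with exists().
    my %exceptions;
    @exceptions{@exceptions} = @exceptions;

    foreach $word (@occurences) {
        if ( !exists($exceptions{$word}) ) {
            $oldword = $word;
            # Censor every bad word inside this token...
            foreach $badword (@badwords) {
                $word =~ s/\Q$badword\E/<censored>/g;
            }
            # ...then put the censored token back in place of the original.
            # \b keeps this from matching inside a longer token such as an exception.
            $string =~ s/\b\Q$oldword\E\b/$word/;
        }
    }
    print $string;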
    Output:
    I am a <censored> <censored>. A perl_31337_hacker, that is, not a c_<censored>_<censored>.
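For comparison, here is a sketch (not e-Motion's code) that censors each word as it is found, in a single s///e pass, so there is no separate step that has to locate the token in the string again. It assumes a "word" is a run of \w characters, which keeps perl_31337_hacker in one piece while splitting off trailing punctuation; censor_word is a hypothetical helper.

```perl
use strict;
use warnings;

my @badwords     = ('31337', 'hacker');
my @exceptions   = ('perl_31337_hacker');
my %is_exception = map { $_ => 1 } @exceptions;

# Hypothetical helper. Assumption: a "word" is a run of \w characters
# (letters, digits, _), so perl_31337_hacker stays together but the
# trailing "." or "," is not part of the word.
sub censor_word {
    my ($word) = @_;
    return $word if $is_exception{$word};   # allowed as a whole word
    for my $bad (@badwords) {
        $word =~ s/\Q$bad\E/<censored>/g;   # censor bad substrings
    }
    return $word;
}

my $string = 'I am a 31337 hacker. A perl_31337_hacker, that is, not a c_31337_hacker.';

# One left-to-right pass: /e evaluates censor_word($1) for each word found,
# and the result is substituted right where that word was matched.
(my $censored = $string) =~ s/(\w+)/censor_word($1)/ge;
print "$censored\n";
# prints: I am a <censored> <censored>. A perl_31337_hacker, that is, not a c_<censored>_<censored>.
```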
    e-Motion