Hi monks,

Excuse my frailties with Perl; I'm still learning.

I am writing a little script to remove bad words from a document, using a bad words list. Everything works fine except for lines where a bad word is the first word in the sentence. It also appears the problem may be that these sentences start with a double quote. For example:

If I want to remove the word 'crud', this sentence doesn't match:

"Crud!", Travis said.

My code is stripping all punctuation (at least it appears to be), and I am using 'lc' on the input words to make sure my bad words match (all are lower case).

After I've removed all punctuation from the above sentence, it looks like this:

Crud Travis said
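To check what the word list really contains for this sentence, here is a small diagnostic sketch. It reuses the same punctuation substitution and the same split on a single space as my script. The helper names `tokens` and `show` are made up for illustration, and the second sample line (typographic quotes instead of ASCII ones) is purely a hypothetical variant, not something I have confirmed is in my document.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tokenize a line the same way the script does: punctuation to spaces,
# then split on a single literal space.
sub tokens {
    my ($line) = @_;
    (my $plain = $line) =~ s/["'.,!?:;\-()[\]{}|\\\/]/ /g;
    return split(/ /, $plain);
}

# Bracket each token so empty strings and newlines become visible.
sub show {
    return join '', map { (my $t = $_) =~ s/\n/\\n/g; "[$t]" } @_;
}

binmode(STDOUT, ':encoding(UTF-8)');

# ASCII double quotes, as typed in this post.
print show(tokens(qq{"Crud!", Travis said.\n})), "\n";
# prints: [][Crud][][][][Travis][said][\n]

# Hypothetical variant with typographic quotes (U+201C / U+201D),
# which the character class does not cover.
print show(tokens(qq{\x{201C}Crud!\x{201D}, Travis said.\n})), "\n";
```

With ASCII quotes the split yields several empty tokens plus a clean `Crud` token. Any character that is not in the character class stays attached to the word, so `lc($word)` would no longer equal the list entry in that case.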

I can't figure out why the word 'crud' isn't matching in this case, as it is in my bad words list with no typos or anything dumb like that. All other words appear to be matching as expected, and 'crud' matches fine when it appears in other locations in the line. Here's some code:

#!/usr/bin/perl
use strict;
use warnings;
...
while (<$book>) {
    my $line = $_;
    my $plainline = $line;
    $plainline =~ s/["'.,!?:;\-()[\]{}|\\\/]/ /g; # replace all punctuation with a space
    my @sentence = split(/ /,$plainline);
    foreach my $word (@sentence) {
        chomp($word);
        my $whichword = 0; # to track which badword was found
        foreach my $badword (@badwords) {
            if (lc($word) eq $badword) {
                my $newword = replaceword($whichword); # get a cleaner word to replace the naughty word
                $line =~ s/($word)/$newword/i;
            }
            $whichword++;
        }
    }
    $cleanbook .= $line;
}
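For comparison, here is a rough sketch of a different approach: one case-insensitive regex with word boundaries, applied to the original line, with no punctuation stripping or splitting. This is not what my script does. The `%replacement` hash and its entries are invented for the example and stand in for my `@badwords` list and `replaceword()`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: bad word (lower case) => cleaner replacement.
my %replacement = (crud => 'crumbs', darn => 'drat');

# Build one alternation, longest words first, with regex
# metacharacters escaped.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %replacement;
my $badword_re = qr/\b($alternation)\b/i;

# Replace every bad word in the line, leaving punctuation untouched.
sub clean_line {
    my ($line) = @_;
    $line =~ s/$badword_re/$replacement{lc $1}/g;
    return $line;
}

print clean_line(qq{"Crud!", Travis said.\n});
# prints: "crumbs!", Travis said.
```

The `\b` anchors match between a quote character and a letter, so a word at the start of a quoted sentence is treated like any other. The sketch does not try to preserve the capitalization of the word it replaces.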

If you have any suggestions that don't relate to my question, feel free to drop them in. TIA.

----------
My home on the web: http://www.techfeed.net/

In reply to Match first word in line fails by yacoubean
