Excuse my frailties with Perl, as I'm still learning.
I am writing a little script to remove bad words from a document, using a bad words list. Everything is working fine except for lines that have a bad word as the first word in the sentence. It also appears the problem may be that these sentences start with a double quote. For example:
If I want to remove the word 'crud', this sentence doesn't match:
"Crud!", Travis said.
My code is stripping all punctuation (at least it appears to be), and I am using 'lc' on the input words to make sure my bad words match (all are lower case).
After I've removed all punctuation from the above sentence, it looks like this:
Crud Travis said
I can't figure out why the word 'crud' isn't matching in this case, as it is in my bad words list with no typos or anything dumb like that. All other words appear to be matching as expected, and 'crud' matches fine when it appears in other locations in the line. Here's some code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    ...
    while (<$book>) {
        my $line = $_;
        my $plainline = $line;
        $plainline =~ s/["'.,!?:;\-()[\]{}|\\\/]/ /g; # replace all punctuation with a space
        my @sentence = split(/ /, $plainline);
        foreach my $word (@sentence) {
            chomp($word);
            my $whichword = 0; # to track which badword was found
            foreach my $badword (@badwords) {
                if (lc($word) eq $badword) {
                    my $newword = replaceword($whichword); # get a cleaner word to replace the naughty word
                    $line =~ s/($word)/$newword/i;
                }
                $whichword++;
            }
        }
        $cleanbook .= $line;
    }
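To see what the tokenizing step actually hands to the comparison, here is a minimal diagnostic sketch. It assumes the example line arrives with its trailing newline, as it would when read from a file. The names @fields and @words are illustrative and not part of the script above. It only shows what the two forms of split return for this line; whether that is what causes the failed match is not established here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The example line from the post, with the newline it would carry when read from a file.
my $line = qq{"Crud!", Travis said.\n};

# Same punctuation substitution as in the script: each punctuation character becomes one space.
my $plainline = $line;
$plainline =~ s/["'.,!?:;\-()[\]{}|\\\/]/ /g;

# split(/ /) splits on every single space, so a leading space and
# runs of consecutive spaces produce empty fields.
my @fields = split(/ /, $plainline);

# split(' ') is Perl's special whitespace mode: leading whitespace is
# skipped and runs of whitespace (including the newline) act as one separator.
my @words = split(' ', $plainline);

# Bracket each field so empty strings and stray newlines are visible.
print "split(/ /): ", join(' ', map { "[$_]" } @fields), "\n";
print "split(' '): ", join(' ', map { "[$_]" } @words), "\n";
```

With the leading double quote turned into a space, the first field from split(/ /) is an empty string and 'Crud' is the second field, while split(' ') returns 'Crud' as the first word.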
If you have any suggestions that don't relate to my question, feel free to drop them in. TIA.
In reply to Match first word in line fails by yacoubean