in reply to RegEx to match at least one non-adjacent term

Normally, you could use \b to anchor the bad words at word boundaries.

my $whitespace = qr/[\s()]+/;
my $badwords   = qr/.../i;
my $wordchar   = qr/[a-zA-Z]/;

s/ $whitespace? \b $badwords \b $whitespace? / /xg;

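But since you want to allow "12345Red6789", you'll have to implement your own version of \b. \b only matches between a word character (\w) and a non-word character, and digits are word characters, so there is no boundary between "5" and "R".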
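A quick check of that point, assuming a hypothetical $badwords of qr/red/i (the real list is elided above): the \b-anchored pattern finds the word when it is surrounded by spaces, but not when it is glued to digits.

```perl
use strict;
use warnings;

# Hypothetical bad-word pattern, for illustration only.
my $badwords = qr/red/i;

# True if $badwords matches as a whole \b-delimited word.
sub has_bounded_badword {
    my ($str) = @_;
    return $str =~ /\b $badwords \b/x ? 1 : 0;
}

for my $s ("1234 Red 5678", "12345Red6789") {
    printf "%-15s %s\n", $s, has_bounded_badword($s) ? "matches" : "no match";
}
```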

my $whitespace = qr/[\s()]+/;
my $badwords   = qr/.../i;
my $wordchar   = qr/[a-zA-Z]/;

s/
   $whitespace?
   (?<! $wordchar )   # At start of word.
   $badwords          # Words to erase.
   (?! $wordchar )    # At end of word.
   $whitespace?
/ /xg;                # Avoid joining two numbers.
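Here is a runnable sketch of the substitution above wrapped in a sub. It assumes the word list r, rd, red, using the pattern shown in the Regexp::List comment further down; the sub name strip_badwords is just for illustration. Because the custom boundaries only look for letters, a bad word next to a digit is removed, while a longer word that merely starts with one ("Reddish") is left alone.

```perl
use strict;
use warnings;

my $whitespace = qr/[\s()]+/;
# Assumed bad-word list (r, rd, red), matching the Regexp::List example.
my $badwords   = qr/r(?:e?d)?/i;
my $wordchar   = qr/[a-zA-Z]/;

# Replace each bad word (plus surrounding whitespace/parens) with one space.
# A "word" ends only at a non-letter, so digits act as boundaries here.
sub strip_badwords {
    my ($str) = @_;
    $str =~ s/
        $whitespace?
        (?<! $wordchar )   # not preceded by a letter
        $badwords
        (?! $wordchar )    # not followed by a letter
        $whitespace?
    / /xg;
    return $str;
}

print strip_badwords($_), "\n" for "12345Red6789", "1234 Red 5678", "Reddish 42";
```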

By the way, Regexp::List can build an efficient $badwords from a list of words.

use Regexp::List qw( );
my @badwords = qw( r rd red );
my $badwords = Regexp::List->new(modifiers => 'i')->list2re(@badwords);  # qr/r(?:e?d)?/i
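If Regexp::List isn't available, a plain alternation built with core Perl also works, though it is not factored the way list2re factors it. In this sketch (list_to_re is a hypothetical helper, not part of any module), each word goes through quotemeta and the list is sorted longest first so that "red" is tried before its prefix "r".

```perl
use strict;
use warnings;

# Build a case-insensitive alternation from a word list using core Perl only.
# Longer words come first so "red" is tried before its prefix "r".
sub list_to_re {
    my @words = sort { length($b) <=> length($a) || $a cmp $b } @_;
    my $alt = join '|', map { quotemeta } @words;
    return qr/(?:$alt)/i;
}

my $badwords = list_to_re(qw( r rd red ));
print "$badwords\n";
```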

Update: Added Regexp::List bit.

Re^2: RegEx to match at least one non-adjacent term
by Cefu (Beadle) on Dec 07, 2007 at 17:00 UTC

    Thanks for the info on Regexp::List. I need to get more familiar with the modules that are out there.

    I think your initial solution would always leave a space where the word was? That's fine if the word was between two numbers, but not if it was at the beginning or end. I'm happy with the cheap trim I get from removing spaces along with the bad words.

    If I'm wrong about it always leaving a space, I apologize. I am committing the sin of commenting without running the example, as I don't have access to Perl on my internet-connected machine.

      Yes, it does, to avoid "1234 Red 5678" becoming "12345678". Feel free to remove the extra whitespace afterwards; doing it in the regex would needlessly complicate it.

      s/.../ /xg;
      s/^\s+//;
      s/\s+$//;
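      Putting the pieces together, here is a sketch of the strip-then-trim approach for the leading/trailing case raised above. It again assumes the word list r, rd, red; clean is a hypothetical name. A bad word at either end leaves a stray space, which the two trimming substitutions then remove.

```perl
use strict;
use warnings;

my $whitespace = qr/[\s()]+/;
my $badwords   = qr/r(?:e?d)?/i;   # assumed word list: r, rd, red
my $wordchar   = qr/[a-zA-Z]/;

# Strip bad words (leaving a single space so numbers don't merge),
# then trim the leading/trailing space that removal can leave behind.
sub clean {
    local $_ = shift;
    s/ $whitespace? (?<! $wordchar ) $badwords (?! $wordchar ) $whitespace? / /xg;
    s/^\s+//;
    s/\s+$//;
    return $_;
}

print "[", clean($_), "]\n" for "Red 1234", "1234 Red", "1234 Red 5678";
```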