in reply to Match first word in line fails

Maybe I'm just daft tonight, which may well be the case at 1 AM my time. Perl's regex engine has powerful features; I suggest using them and being done with it. Is there some reason I'm missing that the following doesn't work?

#!/usr/bin/perl -w
use strict;

my %bad_words = (
    'crud'   => 'dirt',
    'crap'   => 'dung',
    'poo'    => 'dung',
    'wanker' => 'fool'
);
my $bad_re = join '|', keys %bad_words;

while ( <DATA> ) {
    s/\b($bad_re)\b/$bad_words{lc($1)}/ieg;
    print;
}

__DATA__
"Wanker!" said I.
"Crap," says Travis.
This is horse poo.
Poo, I tell you!
"Poo upon all the crud and crap the wanker could see."

When I tested this, it did what I think is being sought. Other than the building of the wordlist hash and the displaying (or writing to file, or assigning to a variable) of the changed text, it's just one line outside the loop to get ready for the regex, and one inside the loop to do the work. Here's my output:

"fool!" said I.
"dung," says Travis.
This is horse dung.
dung, I tell you!
"dung upon all the dirt and dung the fool could see."

There's no need, as far as I can tell, for anything besides search and replace when what's wanted is search and replace.

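Update: Looking back at this, the /e on the s/// isn't necessary, since the hash subscript is already evaluated when the replacement is interpolated as a double-quoted string.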
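
To illustrate the update, here is a minimal sketch that runs the same substitution with and without /e on one of the sample lines. The helper names clean_with_e and clean_without_e are made up for this comparison and are not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %bad_words = (
    'crud'   => 'dirt',
    'crap'   => 'dung',
    'poo'    => 'dung',
    'wanker' => 'fool',
);
my $bad_re = join '|', keys %bad_words;

# Hypothetical helper: the substitution as posted, with /e.
sub clean_with_e {
    my ($text) = @_;
    $text =~ s/\b($bad_re)\b/$bad_words{lc($1)}/ieg;
    return $text;
}

# Hypothetical helper: the same substitution without /e.
# The replacement is interpolated like a double-quoted string,
# and the hash subscript lc($1) is evaluated during interpolation.
sub clean_without_e {
    my ($text) = @_;
    $text =~ s/\b($bad_re)\b/$bad_words{lc($1)}/ig;
    return $text;
}

my $line = qq{"Poo upon all the crud and crap the wanker could see."};
print clean_with_e($line),    "\n";
print clean_without_e($line), "\n";
```

Both calls should print the same cleaned line, which is why the flag can be dropped as long as the replacement is only a hash lookup.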



Christopher E. Stith

Re^2: Match first word in line fails
by ysth (Canon) on Jan 30, 2005 at 10:36 UTC
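    I would make it slightly more complicated, by calling replaceword("$1") in the substitution:
    sub replaceword {
        my $w    = $_[0];
        my $repl = $bad_words{ lc($w) };
        if ( lc($w) eq $w ) {
            return lc($repl);         # all-lowercase input
        }
        # lc() first, so all-caps input falls through to the uc() branch
        elsif ( ucfirst( lc($w) ) eq $w ) {
            return ucfirst($repl);    # capitalized input
        }
        else {
            return uc($repl);         # all-caps or mixed-case input
        }
    }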
      If getting the right case is important, some extra steps may be needed. I think the version of replaceword() below would be sufficient, since lowercase and ucfirst() cover the normal casings. Falling back to all-uppercase rules out strange mixed-case renderings, so while it preserves some of the effect, it's still not a complete solution.

      sub replaceword {
          my $w    = $_[0];
          my $repl = $bad_words{ lc( $w ) };
          $repl = ucfirst( $repl ) if ( $w =~ /^[A-Z]/ );
          return $repl;
      }

      Of course, this would require that the /e flag stay on the substitution in my previous post, contrary to the update, which applies only to the code as it appears in that node.
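
As a rough sketch of how the pieces fit together, the following wires the two-case replaceword() into the substitution with /e kept on. The clean_line wrapper and the hard-coded sample lines are only there for illustration; without /e, the replacement would be treated as a plain string rather than a function call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %bad_words = (
    'crud'   => 'dirt',
    'crap'   => 'dung',
    'poo'    => 'dung',
    'wanker' => 'fool',
);
my $bad_re = join '|', keys %bad_words;

# Look up the replacement and capitalize it if the matched word
# started with an uppercase ASCII letter.
sub replaceword {
    my $w    = $_[0];
    my $repl = $bad_words{ lc($w) };
    $repl = ucfirst($repl) if $w =~ /^[A-Z]/;
    return $repl;
}

# Hypothetical wrapper so the substitution can be applied to any string.
sub clean_line {
    my ($text) = @_;
    # /e is needed here: the replacement is code, not a string.
    $text =~ s/\b($bad_re)\b/replaceword($1)/ieg;
    return $text;
}

print clean_line($_), "\n" for (
    '"Wanker!" said I.',
    'This is horse poo.',
    'Poo, I tell you!',
);
```

With this version, "Wanker!" becomes "Fool!" and a sentence-initial "Poo" becomes "Dung", while lowercase matches stay lowercase.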



      Christopher E. Stith
        Yeah, I thought about something that would map case letter by letter, wrapping around if the replacement word were longer, but decided that leaving it uppercase was good enough. I didn't really think of uppercase as the default, since I would expect almost all cases to be lower or ucfirst as you say, but as a fallback for weird case (or actual uppercase input).
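
For the curious, the letter-by-letter idea might look something like the sketch below. It is only one possible reading of "wrapping around": the case pattern of the matched word is reused from the start when the replacement is longer. The name map_case is made up here, and only ASCII letters are considered.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy the upper/lower pattern of $orig onto $repl,
# one letter at a time, cycling through $orig if $repl is longer.
sub map_case {
    my ( $orig, $repl ) = @_;
    my @model = split //, $orig;
    return $repl unless @model;    # nothing to copy the case from

    my $out = '';
    my $i   = 0;
    for my $ch ( split //, $repl ) {
        my $m = $model[ $i++ % @model ];    # wrap around the original word
        $out .= ( $m =~ /[A-Z]/ ) ? uc($ch) : lc($ch);
    }
    return $out;
}

print map_case( 'Wanker', 'fool' ), "\n";
print map_case( 'POO',    'dung' ), "\n";
print map_case( 'pOo',    'dung' ), "\n";
```

Called as map_case($1, $bad_words{lc($1)}) inside an s///e, this would cover lowercase, capitalized, all-caps and odd mixed-case input in one routine, at the cost of being more machinery than the problem probably deserves.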