vxp has asked for the wisdom of the Perl Monks concerning the following question:

First things first: what I am trying to do is create a very simple Postfix filter. It'll examine each message and, upon finding a word, replace that word with another word and inject the modified email back into the Postfix queue. Here's what I have done so far:
#!/usr/bin/perl
use Mail::Internet;

$msg = Mail::Internet->new([ <> ]);
$to = $msg->get('To');
#$content = join( '',@{$msg->body} );
@content = @{$msg->body};
print "To: " . $to;
foreach $line (@content) {
    @words = split(/ /, $line);
    foreach $word (@words) {
        if ($word =~ /blah/) {
            $word = "something";
        }
        print "$word ";
    }
}
So, upon finding "blah" it will replace it with "something". Here's a test email that I fed into it:
From root@val.vmsinfo.com Fri Dec 14 14:54:57 2007
Return-Path: <root@val.vmsinfo.com>
X-Original-To: vxp
Delivered-To: vxp@val.vmsinfo.com
Received: by val.vmsinfo.com (Postfix, from userid 0)
        id 86A085FD705; Fri, 14 Dec 2007 14:54:57 -0500 (EST)
To: vxp@val.vmsinfo.com
Subject: hi
Message-Id: <20071214195457.86A085FD705@val.vmsinfo.com>
Date: Fri, 14 Dec 2007 14:54:57 -0500 (EST)
From: root@val.vmsinfo.com (root)
Status: O
X-Status:
X-Keywords:
X-UID: 7

hi Val blah some more blah
..
and here's the resulting output:
[vxp@val ~]$ ./mail.pl test.txt
To: vxp@val.vmsinfo.com
hi Val something some more something ..
[vxp@val ~]$
Now, as you see, the "\n" before the second line was lost in the output: the ".." is now on the first line. Any suggestions? :)

Replies are listed 'Best First'.
Re: A regex question
by moritz (Cardinal) on Jan 16, 2008 at 20:27 UTC
    If you want the line endings to be preserved, you can just iterate over the lines:
    for (@{$msg->body}){ s/\bblah\b/something/g; print; }

    It does nearly the same thing as your original loop, but it has a slightly different notion of where a word ends.
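
To make the "slightly different notion of where a word ends" concrete, here is a minimal sketch. It assumes the body lines sit in a plain array (@body is a hypothetical stand-in for @{$msg->body}, and the sample text is made up) so that it runs without Mail::Internet.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @{ $msg->body }: one element per line,
# each with its trailing newline still attached.
my @body = ("hi Val blah, blahblah\n", "some more blah\n", "..\n");

my @out;
for my $line (@body) {
    # Work on a copy so @body is left untouched.
    (my $copy = $line) =~ s/\bblah\b/something/g;   # whole words only
    push @out, $copy;
}

print @out;
```

With \b, the "blah" in "blah," is replaced and the comma survives, while "blahblah" is left alone. The split-on-space loop from the question would have replaced both "blah," and "blahblah" wholesale, because each of those fields matches /blah/.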

    If you really want whitespace delimited words you can change the regular expression to read

    s/(?<=^| )blah(?= |$)/something/g;

      s/(?<=^| )blah(?= |$)/something/g;

      I may be wrong but I thought look-behinds were fixed-width so the following gives a compilation error.

      $ perl -le '
      > $str = q{adfsdheHjdGafefaffJaff};
      > $str =~ s{(?<=^|[A-Z])a}{999}g;
      > print $str;'
      Variable length lookbehind not implemented in regex; marked by <-- HERE in m/(?<=^|[A-Z])a <-- HERE / at -e line 3.
      $

      You could use an alternation of two look-behinds.

      $ perl -le '
      > $str = q{adfsdheHjdGafefaffJaff};
      > $str =~ s{(?:(?<=^)|(?<=[A-Z]))a}{999}g;
      > print $str;'
      999dfsdheHjdG999fefaffJ999ff
      $

      I hope this is of interest.

      Cheers,

      JohnGG
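
Applied to the whitespace-delimited "blah" match from earlier in the thread, the alternation of two look-behinds might look like the sketch below. The second sub uses negative look-arounds, (?<!\S) and (?!\S), which is a swapped-in alternative and not something proposed in the thread. The sub names and sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Alternation of two fixed-width look-behinds: start of string, or a space.
sub replace_alt {
    my ($line) = @_;
    $line =~ s/(?:(?<=^)|(?<= ))blah(?= |$)/something/g;
    return $line;
}

# Alternative: "not preceded by a non-space, not followed by a non-space".
# This also treats tabs and newlines as delimiters.
sub replace_neg {
    my ($line) = @_;
    $line =~ s/(?<!\S)blah(?!\S)/something/g;
    return $line;
}

for my $line ("blah hi blah, blahblah\n", "some more blah\n") {
    print replace_alt($line);
    print replace_neg($line);
}
```

Both versions leave "blah," and "blahblah" alone, since neither is delimited by whitespace on both sides, and both keep the trailing newline because look-arounds do not consume the characters they test.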

Re: A regex question
by amarquis (Curate) on Jan 16, 2008 at 20:26 UTC

    Is there a reason not to just use substitution directly? I.e. $message_body =~ s/blah/something/g? You can fairly easily make it match whole words only, like your example, too, if you want. It seems like you are doing a bunch of work to split it up, modify it, and stitch it back together.

    Edit: As for your original question, I'm not exactly sure. The last element that split generates should have its newline intact. One issue I can see, though, is that you are prepending each line after the first with a space. Because the $word in print "$word "; contains the newline character, you've actually put a space after the newline. That this is a little hard to see is another reason I'd use the "just do the substitution" method.
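
A minimal sketch of the "just do the substitution" approach, assuming the body has been joined into one string. @body is a hypothetical stand-in for @{$msg->body}, and \b is added to match whole words only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @{ $msg->body }, using the test email's body.
my @body = ("hi Val blah some more blah\n", "..\n");

# Join the lines once and substitute across the whole body.
my $message_body = join '', @body;
$message_body =~ s/\bblah\b/something/g;

print $message_body;
```

Because nothing is split apart and stitched back together, the newlines stay where they were and no extra spaces are introduced.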

Re: A regex question
by ww (Archbishop) on Jan 16, 2008 at 23:53 UTC
    As a matter of curiosity, why lines 1 - 16? Is this merely a cut and paste error, or are you actually doing this?
      copy/paste error :p
Re: A regex question
by aquarium (Curate) on Jan 17, 2008 at 03:31 UTC
    It's because the 2nd "blah" in the input is really "blah\n", which matches /blah/, and gets replaced with "something ", so the newline is thrown away along with the word.
    the hardest line to type correctly is: stty erase ^H
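
The explanation above can be reproduced with a short sketch. It uses the first body line of the test email and compares assigning to the matched field (as in the original loop) with substituting inside it. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line  = "hi Val blah some more blah\n";
my @words = split / /, $line;

# The last field still carries the newline: "blah\n".
my $last = $words[-1];

# Original approach: assigning to the field discards the newline with it.
my @assigned = @words;
for my $word (@assigned) {
    $word = "something" if $word =~ /blah/;
}

# Substituting inside the field keeps whatever surrounds the match.
my @substituted = @words;
s/blah/something/ for @substituted;

print join(' ', @assigned), "\n";
print join(' ', @substituted);
```

Joining with a single space, rather than printing "$word " for every field, also avoids the stray space that would otherwise follow each newline.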