A regex question

vxp has asked for the wisdom of the Perl Monks concerning the following question:

First things first: what I am trying to do is create a very simple Postfix filter. It'll examine each message and, upon finding a word, replace that word with another word and inject the modified email back into the Postfix queue. Here's what I have done so far:

#!/usr/bin/perl

use Mail::Internet;
$msg = Mail::Internet->new([ <> ]);

$to = $msg->get('To');

#$content = join( '',@{$msg->body} );

@content = @{$msg->body};

print "To: " . $to;

foreach $line (@content) {
        @words = split(/ /, $line);
        foreach $word (@words) {
                if ($word =~ /blah/) {
                        $word = "something";
                }
                print "$word ";
        }
}

So, upon finding "blah" it will replace it with "something". Here's a test email that I fed into it:

From root@val.vmsinfo.com  Fri Dec 14 14:54:57 2007
Return-Path: <root@val.vmsinfo.com>
X-Original-To: vxp
Delivered-To: vxp@val.vmsinfo.com
Received: by val.vmsinfo.com (Postfix, from userid 0)
        id 86A085FD705; Fri, 14 Dec 2007 14:54:57 -0500 (EST)
To: vxp@val.vmsinfo.com
Subject: hi
Message-Id: <20071214195457.86A085FD705@val.vmsinfo.com>
Date: Fri, 14 Dec 2007 14:54:57 -0500 (EST)
From: root@val.vmsinfo.com (root)
Status: O
X-Status: 
X-Keywords:                  
X-UID: 7

hi Val blah some more blah
..

And here's the resulting output:

[vxp@val ~]$ ./mail.pl test.txt
To: vxp@val.vmsinfo.com
hi Val something some more something ..
 [vxp@val ~]$

Now, as you can see, the "\n" before the second line was lost in the output: the ".." is now on the first line. Any suggestions? :)

Re: A regex question by moritz (Cardinal) on Jan 16, 2008 at 20:27 UTC
If you want the line endings to be preserved, you can just iterate over the lines: `for (@{$msg->body}){ s/\bblah\b/something/g; print; }` It does nearly the same thing as your original loop, but it has a slightly different notion of where a word ends. If you really want whitespace-delimited words you can change the regular expression to read `s/(?<=^| )blah(?= |$)/something/g;`
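A minimal sketch of the per-line approach, using a hard-coded array as a stand-in for `@{$msg->body}`. The sample lines and the `replace_word` helper name are assumptions for illustration, not part of the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: replace whole-word "blah" in each line.
# It works on a copy of the input, and line endings are left untouched.
sub replace_word {
    my @lines = @_;
    s/\bblah\b/something/g for @lines;
    return @lines;
}

# Stand-in for @{$msg->body}: each element keeps its trailing newline.
my @body = ("hi Val blah some more blah\n", "..\n");

print replace_word(@body);
```

Because `\b` matches between the final "h" and the "\n", the last word on a line is replaced without swallowing the newline.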
Re^2: A regex question by johngg (Canon) on Jan 16, 2008 at 23:15 UTC
`s/(?<=^| )blah(?= |$)/something/g;`

I may be wrong but I thought look-behinds were fixed-width so the following gives a compilation error.

$ perl -le '
> $str = q{adfsdheHjdGafefaffJaff};
> $str =~ s{(?<=^|[A-Z])a}{999}g;
> print $str;'
Variable length lookbehind not implemented in regex; marked by <-- HERE in m/(?<=^|[A-Z])a <-- HERE / at -e line 3.
$

You could use an alternation of two look-behinds.

$ perl -le '
> $str = q{adfsdheHjdGafefaffJaff};
> $str =~ s{(?:(?<=^)|(?<=[A-Z]))a}{999}g;
> print $str;'
999dfsdheHjdG999fefaffJ999ff
$

I hope this is of interest. Cheers, JohnGG
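Another way to sidestep the fixed-width restriction, sketched here as an alternative rather than anything proposed in the thread, is a pair of negative look-arounds: `(?<!\S)` succeeds at the start of the string or after whitespace, and `(?!\S)` succeeds at the end of the string or before whitespace. Both are fixed-width. Note that this treats any whitespace (tabs and newlines too) as a delimiter, not just a space. The `replace_ws_delimited` helper name and the sample input are assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: replace "blah" only where it is delimited by
# whitespace or by the start/end of the string.
sub replace_ws_delimited {
    my ($text) = @_;
    $text =~ s/(?<!\S)blah(?!\S)/something/g;
    return $text;
}

# "(blah)" is left alone because it is delimited by parentheses,
# not whitespace.
print replace_ws_delimited("blah (blah) some more blah\n");
```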
Re: A regex question by amarquis (Curate) on Jan 16, 2008 at 20:26 UTC
Is there a reason not to just use substitution directly? I.e. `$message_body =~ s/blah/something/g`? You can fairly easily make it match whole words only, like your example, too, if you want. It seems like you are doing a bunch of work to split it up, modify it, and stitch it back together. Edit: As for your original question, I'm not exactly sure. The last element that split generates should have its newline intact. One issue I can see, though, is that you are prepending each line after the first with a space. Because in `print "$word ";` the $word contains the newline character, you've actually put a space after it. That this is a little hard to see is another reason I'd do the "Just do the substitution" method.
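To make the whitespace easier to see, here is a sketch that rebuilds the output of the original loop as a string and prints it with newlines shown as a literal `\n`. The `original_loop` helper name is an assumption, and the input is the two body lines from the test email:

```perl
use strict;
use warnings;

# Hypothetical helper: same logic as the loop in the question, but the
# output is collected in a string instead of being printed directly.
sub original_loop {
    my @lines = @_;
    my $out = '';
    for my $line (@lines) {
        my @words = split / /, $line;
        for my $word (@words) {
            $word = "something" if $word =~ /blah/;
            $out .= "$word ";
        }
    }
    return $out;
}

my $out = original_loop("hi Val blah some more blah\n", "..\n");

# Show newlines as a visible "\n" so stray spaces are easy to spot.
(my $visible = $out) =~ s/\n/\\n/g;
print "$visible\n";
```

The visible form shows a space after the final newline, which is why the shell prompt in the posted output is indented by one character.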
Re: A regex question by ww (Archbishop) on Jan 16, 2008 at 23:53 UTC
As a matter of curiosity, why lines 1 - 16? Is this merely a cut and paste error, or are you actually doing this?
Re^2: A regex question by vxp (Pilgrim) on Jan 17, 2008 at 00:17 UTC
copy/paste error :p
Re: A regex question by aquarium (Curate) on Jan 17, 2008 at 03:31 UTC
it's because the 2nd "blah" in the input is really "blah\n", which matches /blah/, and gets replaced with "something ".

the hardest line to type correctly is: stty erase ^H
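If the split-into-words structure is worth keeping, one possible fix (a sketch, with the `replace_in_words` helper name being an assumption) is to substitute inside each word instead of overwriting it, and to rejoin the words with single spaces instead of printing a trailing space after each one:

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the split-per-word structure of the
# original script but preserves the newline on the last word.
sub replace_in_words {
    my @lines = @_;
    my $out = '';
    for my $line (@lines) {
        my @words = split / /, $line;
        for my $word (@words) {
            # Substitute inside the word instead of assigning to it,
            # so a trailing "\n" on the last word is kept.
            $word =~ s/blah/something/;
        }
        # Rejoin with single spaces; no space is added after the newline.
        $out .= join ' ', @words;
    }
    return $out;
}

print replace_in_words("hi Val blah some more blah\n", "..\n");
```

Unlike the original assignment, this changes only the matched text, so a word such as "blah," becomes "something," instead of "something".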