Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello again,
You guys and gals and monks have been SO helpful in the past I am turning to you again with a question.

I have the following expression :

$vacation_str =~ s/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)//;
I was thinking about rewriting it to be a bit shorter and therefore easier to follow, so I tried this:

$vacation_str =~ s/(v??a??c??a??t??i??o??n??\.??m??s??g??)//;

However, it does not match anything within the real program. My test version works, but the live edition does not.
The reason for all this is to remove a possible duplicate or partial copy of the phrase vacation.msg from the output of some buggy vendor code. They plan on fixing it eventually, but in the meantime I would like my wrapper to work with it right now.

Thanks for looking this over
Jack

Replies are listed 'Best First'.
Re: Rewrite a Regular Expression to be easier to understand
by Corion (Patriarch) on Mar 04, 2004 at 20:19 UTC

    I'm approaching the problem by expanding the regex into all matches, storing them in a list, and then building a regex out of that list again. This is less efficient, but much more readable to the eye. I'll go through the regex matching from left to right:

    /((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/

    # first, rewrite it using the /x modifier:
    /
    (
       (msg
       |vacation.msg
       )
      |a
         (cation.msg
         |t
            (ation.msg
            |ition.msg
            )
         )
      |catation.msg
      |g
      |i
         (on.msg
         |tion.msg
         )
      |msg
      |n.msg
      |on.msg
      |sg
      |t
         (ation.msg
         |i
            (on.msg
            |tion.msg
            )
         )
      |vacation.msg
    )/x

    This looks to me like the regex matches not only vacation.msg but also some odd variants of it... I hope I deciphered it correctly, but we can test for that later.

    Now for recreating a readable version out of that, we'll just write the list of matches down:

    my @remove = qw(
        msg
        vacation.msg
        acation.msg
        atation.msg
        atition.msg
        catation.msg
        g
        ion.msg
        ition.msg
        msg
        n.msg
        on.msg
        sg
        tation.msg
        tion.msg
        tition.msg
        vacation.msg
    );
    my $vacation_re = join "|", @remove;
    $vacation_re = qr/$vacation_re/;

    Now, for testing the stuff, let's try the following code:

    #!/usr/bin/perl -w
    use strict;

    my $good_re = qr/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/;

    my @remove = qw(
        msg
        vacation.msg
        acation.msg
        atation.msg
        atition.msg
        catation.msg
        g
        ion.msg
        ition.msg
        msg
        n.msg
        on.msg
        sg
        tation.msg
        tion.msg
        tition.msg
        vacation.msg
    );
    my $new_re = join "|", @remove;
    $new_re = qr/$new_re/;

    while (my $line = <DATA>) {
        chomp $line;
        my ($good,$new) = ($line,$line);
        $good =~ s/$good_re//;
        $new =~ s/$new_re//;
        if ($good ne $new) {
            print "Error when testing '$line':\n";
            print "Got '$new'\n";
            print "Expected '$good'\n";
        };
    };
    __DATA__
    foo
    bar
    baz
    vacation
    vacation.msg
    vaccatiion.mmssg

    You hopefully have a more complete body of messages to test against :-)
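
One possible refinement of the list-based approach, offered as a sketch rather than as part of the original reply: if the dots are meant to be literal, each entry can be passed through quotemeta, and sorting the entries longest-first means a shorter entry can never pre-empt a longer one that starts at the same position. The helper name build_alternation is made up for this illustration. Note that escaping the dots is an assumption about intent, and it changes what the pattern matches compared with the original regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn a list of literal
# strings into a single alternation.
sub build_alternation {
    my @words = @_;

    # Drop duplicates; the list above names msg and vacation.msg twice.
    my %seen;
    my @unique = grep { !$seen{$_}++ } @words;

    # Longest first, so a shorter entry cannot pre-empt a longer one
    # that starts at the same position.
    my @sorted = sort { length($b) <=> length($a) || $a cmp $b } @unique;

    # quotemeta makes each "." literal. This is an assumption about the
    # intent: the original regex lets "." match any character.
    my $pattern = join '|', map { quotemeta($_) } @sorted;
    return qr/(?:$pattern)/;
}

my $re = build_alternation(qw(
    msg vacation.msg acation.msg atation.msg atition.msg catation.msg g
    ion.msg ition.msg msg n.msg on.msg sg tation.msg tion.msg tition.msg
    vacation.msg
));

for my $line ('out:vacation.msg', 'out:tion.msg', 'out:') {
    (my $cleaned = $line) =~ s/$re//;
    print "'$line' -> '$cleaned'\n";
}
```

The comparison script above can be reused to check such a variant against the original regex; any input where "." stood in for another character will show up as a difference.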

Re: Rewrite a Regular Expression to be easier to understand
by kvale (Monsignor) on Mar 04, 2004 at 20:13 UTC
    When faced with a complex regex, it is often better to generate it automatically:
    my $test = 'vacation.msg';
    my @alt;
    foreach my $pos (1..length $test) {
        push @alt, substr $test, $pos;                     # prefix truncated
        push @alt, substr $test, 0, length($test) - $pos;  # suffix truncated
    }
    my $regex = join '|', $test, @alt;
    $vacation_str =~ s/$regex//;
    You will have to alter the upper limit of the loop to decide how short a string you are willing to match.
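
The note about the loop's upper limit can be made concrete. As written, the loop runs until $pos equals the length of the phrase, at which point both substr calls return the empty string, and an empty alternative lets the pattern succeed without consuming anything. Below is a minimal sketch of one way to cap the truncation; the helper name truncation_regex and the $min_len parameter are hypothetical, and the quotemeta call assumes the dot should be literal.

```perl
use strict;
use warnings;

# Hypothetical helper: match $phrase, or any leading or trailing piece
# of it that keeps at least $min_len characters.
sub truncation_regex {
    my ($phrase, $min_len) = @_;
    $min_len = 1 if $min_len < 1;    # never generate an empty alternative
    my $len  = length $phrase;
    my %alts = ($phrase => 1);       # hash keys remove duplicates

    for my $keep ($min_len .. $len - 1) {
        $alts{ substr($phrase, $len - $keep) } = 1;   # front cut off
        $alts{ substr($phrase, 0, $keep) }     = 1;   # end cut off
    }

    # Longest alternatives first, dots escaped (an assumption about intent).
    my @sorted  = sort { length($b) <=> length($a) || $a cmp $b } keys %alts;
    my $pattern = join '|', map { quotemeta($_) } @sorted;
    return qr/(?:$pattern)/;
}

my $re = truncation_regex('vacation.msg', 3);
for my $line ('out:vacation.msg', 'out:tion.msg', 'out:vacati', 'plain') {
    (my $cleaned = $line) =~ s/$re//;
    print "'$line' -> '$cleaned'\n";
}
```

A minimum of 3 is only an example value; a smaller minimum removes more debris but is also more likely to delete unrelated text.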

    Update: fixed a typo.

    -Mark

Re: Rewrite a Regular Expression to be easier to understand
by Roy Johnson (Monsignor) on Mar 04, 2004 at 22:08 UTC
    Perhaps
    s/((((((((((((v)?a)?c)?[ai])?t)?i)?o)?n)?\.)?m)?s)?g)//;
    is what you're looking for? A suffix of any length? You can make all of those open parens into non-capturing ones if you want. Update: with the a-or-i spelling variation (allowing "vacition.msg").
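
For readers who want the non-capturing form mentioned above, here is a sketch that builds the same nested shape from a list of parts instead of typing the parentheses by hand. The helper name suffix_regex is made up for this illustration; the parts are passed in as regex fragments, so [ai] and \. keep their regex meaning.

```perl
use strict;
use warnings;

# Hypothetical helper: nest optional non-capturing groups so that any
# suffix of the sequence matches, as long as the final part is present.
sub suffix_regex {
    my @parts = @_;            # each part is already a regex fragment
    my $last  = pop @parts;    # the final part is mandatory
    my $inner = '';
    $inner = "(?:$inner$_)?" for @parts;   # wrap one more optional layer per part
    return qr/$inner$last/;
}

my $re = suffix_regex(qw(v a c [ai] t i o n \. m s g));

for my $line ('out:vacition.msg', 'out:tion.msg', 'out:.msg') {
    (my $cleaned = $line) =~ s/$re//;
    print "'$line' -> '$cleaned'\n";
}
```

As with the hand-written version, a bare g satisfies the pattern, so the first g in a line is removed if nothing longer matches earlier.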

    The PerlMonk tr/// Advocate
Re: Rewrite a Regular Expression to be easier to understand
by ambrus (Abbot) on Mar 04, 2004 at 20:55 UTC

    The problem with your regexp may be that it can match a single period, so if the text has a period first (before the vacation thingie), the regex will match only that period and delete it. Also, why are you using non-greedy ?'s here?
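
A small experiment can show what an all-optional pattern does in practice. The sketch below is an illustration, not part of the original reply: it compares the non-greedy pattern from the question with a greedy variant, and try_pattern is a hypothetical helper. Because every element is optional, either pattern can succeed at the very start of the string without consuming anything, in which case s/// "works" but removes nothing.

```perl
use strict;
use warnings;

my $lazy   = qr/(v??a??c??a??t??i??o??n??\.??m??s??g??)/;   # pattern from the question
my $greedy = qr/(v?a?c?a?t?i?o?n?\.?m?s?g?)/;               # same pattern, greedy

# Hypothetical helper: return what the first match captured and what
# the string looks like after s/$re//.
sub try_pattern {
    my ($re, $text) = @_;
    my $matched = $text =~ $re ? $1 : undef;
    (my $after = $text) =~ s/$re//;
    return ($matched, $after);
}

for my $text ('vacation.msg', '. vacation.msg', 'x vacation.msg') {
    for my $case ([lazy => $lazy], [greedy => $greedy]) {
        my ($name, $re) = @$case;
        my ($matched, $after) = try_pattern($re, $text);
        print "$name on '$text': matched '$matched', left '$after'\n";
    }
}
```

The non-greedy version prefers the empty match every time, so it leaves all three strings untouched. The greedy version removes vacation.msg only when the phrase starts the string, and on the second input it deletes just the leading period.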