in reply to Rewrite a Regular Expression to be easier to understand

I'm approaching the problem by expanding the regex into every string it can match, storing those strings in a list, and then building a regex from that list again. The result is less efficient, but much easier to read. I'll go through the regex from left to right:

/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/

# first, rewrite it using the /x modifier:

/ (
    (msg
    |vacation.msg
    )
  |a
    (cation.msg
    |t
      (ation.msg
      |ition.msg
      )
    )
  |catation.msg
  |g
  |i
    (on.msg
    |tion.msg
    )
  |msg
  |n.msg
  |on.msg
  |sg
  |t
    (ation.msg
    |i
      (on.msg
      |tion.msg
      )
    )
  |vacation.msg
  )/x

It looks to me as if the regex matches not only vacation.msg but also various fragments and misspellings of it (note too that each unescaped . matches any character, not just a literal dot). I hope I deciphered it correctly, but we can test for that later.

To recreate a readable version from that, we just write down the list of alternatives, keeping them in the same order as in the original regex:

my @remove = qw(
    msg
    vacation.msg
    acation.msg
    atation.msg
    atition.msg
    catation.msg
    g
    ion.msg
    ition.msg
    msg
    n.msg
    on.msg
    sg
    tation.msg
    tion.msg
    tition.msg
    vacation.msg
);
my $vacation_re = join "|", @remove;
$vacation_re = qr/$vacation_re/;
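The list contains msg and vacation.msg twice, because the original regex repeats them. Here is a minimal sketch of dropping the repeats, assuming that removing a later duplicate while keeping the first occurrence in place does not change what the alternation matches. The names %seen and @unique are introduced here for illustration only:

```perl
use strict;
use warnings;

# Same list as in the post, in the same order (duplicates included).
my @remove = qw(
    msg vacation.msg acation.msg atation.msg atition.msg catation.msg
    g ion.msg ition.msg msg n.msg on.msg sg
    tation.msg tion.msg tition.msg vacation.msg
);

# Keep only the first occurrence of each entry, preserving the order.
my %seen;
my @unique = grep { !$seen{$_}++ } @remove;

# Build the regex exactly as before, just from the shorter list.
my $vacation_re = join "|", @unique;
$vacation_re = qr/$vacation_re/;

printf "%d alternatives, %d after removing duplicates\n",
    scalar @remove, scalar @unique;
```

Whether the shorter list is worth it is a matter of taste; the test script for the full list can be pointed at @unique in the same way.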

To test this, let's try the following code, which runs both regexes over each line and reports any difference:

#!/usr/bin/perl -w
use strict;

# The original regex, kept as the reference
my $good_re = qr/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/;

my @remove = qw(
    msg
    vacation.msg
    acation.msg
    atation.msg
    atition.msg
    catation.msg
    g
    ion.msg
    ition.msg
    msg
    n.msg
    on.msg
    sg
    tation.msg
    tion.msg
    tition.msg
    vacation.msg
);
my $new_re = join "|", @remove;
$new_re = qr/$new_re/;

while (my $line = <DATA>) {
    chomp $line;
    my ($good, $new) = ($line, $line);
    # Apply both substitutions and compare the results
    $good =~ s/$good_re//;
    $new  =~ s/$new_re//;
    if ($good ne $new) {
        print "Error when testing '$line':\n";
        print "Got '$new'\n";
        print "Expected '$good'\n";
    }
}

__DATA__
foo
bar
baz
vacation
vacation.msg
vaccatiion.mmssg

You hopefully have a more complete body of messages to test against :-)
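If you don't have such a corpus at hand, one option is to derive test lines from the list itself. The following is only a sketch: the prefix and suffix strings and the names @lines and $failures are made up for illustration, and real mail will exercise cases that this does not.

```perl
use strict;
use warnings;

# The original regex, kept as the reference.
my $good_re = qr/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/;

my @remove = qw(
    msg vacation.msg acation.msg atation.msg atition.msg catation.msg
    g ion.msg ition.msg msg n.msg on.msg sg
    tation.msg tion.msg tition.msg vacation.msg
);
my $new_re = join "|", @remove;
$new_re = qr/$new_re/;

# Hypothetical test lines: each alternative on its own, with a prefix,
# with a suffix, and embedded in other text.
my @lines = map { ($_, "x$_", "${_}x", "foo $_ bar") } @remove;

my $failures = 0;
for my $line (@lines) {
    # Copy the line, then apply each substitution to its own copy.
    (my $good = $line) =~ s/$good_re//;
    (my $new  = $line) =~ s/$new_re//;
    next if $good eq $new;
    $failures++;
    print "Mismatch for '$line': got '$new', expected '$good'\n";
}

printf "%d lines tested, %d mismatches\n", scalar @lines, $failures;
```

Generated lines like these only check that the two regexes agree with each other; they say nothing about whether either one does what you want on real messages.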