I'm approaching the problem by expanding the regex into all matches, storing them in a list, and then building a regex out of that list again. This is less efficient, but much more readable to the eye. I'll go through the regex matching from left to right:

    /((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/

    # first, rewrite it using the /x modifier:
    /
    (
        (msg
        |vacation.msg
        )
      |a
        (cation.msg
        |t
            (ation.msg
            |ition.msg
            )
        )
      |catation.msg
      |g
      |i
        (on.msg
        |tion.msg
        )
      |msg
      |n.msg
      |on.msg
      |sg
      |t
        (ation.msg
        |i
            (on.msg
            |tion.msg
            )
        )
      |vacation.msg
    )
    /x

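It looks to me as if the regex matches not only vacation.msg, but also some odd fragments and misspellings of it (even a lone g). Note also that the dots are unescaped, so each . matches any character, not just a literal dot. I hope I deciphered it correctly, but we can test for that later.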

To build a readable version from that, we just write down the list of alternatives, in the same order, and join them back together:

    my @remove = qw(
        msg
        vacation.msg
        acation.msg
        atation.msg
        atition.msg
        catation.msg
        g
        ion.msg
        ition.msg
        msg
        n.msg
        on.msg
        sg
        tation.msg
        tion.msg
        tition.msg
        vacation.msg
    );
    my $vacation_re = join "|", @remove;
    $vacation_re = qr/$vacation_re/;
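If you don't trust an expansion done by hand, it can also be done mechanically. The following is only a sketch under some loud assumptions: expand_alternation and _expand are names I made up for illustration, and they understand nothing but literal text, | and plain (...) groups (no quantifiers, character classes or escapes), which happens to be all this particular regex uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: expand one alternation level, starting at $$pos_ref.
# Stops at the matching ")" or at the end of the pattern.
sub _expand {
    my ($pattern, $pos_ref) = @_;
    my @done;                # finished branches of this level
    my @current = ('');      # expansions of the branch being read
    while ($$pos_ref < length $pattern) {
        my $char = substr $pattern, $$pos_ref, 1;
        $$pos_ref++;
        if ($char eq '(') {
            # A nested group: combine every prefix with every inner branch
            my @inner = _expand($pattern, $pos_ref);
            @current = map { my $head = $_; map { $head . $_ } @inner } @current;
        }
        elsif ($char eq ')') {
            last;            # end of this group
        }
        elsif ($char eq '|') {
            push @done, @current;
            @current = ('');
        }
        else {
            $_ .= $char for @current;   # plain character, append to all
        }
    }
    return (@done, @current);
}

# Hypothetical wrapper: return the flat list of alternatives, in order.
sub expand_alternation {
    my ($pattern) = @_;
    my $pos = 0;
    return _expand($pattern, \$pos);
}

my $pattern = '((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))'
            . '|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg'
            . '|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)';
my @expanded = expand_alternation($pattern);

# The list written down by hand, for comparison
my @remove = qw(
    msg vacation.msg acation.msg atation.msg atition.msg catation.msg g
    ion.msg ition.msg msg n.msg on.msg sg tation.msg tion.msg tition.msg
    vacation.msg
);

print "$_\n" for @expanded;
print "Hand-written list ",
      ("@expanded" eq "@remove" ? "agrees" : "differs"),
      " with the mechanical expansion\n";
```

Keeping the alternatives in their original order matters, because Perl takes the first alternative that matches at a given position, so a reordered list could in principle behave differently.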

To check that the new regex behaves like the old one, let's try the following code:

    #!/usr/bin/perl -w
    use strict;

    # The original regex, as given
    my $good_re = qr/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/;

    # The expanded list of alternatives, in the same order
    my @remove = qw(
        msg
        vacation.msg
        acation.msg
        atation.msg
        atition.msg
        catation.msg
        g
        ion.msg
        ition.msg
        msg
        n.msg
        on.msg
        sg
        tation.msg
        tion.msg
        tition.msg
        vacation.msg
    );
    my $new_re = join "|", @remove;
    $new_re = qr/$new_re/;

    # Apply both regexes to every test line and report any difference
    while (my $line = <DATA>) {
        chomp $line;
        my ($good, $new) = ($line, $line);
        $good =~ s/$good_re//;
        $new  =~ s/$new_re//;
        if ($good ne $new) {
            print "Error when testing '$line':\n";
            print "Got '$new'\n";
            print "Expected '$good'\n";
        }
    }
    __DATA__
    foo
    bar
    baz
    vacation
    vacation.msg
    vaccatiion.mmssg

You hopefully have a more complete body of messages to test against :-)
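If you don't have such a body of messages at hand, one way to get broader coverage is to generate test lines. The following is a rough sketch, not a proof of equivalence: the helper names mismatches and substrings and the seed strings are my own invention, and the idea is simply to feed every substring of a few seeds through both regexes and collect the lines where the results differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $good_re = qr/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/;

my @remove = qw(
    msg vacation.msg acation.msg atation.msg atition.msg catation.msg g
    ion.msg ition.msg msg n.msg on.msg sg tation.msg tion.msg tition.msg
    vacation.msg
);
my $new_re = join "|", @remove;
$new_re = qr/$new_re/;

# Hypothetical helper: return the lines on which the two regexes
# produce different results after one substitution.
sub mismatches {
    my ($old_re, $candidate_re, @lines) = @_;
    my @bad;
    for my $line (@lines) {
        (my $old       = $line) =~ s/$old_re//;
        (my $candidate = $line) =~ s/$candidate_re//;
        push @bad, $line if $old ne $candidate;
    }
    return @bad;
}

# Hypothetical helper: every non-empty substring of $text.
sub substrings {
    my ($text) = @_;
    my @parts;
    for my $start (0 .. length($text) - 1) {
        for my $len (1 .. length($text) - $start) {
            push @parts, substr $text, $start, $len;
        }
    }
    return @parts;
}

# Seed strings are made up for illustration; add real file names here.
my @seeds = ('vacation.msg', 'vaccatiion.mmssg', 'my catation.msg file');
my @lines = map { substrings($_) } @seeds;

my @bad = mismatches($good_re, $new_re, @lines);
if (@bad) {
    print "Regexes differ on '$_'\n" for @bad;
}
else {
    print "No differences on ", scalar(@lines), " generated lines\n";
}
```

Generated input like this only complements real data; it tells you the two regexes agree on these strings, not on everything.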


In reply to Re: Rewrite a Regular Expression to be easier to understand by Corion
in thread Rewrite a Regular Expression to be easier to understand by Anonymous Monk
