
My approach is to expand the regex into every alternative it can match, store those in a list, and then build a regex from that list again. This is less efficient, but much easier to read. I'll go through the regex from left to right:

/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/

# first, rewrite it using the /x modifier:

/
 (
  (msg
  |vacation.msg
  )
 |a
  (cation.msg
  |t
   (ation.msg
   |ition.msg
   )
  )
 |catation.msg
 |g
 |i
  (on.msg
  |tion.msg
  )
 |msg
 |n.msg
 |on.msg
 |sg
 |t
  (ation.msg
  |i
   (on.msg
   |tion.msg
   )
  )
 |vacation.msg
 )
/x

It looks to me like the regex matches not only vacation.msg, but also a number of odd fragments and misspellings of it. I hope I deciphered it correctly, but we can test that later.

Now, to recreate a readable version from that, we just write down the list of alternatives:

my @remove = qw(
msg
vacation.msg
acation.msg
atation.msg
atition.msg
catation.msg
g
ion.msg
ition.msg
msg
n.msg
on.msg
sg
tation.msg
tion.msg
tition.msg
vacation.msg
);

my $vacation_re = join "|", @remove;
$vacation_re = qr/$vacation_re/;
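
Since the list was written down by hand, a quick sanity check on the expansion may be worthwhile. The following is only a sketch: the names $orig_re and @missing are mine, and it merely checks that every list entry is accepted in full by the original regex. It cannot tell whether an alternative was forgotten.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The original regex, copied onto a single line.
my $orig_re = qr/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/;

# The hand-expanded list, in the same order as above.
my @remove = qw(
  msg vacation.msg acation.msg atation.msg atition.msg catation.msg g
  ion.msg ition.msg msg n.msg on.msg sg tation.msg tion.msg tition.msg
  vacation.msg
);

# \A and \z force the whole entry to match, so an entry that was
# expanded wrongly ends up in @missing.
my @missing = grep { $_ !~ /\A$orig_re\z/ } @remove;

my $report = @missing
  ? "Not matched in full: @missing\n"
  : "All " . scalar(@remove) . " entries are matched in full\n";
print $report;
```

If an entry shows up as not matched, the typo is most likely in the list rather than in the original regex.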

To test this, let's try the following code:

#!/usr/bin/perl -w
use strict;

my $good_re = qr/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/;

my @remove = qw(
msg
vacation.msg
acation.msg
atation.msg
atition.msg
catation.msg
g
ion.msg
ition.msg
msg
n.msg
on.msg
sg
tation.msg
tion.msg
tition.msg
vacation.msg
);

my $new_re = join "|", @remove;
$new_re = qr/$new_re/;

# Apply both regexes to a copy of each line and report any difference.
while (my $line = <DATA>) {
  chomp $line;
  my ($good, $new) = ($line, $line);
  $good =~ s/$good_re//;
  $new  =~ s/$new_re//;

  if ($good ne $new) {
    print "Error when testing '$line':\n";
    print "Got '$new'\n";
    print "Expected '$good'\n";
  }
}
__DATA__

foo
bar
baz
vacation
vacation.msg
vaccatiion.mmssg

You hopefully have a more complete body of messages to test against :-)
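
If you don't, one way to get a few more test lines is to generate them from the alternatives themselves. This is a rough sketch, not a substitute for real data: the "foo." prefix, the ".bak" suffix and the extra line "vacationXmsg" are made up for illustration. The last one is there because the unescaped . in both regexes matches any character, not just a literal dot.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $good_re = qr/((msg|vacation.msg)|a(cation.msg|t(ation.msg|ition.msg))|catation.msg|g|i(on.msg|tion.msg)|msg|n.msg|on.msg|sg|t(ation.msg|i(on.msg|tion.msg))|vacation.msg)/;

my @remove = qw(
  msg vacation.msg acation.msg atation.msg atition.msg catation.msg g
  ion.msg ition.msg msg n.msg on.msg sg tation.msg tion.msg tition.msg
  vacation.msg
);

my $new_re = join "|", @remove;
$new_re = qr/$new_re/;

# Made-up inputs: every alternative on its own, with a prefix, and
# with a suffix, plus one line that exercises the unescaped dot.
my @lines = map { ($_, "foo.$_", "$_.bak") } @remove;
push @lines, "vacationXmsg";

my $errors = 0;
for my $line (@lines) {
  my ($good, $new) = ($line, $line);
  $good =~ s/$good_re//;
  $new  =~ s/$new_re//;
  next if $good eq $new;

  $errors++;
  print "Error when testing '$line': got '$new', expected '$good'\n";
}

print "$errors mismatches in ", scalar(@lines), " generated lines\n";
```

Generated lines like these only probe the regex near the strings it already knows about, so real file names are still the better test.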

In reply to Re: Rewrite a Regular Expression to be easier to understand by Corion
in thread Rewrite a Regular Expression to be easier to understand by Anonymous Monk
