http://qs1969.pair.com?node_id=326441


in reply to regex logical equivalence?

Before I demonstrate how to use YAPE::Regex::Explain, I need to point out a few mistakes you're consistently making.

A .* is rarely necessary at the beginning of an RE. You probably don't need it as the first token of your REs unless you later use $&, or unless you wrap it in parentheses and use a capturing variable such as $1. See Death to Dot Star for additional reading on this subject.
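There is one more place where a leading .* matters, and it's relevant here because both of your patterns capture into $1. A greedy leading .* doesn't change *whether* the pattern matches, but it does change *which* occurrence the later capture group grabs. Here is a minimal sketch; the string 'a1 b2' and the \w\d pattern are made up purely for illustration:

```perl
use strict;
use warnings;

my $line = 'a1 b2';    # hypothetical input with two letter-digit pairs

# No leading .*: the engine starts as far left as it can,
# so the capture is the FIRST pair.
my ($first) = $line =~ /(\w\d)/;

# Greedy leading .*: it swallows the whole line, then gives back
# one character at a time until (\w\d) fits, so the capture is
# the LAST pair.
my ($last) = $line =~ /.*(\w\d)/;

print "without .*: $first\n";
print "with .*:    $last\n";
```

The first match captures a1 and the second captures b2, even though both patterns match the same line.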

? on its own is not a non-greedy version of *. It means "zero or one" (i.e., optional). The non-greedy zero-or-more quantifier is *?.
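A short sketch contrasting the three quantifiers; the sample strings are arbitrary:

```perl
use strict;
use warnings;

my $html = '<b>bold</b>';

# Greedy zero-or-more: runs to the LAST '>' on the line.
my ($greedy) = $html =~ /<(.*)>/;

# Non-greedy zero-or-more: stops at the FIRST '>' it can.
my ($lazy) = $html =~ /<(.*?)>/;

# '?' alone means "zero or one", not a lazy '*': /<(.?)>/ can only
# match a tag whose name is at most one character long.
my ($one) = '<bold>' =~ /<(.?)>/;    # fails, so $one stays undef

printf "greedy: %s\nlazy:   %s\none:    %s\n",
    $greedy, $lazy, defined $one ? $one : '(no match)';
```

The greedy version captures b>bold</b, the non-greedy one captures just b, and the ? version doesn't match <bold> at all.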

The * quantifier allows a match of zero characters. In other words, "\w*" will match one, two, hundreds, or thousands of word characters, but it will also match no characters at all. Is that what you want? Maybe you want the + quantifier (one or more) instead.
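A quick way to convince yourself, as a small sketch. The string '!!!' is just an arbitrary input with no word characters. The last check takes the '%' alternative from your second pattern and anchors it with ^ and $ for the test:

```perl
use strict;
use warnings;

my $text = '!!!';    # arbitrary string containing no word characters

# \w* is satisfied by zero characters, so this match succeeds
# (with an empty match at position 0).
my $star_ok = $text =~ /\w*/ ? 1 : 0;

# \w+ needs at least one word character, so this match fails.
my $plus_ok = $text =~ /\w+/ ? 1 : 0;

# Every piece in front of the '%' is optional, so this alternative
# (anchored here for the test) accepts a lone '%'.
my $lone_pct = '%' =~ /^\[*\w*\@*\-*\w*\%\]*$/ ? 1 : 0;

print "\\w* matched: $star_ok\n";
print "\\w+ matched: $plus_ok\n";
print "lone % matched: $lone_pct\n";
```

When every token in a sequence carries a *, the whole sequence can shrink down to the one token that doesn't.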

OK, here we go again with the deciphering. This time I won't do it by hand. Instead I'll demonstrate effective use of a great module, YAPE::Regex::Explain:

```perl
use strict;
use warnings;
use YAPE::Regex::Explain;

#my $exp = YAPE::Regex::Explain->new($REx)->explain;

my $rex1 = qr/.*(\[*\w*\@*\-*\w*[$ #\%>~]\]|\\\[\\e\[0m\\\] \[0m)\s?/;
my $rex2 = qr/.*([$ #\%>~]|\[*\w*\@*\-*\w*\%\]*|\[*\w*\@*\-*\w*#\]*|\[*\w*\@*\-*\w*\$\]*|\[*\w*\@*\-*\w*>\]*|\\\[\\e\[0m\\\] \[0m)\s?/;

print YAPE::Regex::Explain->new($rex1)->explain;
print YAPE::Regex::Explain->new($rex2)->explain;
```

Output:

```
The regular expression:

(?-imsx:.*(\[*\w*\@*\-*\w*[$ #%>~]\]|\\\[\\e\[0m\\\] \[0m)\s?)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \[*                      '[' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \@*                      '@' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \-*                      '-' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [$ #%>~]                 any character of: '$', ' ', '#', '%',
                             '>', '~'
----------------------------------------------------------------------
    \]                       ']'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    e                        'e'
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    0m                       '0m'
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    \]                       ']'
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    0m                       '0m'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

The regular expression:

(?-imsx:.*([$ #%>~]|\[*\w*\@*\-*\w*%\]*|\[*\w*\@*\-*\w*#\]*|\[*\w*\@*\-*\w*\$\]*|\[*\w*\@*\-*\w*>\]*|\\\[\\e\[0m\\\] \[0m)\s?)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [$ #%>~]                 any character of: '$', ' ', '#', '%',
                             '>', '~'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \[*                      '[' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \@*                      '@' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \-*                      '-' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    %                        '%'
----------------------------------------------------------------------
    \]*                      ']' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \[*                      '[' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \@*                      '@' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \-*                      '-' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    #                        '#'
----------------------------------------------------------------------
    \]*                      ']' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \[*                      '[' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \@*                      '@' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \-*                      '-' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \$                       '$'
----------------------------------------------------------------------
    \]*                      ']' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \[*                      '[' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \@*                      '@' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \-*                      '-' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    >                        '>'
----------------------------------------------------------------------
    \]*                      ']' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    e                        'e'
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    0m                       '0m'
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    \]                       ']'
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    0m                       '0m'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
```

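It doesn't look to me like they're completely equivalent. For one thing, the first alternative in $rex1 requires a literal ']' immediately after the prompt character, whereas in $rex2 the trailing ']' of each prompt alternative is optional (\]*), and its first alternative, [$ #%>~], accepts a lone prompt character (even a single space) all by itself.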
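Rather than only reading the explanations, you can also run both patterns against some sample input and compare. Here is a minimal sketch. The sample strings are made up, so substitute real prompt lines from your data. For each string it prints what each pattern captures in $1, or "no match":

```perl
use strict;
use warnings;

# The two patterns from the question, unchanged.
my $rex1 = qr/.*(\[*\w*\@*\-*\w*[$ #\%>~]\]|\\\[\\e\[0m\\\] \[0m)\s?/;
my $rex2 = qr/.*([$ #\%>~]|\[*\w*\@*\-*\w*\%\]*|\[*\w*\@*\-*\w*#\]*|\[*\w*\@*\-*\w*\$\]*|\[*\w*\@*\-*\w*>\]*|\\\[\\e\[0m\\\] \[0m)\s?/;

# Hypothetical prompt-like samples; replace with real lines from your data.
my @samples = ('[root@box #]', '[root@box]#', 'host$', 'plain text');

my %result;
for my $s (@samples) {
    # A successful match in list context returns the captures,
    # so each variable holds $1 on success and stays undef on failure.
    my ($m1) = $s =~ $rex1;
    my ($m2) = $s =~ $rex2;
    $result{$s} = [ $m1, $m2 ];

    my @shown = map { defined $_ ? "'" . $_ . "'" : 'no match' } $m1, $m2;
    printf "%-14s rex1: %-9s rex2: %s\n", "'" . $s . "'", @shown;
}
```

With these samples, $rex2 matches all four strings while $rex1 matches only the first, and even there the two capture different text ('#]' versus '#'). A few differing cases like this answer the "equivalent?" question quickly, although agreement on a handful of samples could never prove that two patterns are equivalent.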

Also, please take an hour or so and read through perlrequick, perlretut and perlre. Until you've devoured those PODs, you're going to be grasping at straws with regular expressions. If you really want to learn them inside and out, beg, buy, borrow, or steal (OK, don't steal) the Owl book, Mastering Regular Expressions by Jeffrey Friedl. It's an O'Reilly book, and probably the best book ever written on regexps.

Updated: Added link, suggested by broquaint.


Dave