
Re: regex logical equivalence?

by davido (Cardinal)
on Feb 04, 2004 at 09:17 UTC (#326441=note)


in reply to regex logical equivalence?

Before I get into demonstrating how to use YAPE::Regex::Explain, I need to point out a few mistakes you're consistently making.

.* is rarely necessary at the beginning of a RE. You probably don't need it as the first token of your REs unless you are later using $&, or unless you're wrapping it in parens and using a $1 (etc) capturing variable. See Death to Dot Star for additional reading on this subject.
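As a minimal sketch of that point: the two patterns below accept and reject exactly the same strings, and only the text covered by the whole match differs. The helper whole_match, the simplified patterns, and the sample strings are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by the whole pattern,
# or undef if the pattern does not match at all.
sub whole_match {
    my ($re, $s) = @_;
    return $s =~ $re ? substr($s, $-[0], $+[0] - $-[0]) : undef;
}

my $with_dotstar    = qr/.*[\$#]/;   # leading .* present
my $without_dotstar = qr/[\$#]/;     # leading .* dropped

for my $s ('user@host$ ', 'root# ls', 'no prompt here') {
    my $long  = whole_match($with_dotstar,    $s);
    my $short = whole_match($without_dotstar, $s);
    printf "input '%s': with .* => %s, without .* => %s\n",
        $s,
        defined $long  ? "'$long'"  : 'no match',
        defined $short ? "'$short'" : 'no match';
}
```

Both patterns succeed or fail together. The leading .* only drags everything before the prompt character into the overall match, which matters only if you go on to read $& or a capture wrapped around it.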

? is not a non-greedy substitute for *. *? is the nongreedy zero-or-more quantifier.
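A small sketch of the difference, using a made-up test string rather than anything from the original question:

```perl
use strict;
use warnings;

my $text = 'aaa';

my ($optional)  = $text =~ /^(a?)/;    # ?  : zero or one 'a'
my ($greedy)    = $text =~ /^(a*)/;    # *  : zero or more, as many as possible
my ($nongreedy) = $text =~ /^(a*?)/;   # *? : zero or more, as few as possible
my ($forced)    = $text =~ /^(a*?)$/;  # *? grows only because $ must still match

printf "a?      captured '%s'\n", $optional;
printf "a*      captured '%s'\n", $greedy;
printf "a*?     captured '%s'\n", $nongreedy;
printf "a*? + \$ captured '%s'\n", $forced;
```

On its own, a? takes at most one character, while a*? starts from nothing and takes more only when the rest of the pattern forces it to.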

The * quantifier will allow empty strings to match. In other words, "\w*" will match one, two, hundreds, thousands of word characters, but it will also match no characters at all. Is this what you want? Maybe you want the + quantifier instead.
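To make that concrete, here is a short sketch with made-up inputs: a word, an empty string, and a string containing no word characters at all.

```perl
use strict;
use warnings;

my %matched;
for my $s ('abc', '', '!!!') {
    $matched{$s} = {
        star => ($s =~ /\w*/ ? 1 : 0),   # zero or more word characters
        plus => ($s =~ /\w+/ ? 1 : 0),   # at least one word character
    };
    printf "'%s': \\w* %s, \\w+ %s\n", $s,
        $matched{$s}{star} ? 'matches' : 'fails',
        $matched{$s}{plus} ? 'matches' : 'fails';
}
```

\w* succeeds on all three inputs because an empty match is always available, while \w+ succeeds only where there is at least one word character.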

Ok, here we go again with the deciphering. This time I'm not going to do it by hand, but rather will demonstrate effective use of a great module:

use strict;
use warnings;
use YAPE::Regex::Explain;

#my $exp = YAPE::Regex::Explain->new($REx)->explain;

my $rex1 = qr/.*(\[*\w*\@*\-*\w*[$ #\%>~]\]|\\\[\\e\[0m\\\] \[0m)\s?/;
my $rex2 = qr/.*([$ #\%>~]|\[*\w*\@*\-*\w*\%\]*|\[*\w*\@*\-*\w*#\]*|\[*\w*\@*\-*\w*\$\]*|\[*\w*\@*\-*\w*>\]*|\\\[\\e\[0m\\\] \[0m)\s?/;

print YAPE::Regex::Explain->new($rex1)->explain;
print YAPE::Regex::Explain->new($rex2)->explain;

__OUTPUT__
The regular expression:

(?-imsx:.*(\[*\w*\@*\-*\w*[$ #%>~]\]|\\\[\\e\[0m\\\] \[0m)\s?)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \[*                      '[' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \@*                      '@' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \-*                      '-' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [$ #%>~]                 any character of: '$', ' ', '#', '%',
                             '>', '~'
----------------------------------------------------------------------
    \]                       ']'
----------------------------------------------------------------------
    |                        OR
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    e                        'e'
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    0m                       '0m'
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    \]                       ']'
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    0m                       '0m'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

The regular expression:

(?-imsx:.*([$ #%>~]|\[*\w*\@*\-*\w*%\]*|\[*\w*\@*\-*\w*#\]*|\[*\w*\@*\-*\w*\$\]*|\[*\w*\@*\-*\w*>\]*|\\\[\\e\[0m\\\] \[0m)\s?)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [$ #%>~]                 any character of: '$', ' ', '#', '%',
                             '>', '~'
----------------------------------------------------------------------
    |                        OR
----------------------------------------------------------------------
    \[*                      '[' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \@*                      '@' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \-*                      '-' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    %                        '%'
----------------------------------------------------------------------
    \]*                      ']' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    |                        OR
----------------------------------------------------------------------
    \[*                      '[' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \@*                      '@' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \-*                      '-' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    #                        '#'
----------------------------------------------------------------------
    \]*                      ']' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    |                        OR
----------------------------------------------------------------------
    \[*                      '[' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \@*                      '@' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \-*                      '-' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \$                       '$'
----------------------------------------------------------------------
    \]*                      ']' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    |                        OR
----------------------------------------------------------------------
    \[*                      '[' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \@*                      '@' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \-*                      '-' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    >                        '>'
----------------------------------------------------------------------
    \]*                      ']' (0 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    |                        OR
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    e                        'e'
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    0m                       '0m'
----------------------------------------------------------------------
    \\                       '\'
----------------------------------------------------------------------
    \]                       ']'
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    \[                       '['
----------------------------------------------------------------------
    0m                       '0m'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

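It doesn't look to me like they're completely equivalent. For one thing, the first pattern requires a literal ']' right after the prompt character, whereas the second accepts a bare character from [$ #%>~] on its own and makes the ']' optional (\]*).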
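One way to check a suspicion like this is to run both patterns over a few inputs and look for a string that only one of them accepts. The sketch below copies the two patterns from the code above. The sample prompt strings are invented for illustration and are not from the original question.

```perl
use strict;
use warnings;

my $rex1 = qr/.*(\[*\w*\@*\-*\w*[$ #\%>~]\]|\\\[\\e\[0m\\\] \[0m)\s?/;
my $rex2 = qr/.*([$ #\%>~]|\[*\w*\@*\-*\w*\%\]*|\[*\w*\@*\-*\w*#\]*|\[*\w*\@*\-*\w*\$\]*|\[*\w*\@*\-*\w*>\]*|\\\[\\e\[0m\\\] \[0m)\s?/;

# Hypothetical sample inputs; any string accepted by only one
# pattern shows that the two are not equivalent.
my @samples = ('[root@box #]', 'user@host$ ', 'nopromptchars');

my %verdict;
for my $s (@samples) {
    $verdict{$s} = [ ($s =~ $rex1 ? 1 : 0), ($s =~ $rex2 ? 1 : 0) ];
    printf "'%s': rex1 %s, rex2 %s\n", $s,
        $verdict{$s}[0] ? 'matches' : 'fails',
        $verdict{$s}[1] ? 'matches' : 'fails';
}
```

A single input on which the two verdicts differ is enough to show the patterns are not interchangeable. Agreement on every sample proves nothing, since the samples are only a spot check.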

Also, please take an hour or so and read through perlrequick, perlretut and perlre. Until you've devoured those PODs you're going to be grasping at straws with regular expressions. If you really want to learn them inside and out, beg, buy, borrow, or steal (ok, don't steal) the Owl book, Mastering Regular Expressions, by Jeffrey Friedl. It's an O'Reilly book, and probably the best book ever written on regexps.

Updated: Added link, suggested by broquaint.


Dave
