I think you might find it worthwhile to learn about /x at this point, as these regular expressions could certainly do with some commenting. /x isn't hard or scary at all. All you have to do is remember to escape the whitespace you want to match and any #s. It makes regular expressions much easier to explain.
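As a minimal sketch of what that looks like in practice (the pattern and the sample string here are made up for illustration, not taken from your code):

```perl
use strict;
use warnings;

# Hypothetical pattern: finds text such as "issue #42" and captures the number.
my $re = qr/
    issue    # the literal word
    \        # backslash-space: a real space in the input
    \#       # backslash-hash: a real hash sign in the input
    (\d+)    # capture one or more digits
/x;

my ($num) = 'see issue #42 for details' =~ $re;
print "captured: $num\n";    # prints "captured: 42"
```

Everything else, including the indentation and the comments, is ignored by the regex engine. One thing to watch: the comments still sit inside the pattern, so keep the delimiter (here /) out of them.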

Just to make things more confusing ;) I'm going to swap the order of these two expressions, so my first one will be the longer of the two, and I'll work on the shorter (my second, your first) as I think that's the one you wanted to focus on.

To determine if your two regular expressions are sufficiently equivalent we need to compare them.

This is the longer one:

/.*                      # Stuff
 (                        # START capturing to $1 
    [\$\ \#\%>~]          # Any single space, $, #, %, > or ~
   |                      # OR
    \[*                   # 0 or more [s
    \w*                   # 0 or more word characters (a-zA-Z0-9_) 
    \@*                   # 0 or more @s
    \-*                   # 0 or more -s
    \w*                   # more word characters
    \%                    # Exactly 1 %
    \]*                   # 0 or more ]s
   |                      # OR
    \[*\w*\@*\-*\w*\#\]*  # As above, but with a # instead of %
   |                      # OR
    \[*\w*\@*\-*\w*\$\]*  # As above, but with a $
   |                      # OR
    \[*\w*\@*\-*\w*>\]*   # As above, but with a >
   |                      # OR
    \\\[\\e\[0m\\\]\ \[0m # the sequence: \[\e[0m\] [0m
 )                        # END of $1
 \s?                      # 0 or 1 whitespace characters
/x

and this is the shorter:

/.*                      # Stuff
 (                        # START capturing to $1 
    \[*                   # 0 or more [s
    \w*                   # 0 or more word characters
    \@*                   # 0 or more @s
    \-*                   # 0 or more -s
    \w*                   # more word characters
    [\$\ \#\%>~]          # exactly 1 space, $, #, %, > or ~
    \]                    # exactly 1 ] (are you missing a * ?)
  |                       # OR 
    \\\[\\e\[0m\\\]\ \[0m # the sequence: \[\e[0m\] [0m
 )                        # END of $1
\s?                      # 0 or 1 whitespace characters
/x

Now we need to consider what patterns will match one, but not the other... (I'm going to assume you are missing a * up there next to your ], if not, then these aren't very equivalent at all).

Any 1 space, $, #, %, > or ~ will be matched by both.
The escape sequence: \[\e[0m\] [0m is allowed by both.
Each pattern: [w@-w$], [w@-w#], [w@-w%], [w@-w>] is allowed by both.
[w@-w ] is (as you have shown) allowed by the second but not the first (this is easy to fix).

Like you, I can only spot this one significant difference between the two regular expressions (once you fix your typo).

This is easily fixed:

/.*                      # Stuff
 (                        # START capturing to $1 
    \                     # exactly 1 space
  |                       # OR
    \[*                   # 0 or more [s
    \w*                   # 0 or more word characters
    \@*                   # 0 or more @s
    \-*                   # 0 or more -s
    \w*                   # more word characters
    [\$\#\%>~]            # exactly 1 of $, #, %, > or ~
    \]*                   # 0 or more ]s
  |                       # OR 
    \\\[\\e\[0m\\\]\ \[0m # the sequence: \[\e[0m\] [0m
 )                        # END of $1
\s?                      # 0 or 1 whitespace characters
/x

Note that this equivalence won't necessarily remain true if you change your quantifiers, in particular if you change all of your *s to ?s. If you want my opinion, I suspect you're actually looking for a regular expression more like this:

/.*                      # Stuff
 (                        # START capturing to $1 
    \                     # exactly 1 space
  |                       # OR
    \[?                   # 0 or 1 [
    \w*                   # 0 or more word characters
    \@?                   # 0 or 1 @
    [-\w.]*               # 0 or more word chars, dots and hyphens, e.g. w-w.w-.w
    [\$\#\%>~]            # exactly 1 of $, #, %, > or ~
    \]?                   # 0 or 1 ]
  |                       # OR 
    \\\[\\e\[0m\\\]\ \[0m # the sequence: \[\e[0m\] [0m
 )                        # END of $1
\s?                      # 0 or 1 whitespace characters
/x

But I may be wrong - you may not be interested in the dot at all. ;) I'm not 100% certain that you want the .* at the front though. Do you have some sample data for us?
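To show why the leading .* makes me hesitate, here is a rough sketch. The sample prompt string is invented, and the second pattern (anchored with ^ instead of starting with .*) is only my guess at an alternative, not something from your post. I've written the dollar in the character class as \$ so there is no chance of Perl reading $\ as a variable.

```perl
use strict;
use warnings;

# The capturing group from the suggested expression.
my $group = qr/
 (
    \                     # exactly 1 space
  |
    \[? \w* \@? [-\w.]*   # optional bracket, user, optional at-sign, host
    [\$\#\%>~]            # exactly 1 terminator character
    \]?                   # optional closing bracket
  |
    \\\[\\e\[0m\\\]\ \[0m # the literal escape sequence
 )
/x;

my $with_dotstar = qr/.*$group\s?/;   # as suggested, with .* in front
my $anchored     = qr/^$group\s?/;    # hypothetical variant without .*

my $prompt = '[user@my-host.example$]';   # made-up sample data

my ($greedy) = $prompt =~ $with_dotstar;
my ($whole)  = $prompt =~ $anchored;

print "with .*: $greedy\n";   # prints "with .*: $]"
print "with ^ : $whole\n";    # prints "with ^ : [user@my-host.example$]"
```

Because .* is greedy, it swallows as much as it can and leaves the capture group only the terminator and the closing bracket. Whether that is what you want depends on your data, which is why sample input would help.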

I hope you recognise that both expressions will match any string with a single space in it... which will be most strings....
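A quick way to see this for yourself is sketched below. It uses the longer expression and the shorter one with the * added after the ], as assumed earlier; the two test strings are invented.

```perl
use strict;
use warnings;

my $longer = qr/.*
 (
    [\$\ \#\%>~]
  | \[*\w*\@*\-*\w*\%\]*
  | \[*\w*\@*\-*\w*\#\]*
  | \[*\w*\@*\-*\w*\$\]*
  | \[*\w*\@*\-*\w*>\]*
  | \\\[\\e\[0m\\\]\ \[0m
 )
 \s?
/x;

my $shorter = qr/.*
 (
    \[*\w*\@*\-*\w*[\$\ \#\%>~]\]*
  | \\\[\\e\[0m\\\]\ \[0m
 )
 \s?
/x;

for my $text ('hello world', 'helloworld') {
    for my $pair ([longer => $longer], [shorter => $shorter]) {
        my ($name, $re) = @$pair;
        # Report whether the text matches and, if so, what the group captured.
        my $result = $text =~ $re ? "matches, captured '$1'" : 'no match';
        print "'$text' vs $name: $result\n";
    }
}
```

Both expressions match 'hello world' and capture nothing but the space, and neither matches 'helloworld'.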

I hope this helps.

jarich

In reply to Re: regex logical equivalence? by jarich
in thread regex logical equivalence? by Anonymous Monk
