
YAPE::Regex::Explain is very helpful when debugging a regular expression. You can use it like this:

use strict;
use warnings;
use YAPE::Regex::Explain;

my $regexp = qr/^(.*?)((=<)|[<=>])(.*)/;
my $exp = YAPE::Regex::Explain->new($regexp);
print $exp->explain;

The output is as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      =<                       '=<'
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [<=>]                    any character of: '<', '=', '>'
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Another useful tool is the 'x' modifier, which lets you put whitespace in the regex. I consider regex an extremely dense programming language, and without whitespace to organize your thoughts it is very easy to get lost in the noise.

Here is your regex, using the 'x' modifier:

my $re = qr{
    ^(.*?)
    (
      (=<)
      |
      [<=>]
    )
    (.*)
}x;
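One thing to watch for with 'x': an unescaped space inside the pattern no longer matches a space. A minimal sketch of the difference (the variable names here are only for illustration):

```perl
use strict;
use warnings;

# Under /x the spaces in this pattern are ignored,
# so it behaves like /a=b/.
my $ignores_space = qr{ a = b }x;

# To require a literal space, put it in a character class
# (or escape it, or use \s).
my $needs_space = qr{ a [ ] = [ ] b }x;

for my $input ('a=b', 'a = b') {
    printf "%-7s ignores_space=%s needs_space=%s\n",
        "'$input'",
        ($input =~ $ignores_space ? 'yes' : 'no'),
        ($input =~ $needs_space   ? 'yes' : 'no');
}
```

The first pattern matches only 'a=b' and the second matches only 'a = b'.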

When writing a large regex there is a tradeoff between accuracy and readability, and it is sometimes tempting to keep the regex simple just so that it stays maintainable. The 'x' modifier helps here, because you can put comments in the regex. Here is a revised regex for you:

my $re = qr{
    ^               # Beginning of string
    \s*             # Optional whitespace
    ([a-zA-Z0-9_]+) # Capture(1) Alphanumeric LHS
    \s*             # Optional whitespace
    (               # Capture(2) either:
        [<>!]=      #   <=, >=, !=
    |
        [<>=]       #   <, >, =
    )
    \s*             # Optional whitespace
    ([a-zA-Z0-9_]+) # Capture(3) Alphanumeric RHS
}x;

And if you would like to make it more readable you can separate some of the tokens into other variables, like this:

my $operand  = '[a-zA-Z0-9_]+' ;
my $re = qr{
    \A              # Beginning of string
    \s*             # Optional whitespace
    ($operand)      # Capture(1) Alphanumeric LHS
    \s*             # Optional whitespace
    (               # Capture(2) either:
        [<>!]=      #   <=, >=, !=
    |
        [<>=]       #   <, >, =
    )
    \s*             # Optional whitespace
    ($operand)      # Capture(3) Alphanumeric RHS
}x;
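The same idea can be taken a step further by compiling each piece with qr//, which also lets the operator alternation live in its own variable. This is only a sketch: the $operator variable and the 'count >= 10' input are my own illustration, not something from your post. A qr// object interpolated into a larger pattern is embedded as a non-capturing group that keeps its own modifiers, so each piece can use 'x' and comments independently.

```perl
use strict;
use warnings;

# Each piece is compiled on its own (names are illustrative).
my $operand  = qr{ [a-zA-Z0-9_]+ }x;
my $operator = qr{
        [<>!]=      # <=, >=, !=
    |
        [<>=]       # <, >, =
}x;

my $re = qr{
    \A              # Beginning of string
    \s* ($operand)  # Capture(1) LHS
    \s* ($operator) # Capture(2) operator
    \s* ($operand)  # Capture(3) RHS
}x;

if (my ($lhs, $op, $rhs) = 'count >= 10' =~ $re) {
    print "lhs=$lhs op=$op rhs=$rhs\n";
}
```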

I noticed that your example input used '=>' instead of '>='. Is that spelling allowed in your locale?

Here is a functional test script for you. It matches the operators documented in the regex, but does not match '=>' or '=<'.

use strict;
use warnings;

my $operand  = '[a-zA-Z0-9_]+' ;

my $re = qr{
    ^               # Beginning of string
    \s*             # Optional whitespace
    ($operand)      # Capture(1) Alphanumeric LHS
    \s*             # Optional whitespace
    (               # Capture(2) either:
        [<>!]=      #   <=, >=, !=
    |
        [<>=]       #   <, >, =
    )
    \s*             # Optional whitespace
    ($operand)      # Capture(3) Alphanumeric RHS
}x;

while (my $line = <DATA>) {
    # Match once; in list context the match returns the three captures
    if (my ($lhs, $operator, $rhs) = $line =~ $re) {
        printf "  (%4s) (%2s) (%4s)\n", $lhs, $operator, $rhs;
    }
}

__DATA__
a=b
a!=b
a<b
a>b
a=>b
a=<b
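If '=>' and '=<' do turn out to be valid in your input (I am only assuming so here), one more alternative in the operator group would accept them. A sketch under that assumption; parse_comparison is a hypothetical helper name, not anything from your code:

```perl
use strict;
use warnings;

my $operand = qr{ [a-zA-Z0-9_]+ }x;

my $re = qr{
    \A
    \s* ($operand)      # Capture(1) LHS
    \s* (               # Capture(2) one of:
            [<>!]=      #   <=, >=, !=
        |
            =[<>]       #   =<, => [assumed to be valid spellings]
        |
            [<>=]       #   <, >, =
        )
    \s* ($operand)      # Capture(3) RHS
}x;

# Hypothetical helper: returns (lhs, operator, rhs), or an empty list
# when the line does not match.
sub parse_comparison {
    my ($line) = @_;
    my @parts = $line =~ $re;
    return @parts;
}

for my $line (qw( a=b a!=b a<b a>b a=>b a=<b )) {
    my @parts = parse_comparison($line) or next;
    print join(' ', @parts), "\n";
}
```

The two-character alternatives come before the single-character one so that '=>' is captured whole rather than as '=' followed by a failed operand.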

In reply to Re: Regular Expressions by imp
in thread Regular Expressions by Anonymous Monk
