Re: Problem with alternating regex?

Here's my take on a solution:

#!/usr/bin/env perl

use strict;
use warnings;

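# Match "set zone ", then optionally "id <digits> " inside an atomic
# group (no backtracking into it), then capture the quoted zone name
# (everything after the opening quote up to the next quote).
my $re = qr{ set \s zone \s (?> id \s \d+ \s | ) \" ( [^"]+ ) }x;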
my $out_format = "Config line=> %s;    Value=> %s;    zone=> %s\n";

while (<DATA>) {
    next unless /$re/;
    chomp;
    printf $out_format => $., $_, $1;
}

__DATA__
set zone "VLAN" vrouter "trust-vr"
set zone id 100 "Internet_Only"

Output:

$ pm_pref_quote_regex.pl
Config line=> 1;    Value=> set zone "VLAN" vrouter "trust-vr";    zone=> VLAN
Config line=> 2;    Value=> set zone id 100 "Internet_Only";    zone=> Internet_Only

Note that I've used the (?> ... ) construct (an atomic, or independent, subexpression), documented in perlre under Extended Patterns. Using this construct for alternations is a Perl Best Practices recommendation (which may or may not be important to you).
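To see what "atomic" means in practice, here is a minimal, self-contained comparison of an ordinary group and an atomic group on the same input. This is my own illustration rather than an example from perlre, and the variable names are arbitrary.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# An ordinary group lets the engine backtrack into it: a* first grabs
# all three a's, then gives one back so the final 'a' can match.
my $plain = 'aaa' =~ /^(?:a*)a$/ ? 1 : 0;

# An atomic group (?>...) keeps whatever it first matched: (?>a*)
# consumes all three a's and never gives any back, so the final 'a'
# has nothing left to match and the whole match fails.
my $atomic = 'aaa' =~ /^(?>a*)a$/ ? 1 : 0;

print "plain:  $plain\n";     # prints "plain:  1"
print "atomic: $atomic\n";    # prints "atomic: 0"
```

In my regex above, the atomic group just means that once "id NNN " has matched, the engine won't waste time retrying the empty alternative if the rest of the pattern fails; for this data the results are the same either way.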

I also added some lines to test skipped (i.e. unmatched) input and matches embedded in arbitrary surrounding text:

__DATA__
set zone "VLAN" vrouter "trust-vr"
set zone id 100 "Internet_Only"
blah
blah blah set zone "extra" whatever
blah blah blah "set zone id 12345 "extra2_a" something "extra2_b"

These tests were successful:

$ pm_pref_quote_regex.pl
Config line=> 1;    Value=> set zone "VLAN" vrouter "trust-vr";    zone=> VLAN
Config line=> 2;    Value=> set zone id 100 "Internet_Only";    zone=> Internet_Only
Config line=> 4;    Value=> blah blah set zone "extra" whatever;    zone=> extra
Config line=> 5;    Value=> blah blah blah "set zone id 12345 "extra2_a" something "extra2_b";    zone=> extra2_a
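If you want to keep these checks around, a sketch of the same cases as an automated test, using the core Test::More module, might look like this. The @cases table simply restates the sample input above together with the zone name I'd expect each line to yield (undef for the line that shouldn't match).

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use Test::More;

# Same regex as in the script above.
my $re = qr{ set \s zone \s (?> id \s \d+ \s | ) \" ( [^"]+ ) }x;

# Each input line paired with the expected zone name,
# or undef where the line should not match at all.
my @cases = (
    [ 'set zone "VLAN" vrouter "trust-vr"',  'VLAN' ],
    [ 'set zone id 100 "Internet_Only"',     'Internet_Only' ],
    [ 'blah',                                undef ],
    [ 'blah blah set zone "extra" whatever', 'extra' ],
    [ 'blah blah blah "set zone id 12345 "extra2_a" something "extra2_b"', 'extra2_a' ],
);

for my $case (@cases) {
    my ( $line, $want ) = @$case;

    # In list context a successful match returns the captures ($1 here);
    # a failed match returns an empty list, leaving $got undefined.
    my ($got) = $line =~ $re;
    is( $got, $want, "zone for: $line" );
}

done_testing();
```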

You may be interested in Regexp::Debugger. This tool provides a visualisation of your regex in action. It is very easy to use: just add use Regexp::Debugger; near the start of your code and run your script.

Another tool is YAPE::Regex::Explain. However, do be aware of its limitations: "There is no support for regular expression syntax added after Perl version 5.6, ...". Using this on your supplied regex produces the following (somewhat lengthy) output:

$ perl -MYAPE::Regex::Explain -e '
my $re = q{^set\szone\s("([^"]*)"|id\s\d+\s"([^"]*)")};
print YAPE::Regex::Explain->new($re)->explain;
'
The regular expression:

(?-imsx:^set\szone\s("([^"]*)"|id\s\d+\s"([^"]*)"))

matches as follows:
  
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  set                      'set'
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  zone                     'zone'
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      [^"]*                    any character except: '"' (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    id                       'id'
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      [^"]*                    any character except: '"' (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
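As the explanation shows, your regex captures the zone name into \2 for the first alternative and into \3 for the second, so only one of them is defined after a successful match. A minimal sketch of picking up whichever one is set (the @zones array is just for illustration):

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# The regex from the original question (the one explained above).
my $re = qr{^set\szone\s("([^"]*)"|id\s\d+\s"([^"]*)")};

my @zones;
for my $line ( 'set zone "VLAN" vrouter "trust-vr"',
               'set zone id 100 "Internet_Only"' ) {
    next unless $line =~ $re;

    # First alternative: the name is in $2 and $3 is undef.
    # Second alternative ("id NNN"): the name is in $3 and $2 is undef.
    my $zone = defined $2 ? $2 : $3;
    push @zones, $zone;
    print "zone=> $zone\n";
}
```

This prints "zone=> VLAN" followed by "zone=> Internet_Only". My regex avoids the issue by having only one capturing group.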

-- Ken
