Yes, I didn't try to post a rigid grammar since I assumed the expression was simple and self-explanatory. But there is no substitute for unambiguous representation. I came to know about BNF for the first time through your post, thanks a lot!

Here is my attempt to describe Cisco's ACL grammar using BNF. Hope my grammar is right!

It isn’t, because you’re missing a bunch of productions.

If regexes are not the best way to decode this, how else can I do it? I basically want to learn a solid way to deal with the config files, instead of using nested if's or tracking using a dodgy index variable.

Oh, regexes are a good approach. You just have to make them grammatical, is all. You’ll need to be running Perl v5.10 or later for that.

Here’s an example of converting your corrected BNF into a grammatical regex that at least compiles. I don’t know whether it’s right, because I have no set of sample inputs from which to construct a test suite.

This first version doesn’t do any capturing, but the second one I give further down does.

    use v5.10;

    my $acl_rx = qr{

        (?&acl)     # match one of these

        # according to these "regex sub" definitions:
        (?(DEFINE)

            (?<acl>
                access_list (?&interface_name) (?&action) (?&protocol)
                            (?&source) (?&destination) (?&port)
            )

            (?<action>      permit | deny )

            (?<protocol>    tcp | udp | ip | object-group (?&object_group_name) )

            (?<source>      object-group (?&object_group_name)
                          | host (?&host_address)
                          | (?&network_address) (?&net_mask)
            )

            (?<destination> object-group (?&object_group_name)
                          | host (?&host_address)
                          | (?&network_address) (?&net_mask)
            )

            (?<port>        eq (?&port_number)
                          | range (?&low_port) (?&high_port)
                          |
            )

            (?<interface_name>      (?&chunk) )
            (?<object_group_name>   (?&chunk) )
            (?<host_address>        (?&chunk) )
            (?<network_address>     (?&chunk) )
            (?<net_mask>            (?&chunk) )
            (?<port_number>         (?&chunk) )
            (?<low_port>            (?&chunk) )
            (?<high_port>           (?&chunk) )

            (?<chunk>   (?&ws) \S+ (?&ws) )
            (?<ws>      \s* )
        )
    }x;

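As a side note on why a pattern built purely from (?&name) calls captures nothing: perlre documents that groups matched inside a recursion are not accessible once the recursion returns, so each call needs its own outer named group. Here is a minimal sketch of that idea. The cut-down grammar, the `_rx` suffixes, and the `parse_rule` helper are my own illustration, not part of the grammar above.

```perl
use v5.10;
use strict;
use warnings;

# Hypothetical, cut-down pattern: action, protocol and port clause only.
# The outer (?<action> ...) etc. do the capturing; the *_rx groups inside
# (?(DEFINE) ...) are only ever reached through (?&name) calls, so they
# leave nothing behind in %+.
my $rule_rx = qr{
    ^ \s*  (?<action>   (?&action_rx)   )
      \s+  (?<protocol> (?&protocol_rx) )
      \s+  (?<port>     (?&port_rx)     )
      \s* $

    (?(DEFINE)
        (?<action_rx>   permit | deny )
        (?<protocol_rx> tcp | udp | ip )
        (?<port_rx>     eq \s+ \d+ | range \s+ \d+ \s+ \d+ )
    )
}x;

# Returns a hash ref of the three captured fields, or nothing on failure.
sub parse_rule {
    my ($text) = @_;
    return unless $text =~ $rule_rx;
    my %field;
    $field{$_} = $+{$_} for qw(action protocol port);
    return \%field;
}

my $rule = parse_rule("permit tcp range 137 139");
say "$rule->{action} $rule->{protocol} [$rule->{port}]" if $rule;
```

The definitions need names that differ from the capturing groups. If both were called `action`, (?&action) would call the leftmost group of that name, which is the outer one, and the pattern would recurse into itself.
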
However, I much prefer this version, which uses Damian’s Regexp::Grammars module:

    use v5.10;          # needed for say
    use Data::Dump;

    my $acl_grammar = do {
        use Regexp::Grammars;
        qr{
            # In case you need it, uncomment this line:
            # <debug:on>

            # Match this...
            <acl>

            # According to these definitions:
            <rule: acl>
                access-list <interface_name> extended <action> <protocol>
                            <source> <destination> <port> <comment>

            <rule: action>          permit | deny

            <rule: protocol>        tcp | udp | ip | object-group <object_group_name>

            <rule: object_group>    object-group <object_group_name>

            <rule: source>          object-group <object_group_name>
                                  | host <host_address>
                                  | <network_address> <net_mask>

            <rule: destination>     object-group <object_group_name>
                                  | host <host_address>
                                  | <network_address> <net_mask>

            <rule: port>            eq <port_number>
                                  | range <low_port> <high_port>
                                  |

            <rule: interface_name>      <name>
            <rule: object_group_name>   <chunk>
            <rule: host_address>        <address>
            <rule: network_address>     <address>
            <rule: net_mask>            <address>
            <rule: port_number>         <portno>
            <rule: low_port>            <portno>
            <rule: high_port>           <portno>

            <rule: address>         any | <dotted_quad> | <name>
            <rule: portno>          \d+
            <token: dotted_quad>    <.octet> ( <.dot> <.octet> ){3}
            <rule: comment>         \( [^()]* \)

            <token: chunk>          \w+
            <token: octet>          \d{0,3}
            <token: dot>            \.
            <token: name>           <.capword> ** _
            <token: capword>        \p{Lu} \p{Alnum}+
        }x;
    };

    while (my $input = <DATA>) {
        if ($input =~ $acl_grammar) {
            say "MATCHED";
            dd \%/;     # parse tree of a successful match
                        # appears in the %/ variable
        }
        else {
            warn "CAN'T MATCH: $input";
        }
    }

    __END__
    access-list V420_IN extended permit object-group Symantec_Service_Group object-group Symantec_Clients Symantec_Servers (all 3 object groups)
    access-list V421_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40 (One service group and two network addresses)
    access-list V422_IN extended permit object-group Symantec_Service_Group any any (Source & Destination any)
    access-list V423_IN extended permit tcp any any range 137 139 (with a range of TCP ports)
    access-list V424_IN extended permit tcp any any eq 445 (with a single service port)

Isn’t that splendid?

I’ve had to correct and update your grammar, but it still matches only the first two records. I’ll leave the rest as an exercise for the reader. :) Here is the output it produces:

    MATCHED
    {
      "" => "access-list V420_IN extended permit object-group Symantec_Service_Group object-group Symantec_Clients Symantec_Servers (all 3 object groups)",
      "acl" => {
        "" => "access-list V420_IN extended permit object-group Symantec_Service_Group object-group Symantec_Clients Symantec_Servers (all 3 object groups)",
        "action" => "permit",
        "comment" => "(all 3 object groups)",
        "destination" => {
          "" => "Clients Symantec_Servers",
          "net_mask" => {
            "" => "Symantec_Servers",
            "address" => { "" => "Symantec_Servers", "name" => "Symantec_Servers" },
          },
          "network_address" => {
            "" => "Clients",
            "address" => { "" => "Clients", "name" => "Clients" },
          },
        },
        "interface_name" => { "" => "V420_IN", "name" => "V420_IN" },
        "port" => "",
        "protocol" => {
          "" => "object-group Symantec_Service_Group",
          "object_group_name" => { "" => "Symantec_Service_Group", "chunk" => "Symantec_Service_Group" },
        },
        "source" => {
          "" => "object-group Symantec_",
          "object_group_name" => { "" => "Symantec_", "chunk" => "Symantec_" },
        },
      },
    }
    MATCHED
    {
      "" => "access-list V421_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40 (One service group and two network addresses)",
      "acl" => {
        "" => "access-list V421_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40 (One service group and two network addresses)",
        "action" => "permit",
        "comment" => "(One service group and two network addresses)",
        "destination" => {
          "" => "host 10.149.16.40",
          "host_address" => {
            "" => "10.149.16.40",
            "address" => { "" => "10.149.16.40", "dotted_quad" => "10.149.16.40" },
          },
        },
        "interface_name" => { "" => "V421_IN", "name" => "V421_IN" },
        "port" => "",
        "protocol" => {
          "" => "object-group Symantec_Service_Group",
          "object_group_name" => { "" => "Symantec_Service_Group", "chunk" => "Symantec_Service_Group" },
        },
        "source" => {
          "" => "10.148.0.0 255.254.0.0",
          "net_mask" => {
            "" => "255.254.0.0",
            "address" => { "" => "255.254.0.0", "dotted_quad" => "255.254.0.0" },
          },
          "network_address" => {
            "" => "10.148.0.0",
            "address" => { "" => "10.148.0.0", "dotted_quad" => "10.148.0.0" },
          },
        },
      },
    }
    CAN'T MATCH: access-list V422_IN extended permit object-group Symantec_Service_Group any any (Source & Destination any)
    CAN'T MATCH: access-list V423_IN extended permit tcp any any range 137 139 (with a range of TCP ports)
    CAN'T MATCH: access-list V424_IN extended permit tcp any any eq 445 (with a single service port)

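A hint for the exercise: the three lines that fail all use a bare any for both source and destination. In the grammar, the only alternative of source that accepts any is the address-plus-mask pair, so "any any" is consumed whole by source and nothing is left for destination. The first record matches only by backtracking into a wrong split, as its source of "object-group Symantec_" shows. What follows is a rough sketch in core Perl only, with no Regexp::Grammars, of one way to accept all five sample lines. The `endpoint_rx` rule and the other `_rx` names, the `parse_acl` helper, and the decision to accept a bare word such as Symantec_Servers as an endpoint are my assumptions, made to fit these samples; they are not checked against real Cisco syntax.

```perl
use v5.10;
use strict;
use warnings;

# Sketch only: every rule below is inferred from the five sample lines.
my $acl_rx = qr{
    ^ \s* access-list \s+ (?<interface> \S+ ) \s+ extended
      \s+ (?<action>      (?&action_rx)   )
      \s+ (?<protocol>    (?&protocol_rx) )
      \s+ (?<source>      (?&endpoint_rx) )
      \s+ (?<destination> (?&endpoint_rx) )
      (?: \s+ (?<port>    (?&port_rx) ) )?
      (?: \s+ (?<comment> \( [^()]* \) ) )?
      \s* $

    (?(DEFINE)
        (?<action_rx>   permit | deny )
        (?<protocol_rx> tcp | udp | ip | object-group \s+ \w+ )
        (?<endpoint_rx> any                         # "any" stands alone
                      | host \s+ (?&addr_rx)
                      | object-group \s+ \w+
                      | (?&addr_rx) \s+ (?&addr_rx) # network plus mask
                      | \w+                         # assumed: bare group name
        )
        (?<port_rx>     eq \s+ \d+ | range \s+ \d+ \s+ \d+ )
        (?<addr_rx>     \d{1,3} (?: \. \d{1,3} ){3} )
    )
}x;

# Returns a hash ref of the captured fields, or nothing if the line
# does not match. Optional fields that were absent come back as undef.
sub parse_acl {
    my ($line) = @_;
    return unless $line =~ $acl_rx;
    my %field;
    $field{$_} = $+{$_}
        for qw(interface action protocol source destination port comment);
    return \%field;
}

my $acl = parse_acl(
    "access-list V423_IN extended permit tcp any any range 137 139 (with a range of TCP ports)"
);
say join " | ", map { $acl->{$_} // "" } qw(interface protocol source destination port)
    if $acl;
```

The same change should carry over to the Regexp::Grammars version by giving source and destination an alternative that accepts any on its own, though I have not run that.
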
Good luck!

In reply to Re^3: Regex to match a Cisco ACL by tchrist
in thread Regex to match a Cisco ACL by Anonymous Monk
