in reply to Re^2: Regex to match a Cisco ACL
in thread Regex to match a Cisco ACL

Yes, I didn't try to post a rigid grammar since I assumed the expression was simple and self-explanatory. But there is no substitute for unambiguous representation. I came to know about BNF for the first time through your post, thanks a lot!

Here is my attempt to describe Cisco's ACL grammar using BNF. Hope my grammar is right!

It isn’t, because you’re missing a bunch of productions.

If regexes are not the best way to decode this, how else can I do it? I basically want to learn a solid way to deal with the config files, instead of using nested ifs or tracking state with a dodgy index variable.

Oh, regexes are a good approach. You just have to make them grammatical is all. You’ll need to be running v5.10 or later for that.

Here’s an example of converting your corrected BNF into a grammatical regex that at least compiles. I don’t know whether it’s right, because I have no set of sample inputs from which to construct a test suite.

This first version doesn’t do any capturing, but the second one I give further on down below does.

use v5.10;

my $acl_rx = qr{

    (?&acl)   # match one of these

    # according to these "regex sub" definitions:
    (?(DEFINE)
        (?<acl>               access_list (?&interface_name) (?&action) (?&protocol) (?&source) (?&destination) (?&port) )
        (?<action>            permit | deny )
        (?<protocol>          tcp | udp | ip | object-group (?&object_group_name) )
        (?<source>            object-group (?&object_group_name) | host (?&host_address) | (?&network_address) (?&net_mask) )
        (?<destination>       object-group (?&object_group_name) | host (?&host_address) | (?&network_address) (?&net_mask) )
        (?<port>              eq (?&port_number) | range (?&low_port) (?&high_port) | )
        (?<interface_name>    (?&chunk) )
        (?<object_group_name> (?&chunk) )
        (?<host_address>      (?&chunk) )
        (?<network_address>   (?&chunk) )
        (?<net_mask>          (?&chunk) )
        (?<port_number>       (?&chunk) )
        (?<low_port>          (?&chunk) )
        (?<high_port>         (?&chunk) )
        (?<chunk>             (?&ws) \S+ (?&ws) )
        (?<ws>                \s* )
    )
}x;

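A note on capturing with this core-regex approach: according to perlre, groups matched inside a recursion such as (?&action) are not accessible once the recursion returns, so the pattern above leaves %+ empty. A minimal sketch of the usual workaround, wrapping each call in its own named group at the top level, might look like the following. The rule names action_rule and protocol_rule and the two-field input are made up for illustration; they are not part of the ACL grammar.

```perl
use strict;
use warnings;
use v5.10;

# Hypothetical two-field pattern: the outer named groups capture,
# the (DEFINE) block only supplies reusable sub-patterns.
my $pair_rx = qr{
    \A
    (?<action>   (?&action_rule)   ) \s+
    (?<protocol> (?&protocol_rule) )
    \z

    (?(DEFINE)
        (?<action_rule>   permit | deny  )
        (?<protocol_rule> tcp | udp | ip )
    )
}x;

my %fields;
if ("permit tcp" =~ $pair_rx) {
    %fields = %+;    # copy the named captures that are actually set
    say "$fields{action} / $fields{protocol}";
}
```

This gets verbose quickly for a grammar with many rules, which is one reason a module that builds the parse tree for you is attractive.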
However, I much prefer this version, which uses Damian’s Regexp::Grammars module:

use v5.10;    # needed for say()
use Data::Dump;

my $acl_grammar = do {
    use Regexp::Grammars;
    qr{
        # In case you need it, uncomment this line:
        # <debug:on>

        # Match this...
        <acl>

        # According to these definitions:
        <rule: acl>               access-list <interface_name> extended <action> <protocol> <source> <destination> <port> <comment>
        <rule: action>            permit | deny
        <rule: protocol>          tcp | udp | ip | object-group <object_group_name>
        <rule: object_group>      object-group <object_group_name>
        <rule: source>            object-group <object_group_name> | host <host_address> | <network_address> <net_mask>
        <rule: destination>       object-group <object_group_name> | host <host_address> | <network_address> <net_mask>
        <rule: port>              eq <port_number> | range <low_port> <high_port> |
        <rule: interface_name>    <name>
        <rule: object_group_name> <chunk>
        <rule: host_address>      <address>
        <rule: network_address>   <address>
        <rule: net_mask>          <address>
        <rule: port_number>       <portno>
        <rule: low_port>          <portno>
        <rule: high_port>         <portno>
        <rule: address>           any | <dotted_quad> | <name>
        <rule: portno>            \d+
        <token: dotted_quad>      <.octet> ( <.dot> <.octet> ){3}
        <rule: comment>           \( [^()]* \)
        <token: chunk>            \w+
        <token: octet>            \d{0,3}
        <token: dot>              \.
        <token: name>             <.capword> ** _
        <token: capword>          \p{Lu} \p{Alnum}+
    }x;
};

while (my $input = <DATA>) {
    if ($input =~ $acl_grammar) {
        say "MATCHED";
        dd \%/;    # parse tree of a successful match
                   # appears in the %/ variable
    }
    else {
        warn "CAN'T MATCH: $input";
    }
}

__END__
access-list V420_IN extended permit object-group Symantec_Service_Group object-group Symantec_Clients Symantec_Servers (all 3 object groups)
access-list V421_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40 (One service group and two network addresses)
access-list V422_IN extended permit object-group Symantec_Service_Group any any (Source & Destination any)
access-list V423_IN extended permit tcp any any range 137 139 (with a range of TCP ports)
access-list V424_IN extended permit tcp any any eq 445 (with a single service port)

Isn’t that splendid?

I’ve had to correct and update your grammar, but it still matches only the first two records. I’ll leave the rest as an exercise for the reader. :) Here is the output it produces:

MATCHED
{
  "" => "access-list V420_IN extended permit object-group Symantec_Service_Group object-group Symantec_Clients Symantec_Servers (all 3 object groups)",
  "acl" => {
    "" => "access-list V420_IN extended permit object-group Symantec_Service_Group object-group Symantec_Clients Symantec_Servers (all 3 object groups)",
    "action" => "permit",
    "comment" => "(all 3 object groups)",
    "destination" => {
      "" => "Clients Symantec_Servers",
      "net_mask" => {
        "" => "Symantec_Servers",
        "address" => { "" => "Symantec_Servers", "name" => "Symantec_Servers" },
      },
      "network_address" => {
        "" => "Clients",
        "address" => { "" => "Clients", "name" => "Clients" },
      },
    },
    "interface_name" => { "" => "V420_IN", "name" => "V420_IN" },
    "port" => "",
    "protocol" => {
      "" => "object-group Symantec_Service_Group",
      "object_group_name" => { "" => "Symantec_Service_Group", "chunk" => "Symantec_Service_Group" },
    },
    "source" => {
      "" => "object-group Symantec_",
      "object_group_name" => { "" => "Symantec_", "chunk" => "Symantec_" },
    },
  },
}
MATCHED
{
  "" => "access-list V421_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40 (One service group and two network addresses)",
  "acl" => {
    "" => "access-list V421_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40 (One service group and two network addresses)",
    "action" => "permit",
    "comment" => "(One service group and two network addresses)",
    "destination" => {
      "" => "host 10.149.16.40",
      "host_address" => {
        "" => "10.149.16.40",
        "address" => { "" => "10.149.16.40", "dotted_quad" => "10.149.16.40" },
      },
    },
    "interface_name" => { "" => "V421_IN", "name" => "V421_IN" },
    "port" => "",
    "protocol" => {
      "" => "object-group Symantec_Service_Group",
      "object_group_name" => { "" => "Symantec_Service_Group", "chunk" => "Symantec_Service_Group" },
    },
    "source" => {
      "" => "10.148.0.0 255.254.0.0",
      "net_mask" => {
        "" => "255.254.0.0",
        "address" => { "" => "255.254.0.0", "dotted_quad" => "255.254.0.0" },
      },
      "network_address" => {
        "" => "10.148.0.0",
        "address" => { "" => "10.148.0.0", "dotted_quad" => "10.148.0.0" },
      },
    },
  },
}
CAN'T MATCH: access-list V422_IN extended permit object-group Symantec_Service_Group any any (Source & Destination any)
CAN'T MATCH: access-list V423_IN extended permit tcp any any range 137 139 (with a range of TCP ports)
CAN'T MATCH: access-list V424_IN extended permit tcp any any eq 445 (with a single service port)

Good luck!

Re^4: Regex to match a Cisco ACL
by JavaFan (Canon) on May 22, 2011 at 20:08 UTC
    One of the problems with taking an existing BNF and turning it into a regular expression is that most BNFs assume an implicit tokenization. Pattern matching doesn't tokenize. For instance, "access_listfoopermitiphost123host456eq789" is matched by your pattern.

    For the Cisco ACL, one might get away with requiring whitespace between each token, but in general, that will not work.
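To make the tokenization point concrete, here is a small sketch under stated assumptions: both patterns are hypothetical, trimmed-down stand-ins for the ACL grammar (only an interface name, an action and a protocol), not the full pattern from the parent post. The first treats whitespace as optional, as the original does; the second requires it between tokens.

```perl
use strict;
use warnings;
use v5.10;

# Whitespace is optional here because the ws rule is \s*.
my $loose = qr{
    \A (?&acl) \z
    (?(DEFINE)
        (?<acl>    access_list (?&chunk) (?&action) (?&chunk) )
        (?<action> permit | deny )
        (?<chunk>  (?&ws) \S+ (?&ws) )
        (?<ws>     \s* )
    )
}x;

# Same shape, but every token must be separated by at least one space.
my $strict = qr{
    \A (?&acl) \z
    (?(DEFINE)
        (?<acl>    access_list (?&ws) (?&chunk) (?&ws) (?&action) (?&ws) (?&chunk) )
        (?<action> permit | deny )
        (?<chunk>  \S+ )
        (?<ws>     \s+ )
    )
}x;

for my $line ("access_listfoopermitip", "access_list foo permit ip") {
    say "loose:  ", ($line =~ $loose  ? "match   " : "no match"), "  '$line'";
    say "strict: ", ($line =~ $strict ? "match   " : "no match"), "  '$line'";
}
```

The run-together string is accepted by the loose pattern and rejected by the strict one, while the properly spaced line is accepted by both.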

Re^4: Regex to match a Cisco ACL
by Anonymous Monk on May 22, 2011 at 15:57 UTC
Re^4: Regex to match a Cisco ACL
by Anonymous Monk on May 22, 2011 at 16:17 UTC

    Amazing! So if I could write down my grammar accurately, the bulk of the job is done. But for some reason it is not matching anything. Does it have to match all the tokens in order to have a successful match? I am suspicious of the grammar that I provided.

    Anyway, here is the data you can test with (the same data CountZero used successfully for his test case):

    access-list V420_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40
    access-list V420_IN extended permit object-group Symantec_Service_Group any any
    access-list V420_IN extended permit tcp any any range 137 139
    access-list V420_IN extended permit tcp any any eq 445

    I note that I didn't mention the possibility of 'any' for source and destination in my BNF grammar. It should be a simple addition, though. But I wonder why even the first rule cannot be parsed. Here is the output with debug on:

    ===================> Trying <grammar> from position 0
    access-list V420_IN   |...Trying <acl>
                          |    \FAIL <acl>
                           \FAIL <grammar>
    ===================> Trying <grammar> from position 1
    ccess-list V420_IN e  |...Trying <acl>
                          |    \FAIL <acl>
                           \FAIL <grammar>
    ===================> Trying <grammar> from position 2
    cess-list V420_IN ex  |...Trying <acl>
                          |    \FAIL <acl>
    ... and so on till the last line.
      Try the updated version. It uses the test data. There are still grammar bugs, because the last 3 fail, although the first 2 succeed.

      I think you just need to get the grammar straight.

        Thanks a bunch! Your updated code is indeed amazing for its level of detail and its reusable tokens. It gives much finer control over the matching schema without having to resort to code that looks like hieroglyphics.

        In hindsight, it was silly of me to ask again without checking the grammar. The last 3 rules failing is no surprise, since the grammar doesn't account for them yet. I should post complete code soon, for the sake of others looking for the same thing.

        May I ask whether it is also possible to express a multi-line grammar as a regex? For example, this section of the config file:

        object-group network NOC-NC-NC1
         network-object 192.162.137.0 255.255.255.0
         network-object 192.162.146.0 255.255.255.0
        object-group network NOC-NC-NC2
         network-object 192.162.131.0 255.255.255.0
         network-object 192.162.134.0 255.255.255.0

        Should be read into a hash of arrays as:

        $object_hash{"NOC-NC-NC1"} => ["192.162.137.0 255.255.255.0","192.162.146.0 255.255.255.0"]
        $object_hash{"NOC-NC-NC2"} => ["192.162.131.0 255.255.255.0","192.162.134.0 255.255.255.0"]
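
One possible way to get that hash of arrays with plain core regexes, offered as a sketch rather than a definitive answer: slurp the section into one string, match each object-group header together with its member lines using /m and /g, then pull the address/mask pairs out of the captured block. It assumes the layout shown above (a header line followed directly by its network-object lines); the variable names $config, $name and $body are illustrative.

```perl
use strict;
use warnings;

my $config = <<'END_CONFIG';
object-group network NOC-NC-NC1
 network-object 192.162.137.0 255.255.255.0
 network-object 192.162.146.0 255.255.255.0
object-group network NOC-NC-NC2
 network-object 192.162.131.0 255.255.255.0
 network-object 192.162.134.0 255.255.255.0
END_CONFIG

my %object_hash;

# Each iteration matches one whole group: header line plus member lines.
while ($config =~ m{
        ^ object-group [ \t]+ network [ \t]+ (\S+) [ \t]* \n   # header line, captures the group name
        ( (?: ^ [ \t]* network-object [ \t]+ .* \n? )+ )       # one or more member lines, captured as a block
    }xmg)
{
    my ($name, $body) = ($1, $2);

    # In list context with /g, this returns every "address mask" pair in the block.
    my @members = $body =~ m{ ^ [ \t]* network-object [ \t]+ (\S+ [ \t]+ \S+) }xmg;

    $object_hash{$name} = \@members;
}

for my $name (sort keys %object_hash) {
    print "$name => [", join(", ", map { qq{"$_"} } @{ $object_hash{$name} }), "]\n";
}
```

The same structure could presumably be expressed as rules in a Regexp::Grammars grammar as well; the two-step match here just keeps the example dependency-free.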