Yes, I didn't try to post a rigid grammar since I assumed the expression was simple and self-explanatory. But there is no substitute for unambiguous representation. I came to know about BNF for the first time through your post, thanks a lot! Here is my attempt to describe Cisco's ACL grammar using BNF. Hope my grammar is right!
<acl> ::= "access-list" <interface-name> <action> <protocol> <source> <destination> <port>
<action> ::= "permit" | "deny"
<protocol> ::= "tcp" | "udp" | "ip" | "object-group" <object-group-name>
<source> ::= "object-group" <object-group-name> | "host" <host-address> | <network-address> <net-mask>
<destination> ::= "object-group" <object-group-name> | "host" <host-address> | <network-address> <net-mask>
<port> ::= "eq" <port-number> | "range" <low-port> <high-port> | ""
The strings that I want to capture in my code are <action>, <protocol>, <source>, <destination>, <port>. Sorry for my previous ambiguous post.
If regexes are not the best way to decode this, how else can I do? I basically want to learn a solid way to deal with the config files, instead of using nested if's or tracking using a dodgy index variable.
Yes, I didn't try to post a rigid grammar since I assumed the expression was simple and self-explanatory. But there is no substitute for unambiguous representation. I came to know about BNF for the first time through your post, thanks a lot!
Here is my attempt to describe Cisco's ACL grammar using BNF. Hope my grammar is right!
It isn’t, because you’re missing a bunch of productions.
If regexes are not the best way to decode this, how else can I do? I basically want to learn a solid way to deal with the config files, instead of using nested if's or tracking using a dodgy index variable.
Oh, regexes are a good approach. You just have to make them grammatical is all. You’ll need to be running v5.10 for that.
Here’s an example of converting your corrected BNF into a grammatical regex that at least compiles. I don’t know whether it’s right because I have no set of sample inputs from which to construct a test suite. This first version doesn’t do any capturing, but the second one I give further down does.
use v5.10;
my $acl_rx = qr{
    (?&acl)   # match one of these

    # according to these "regex sub" definitions:
    (?(DEFINE)
        (?<acl>               access_list (?&interface_name) (?&action) (?&protocol) (?&source) (?&destination) (?&port) )
        (?<action>            permit | deny )
        (?<protocol>          tcp | udp | ip | object-group (?&object_group_name) )
        (?<source>            object-group (?&object_group_name) | host (?&host_address) | (?&network_address) (?&net_mask) )
        (?<destination>       object-group (?&object_group_name) | host (?&host_address) | (?&network_address) (?&net_mask) )
        (?<port>              eq (?&port_number) | range (?&low_port) (?&high_port) | )
        (?<interface_name>    (?&chunk) )
        (?<object_group_name> (?&chunk) )
        (?<host_address>      (?&chunk) )
        (?<network_address>   (?&chunk) )
        (?<net_mask>          (?&chunk) )
        (?<port_number>       (?&chunk) )
        (?<low_port>          (?&chunk) )
        (?<high_port>         (?&chunk) )
        (?<chunk>             (?&ws) \S+ (?&ws) )
        (?<ws>                \s* )
    )
}x;
However, I much prefer this version, which uses Damian’s Regexp::Grammars module:
use v5.10;        # enables say
use Data::Dump;   # exports dd
my $acl_grammar = do {
    use Regexp::Grammars;
    qr{
        # In case you need it, uncomment this line:
        # <debug:on>

        # Match this...
        <acl>

        # According to these definitions:
        <rule: acl>               access-list <interface_name> extended <action> <protocol> <source> <destination> <port> <comment>
        <rule: action>            permit | deny
        <rule: protocol>          tcp | udp | ip | object-group <object_group_name>
        <rule: object_group>      object-group <object_group_name>
        <rule: source>            object-group <object_group_name> | host <host_address> | <network_address> <net_mask>
        <rule: destination>       object-group <object_group_name> | host <host_address> | <network_address> <net_mask>
        <rule: port>              eq <port_number> | range <low_port> <high_port> |
        <rule: interface_name>    <name>
        <rule: object_group_name> <chunk>
        <rule: host_address>      <address>
        <rule: network_address>   <address>
        <rule: net_mask>          <address>
        <rule: port_number>       <portno>
        <rule: low_port>          <portno>
        <rule: high_port>         <portno>
        <rule: address>           any | <dotted_quad> | <name>
        <rule: portno>            \d+
        <token: dotted_quad>      <.octet> ( <.dot> <.octet> ){3}
        <rule: comment>           \( [^()]* \)
        <token: chunk>            \w+
        <token: octet>            \d{0,3}
        <token: dot>              \.
        <token: name>             <.capword> ** _
        <token: capword>          \p{Lu} \p{Alnum}+
    }x;
};

while (my $input = <DATA>) {
    if ($input =~ $acl_grammar) {
        say "MATCHED";
        dd \%/;   # parse tree of a successful match
                  # appears in the %/ variable
    } else {
        warn "CAN'T MATCH: $input";
    }
}
__END__
access-list V420_IN extended permit object-group Symantec_Service_Group object-group Symantec_Clients Symantec_Servers (all 3 object groups)
access-list V421_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40 (One service group and two network addresses)
access-list V422_IN extended permit object-group Symantec_Service_Group any any (Source & Destination any)
access-list V423_IN extended permit tcp any any range 137 139 (with a range of TCP ports)
access-list V424_IN extended permit tcp any any eq 445 (with a single service port)
Isn’t that splendid?
I’ve had to correct and update your grammar, but it still matches only the first two records. I’ll leave the rest as an exercise for the reader. :) Here is the output it produces:
MATCHED
{
  "" => "access-list V420_IN extended permit object-group Symantec_Service_Group object-group Symantec_Clients Symantec_Servers (all 3 object groups)",
  "acl" => {
    "" => "access-list V420_IN extended permit object-group Symantec_Service_Group object-group Symantec_Clients Symantec_Servers (all 3 object groups)",
    "action" => "permit",
    "comment" => "(all 3 object groups)",
    "destination" => {
      "" => "Clients Symantec_Servers",
      "net_mask" => {
        "" => "Symantec_Servers",
        "address" => { "" => "Symantec_Servers", "name" => "Symantec_Servers" },
      },
      "network_address" => {
        "" => "Clients",
        "address" => { "" => "Clients", "name" => "Clients" },
      },
    },
    "interface_name" => { "" => "V420_IN", "name" => "V420_IN" },
    "port" => "",
    "protocol" => {
      "" => "object-group Symantec_Service_Group",
      "object_group_name" => { "" => "Symantec_Service_Group", "chunk" => "Symantec_Service_Group" },
    },
    "source" => {
      "" => "object-group Symantec_",
      "object_group_name" => { "" => "Symantec_", "chunk" => "Symantec_" },
    },
  },
}
MATCHED
{
  "" => "access-list V421_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40 (One service group and two network addresses)",
  "acl" => {
    "" => "access-list V421_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40 (One service group and two network addresses)",
    "action" => "permit",
    "comment" => "(One service group and two network addresses)",
    "destination" => {
      "" => "host 10.149.16.40",
      "host_address" => {
        "" => "10.149.16.40",
        "address" => { "" => "10.149.16.40", "dotted_quad" => "10.149.16.40" },
      },
    },
    "interface_name" => { "" => "V421_IN", "name" => "V421_IN" },
    "port" => "",
    "protocol" => {
      "" => "object-group Symantec_Service_Group",
      "object_group_name" => { "" => "Symantec_Service_Group", "chunk" => "Symantec_Service_Group" },
    },
    "source" => {
      "" => "10.148.0.0 255.254.0.0",
      "net_mask" => {
        "" => "255.254.0.0",
        "address" => { "" => "255.254.0.0", "dotted_quad" => "255.254.0.0" },
      },
      "network_address" => {
        "" => "10.148.0.0",
        "address" => { "" => "10.148.0.0", "dotted_quad" => "10.148.0.0" },
      },
    },
  },
}
CAN'T MATCH: access-list V422_IN extended permit object-group Symantec_Service_Group any any (Source & Destination any)
CAN'T MATCH: access-list V423_IN extended permit tcp any any range 137 139 (with a range of TCP ports)
CAN'T MATCH: access-list V424_IN extended permit tcp any any eq 445 (with a single service port)
Good luck!
One of the problems with taking an existing BNF and turning it into a regular expression is that most BNFs assume an implicit tokenization. Pattern matching doesn't tokenize. For instance, "access_listfoopermitiphost123host456eq789" is matched by your pattern.
For the Cisco ACL, one might get away with requiring whitespace between each token, but in general, that will not work.
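To illustrate the tokenize-first idea, here is a minimal sketch in plain Perl. It is not a full Cisco parser: it assumes whitespace-separated tokens, an optional "extended" keyword, and no trailing parenthesised comment. The names parse_acl and take_endpoint are made up for this example, and "any" is accepted as an endpoint even though the posted BNF does not list it.

```perl
use strict;
use warnings;
use v5.10;

# Hypothetical helper: consume one source/destination from the front of
# the token list. Returns a hash ref, or undef if the tokens do not fit.
sub take_endpoint {
    my ($tok) = @_;
    return unless @$tok;
    if ($tok->[0] eq 'any') {
        shift @$tok;
        return { type => 'any' };
    }
    return unless @$tok >= 2;          # every other form needs two tokens
    my ($first, $second) = splice @$tok, 0, 2;
    return { type => 'object-group', name    => $second } if $first eq 'object-group';
    return { type => 'host',         address => $second } if $first eq 'host';
    return { type => 'network', address => $first, mask => $second };
}

# Hypothetical helper: split the line into tokens first, then walk them.
sub parse_acl {
    my ($line) = @_;
    my @tok = split ' ', $line;

    return unless @tok >= 2 && shift(@tok) eq 'access-list';
    my %acl = (interface => shift @tok);
    shift @tok if @tok && $tok[0] eq 'extended';

    return unless @tok && $tok[0] =~ /\A(?:permit|deny)\z/;
    $acl{action} = shift @tok;

    return unless @tok;
    if ($tok[0] eq 'object-group') {
        return unless @tok >= 2;
        shift @tok;
        $acl{protocol} = { type => 'object-group', name => shift @tok };
    }
    elsif ($tok[0] =~ /\A(?:tcp|udp|ip)\z/) {
        $acl{protocol} = { type => shift @tok };
    }
    else {
        return;
    }

    $acl{source}      = take_endpoint(\@tok) or return;
    $acl{destination} = take_endpoint(\@tok) or return;

    if (@tok >= 2 && $tok[0] eq 'eq') {
        $acl{port} = { eq => $tok[1] };
        splice @tok, 0, 2;
    }
    elsif (@tok >= 3 && $tok[0] eq 'range') {
        $acl{port} = { low => $tok[1], high => $tok[2] };
        splice @tok, 0, 3;
    }

    return if @tok;    # leftover tokens: not a line this sketch understands
    return \%acl;
}

my $acl = parse_acl('access-list V420_IN extended permit tcp any any range 137 139');
say $acl ? "parsed, action is $acl->{action}" : "no parse";
```

Because each token is compared as a whole string, a run-together input such as "access_listfoopermitiphost123host456eq789" is a single token and is rejected at the first check.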
Amazing! So if I could write down my grammar accurately, the bulk of the job is done. But for some reason it is not matching anything. Does it have to match all the tokens in order to have a successful match? I am suspicious of the grammar that I provided.
Anyway, here is the data you can test with (the same data CountZero used successfully for his test case):
access-list V420_IN extended permit object-group Symantec_Service_Group 10.148.0.0 255.254.0.0 host 10.149.16.40
access-list V420_IN extended permit object-group Symantec_Service_Group any any
access-list V420_IN extended permit tcp any any range 137 139
access-list V420_IN extended permit tcp any any eq 445
I note that I didn't mention the possibility of 'any' for source and destination in my BNF grammar. It should be a simple addition though. But I wonder why even the first rule cannot be parsed. Here is the output with debug on:
===================> Trying <grammar> from position 0
access-list V420_IN |...Trying <acl>
| \FAIL <acl>
\FAIL <grammar>
===================> Trying <grammar> from position 1
ccess-list V420_IN e |...Trying <acl>
| \FAIL <acl>
\FAIL <grammar>
===================> Trying <grammar> from position 2
cess-list V420_IN ex |...Trying <acl>
| \FAIL <acl>
... and so on till the last line.
Something like (untested):
my $pat = qr {
    # Each "please define" is a placeholder to be replaced with a real pattern.
    # Under /x literal spaces are ignored, so separators are written as \s+.
    (?(DEFINE)
        (?<action>            (?:\s*\b(?:permit|deny)\b))
        (?<protocol>          (?:\s*\b(?:tcp|udp|ip|object-group\s+(?&object_group_name))\b))
        (?<object_group_name> (?:please define))
        (?<source>            (?:\s*\b(?:object-group\s+(?&object_group_name)|host\s+(?&host_address)|(?&network_address)\s+(?&net_mask))\b))
        (?<host_address>      (?:please define))
        (?<network_address>   (?:please define))
        (?<net_mask>          (?:please define))
        (?<destination>       (?&source))
        (?<port>              (?:\s*\b(?:eq\s+(?&port_number)|range\s+(?&low_port)\s+(?&high_port)|)\b))
        (?<port_number>       (?:please define))
        (?<low_port>          (?:please define))
        (?<high_port>         (?:please define))
    )
    acl ((?&action)) ((?&protocol)) ((?&source)) ((?&destination)) ((?&port))
}x;
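A runnable variant of the same idea, as a minimal sketch rather than a definitive answer: the top-level pattern wraps each (?&...) call in a named capture, and requires \s+ between tokens so run-together input is rejected. It assumes the "access-list NAME extended ..." line shape from the sample data with no trailing comment, treats every address or group name as a bare \S+ word, and accepts "any" as an endpoint. The _rule suffixes, capture_acl, and the word subpattern are names invented for this example.

```perl
use strict;
use warnings;
use v5.10;

# Subpattern names carry a _rule suffix so that the (?&name) calls never
# collide with the top-level capture names.
my $acl_rx = qr{
    \A \s* access-list \s+ (?<interface> \S+ )
    (?: \s+ extended )?
    \s+ (?<action>      (?&action_rule)   )
    \s+ (?<protocol>    (?&protocol_rule) )
    \s+ (?<source>      (?&endpoint_rule) )
    \s+ (?<destination> (?&endpoint_rule) )
    (?: \s+ (?<port>    (?&port_rule) ) )?
    \s* \z

    (?(DEFINE)
        (?<action_rule>   permit | deny )
        (?<protocol_rule> tcp | udp | ip | object-group \s+ (?&word) )
        (?<endpoint_rule> object-group \s+ (?&word)
                        | host \s+ (?&word)
                        | any
                        | (?&word) \s+ (?&word) )
        (?<port_rule>     eq \s+ \d+ | range \s+ \d+ \s+ \d+ )
        (?<word>          \S+ )
    )
}x;

# Hypothetical helper: return the named captures as a hash ref, or undef.
sub capture_acl {
    my ($line) = @_;
    return unless $line =~ $acl_rx;
    my %captured = %+;    # copy before another match overwrites %+
    return \%captured;
}

my $c = capture_acl('access-list V420_IN extended permit tcp any any range 137 139');
say "$_ = $c->{$_}" for sort keys %$c;
```

The captures come back as raw substrings (for example the whole "host 10.149.16.40" for a destination), so a second pass is still needed if the individual addresses are wanted.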