IamtheGorf has asked for the wisdom of the Perl Monks concerning the following question:

Greetings all, I'm trying to parse a config file for a system that uses a brace-delimited format, and the config has nested braces. For instance:

bob {
  ed {
    larry {
      rule5 {
        option {
          disable-server-response-inspection no;
        }
        tag [ some_tag ];
        from [ prod-L3 ];
        to [ corp-L3 ];
        source [ any ];
        destination [ any ];
        source-user [ any ];
        category [ any ];
        application [ any ];
        service [ any ];
        hip-profiles [ any ];
        log-start no;
        log-end yes;
        negate-source no;
        negate-destination no;
        action allow;
        log-setting orion_log;
      }
      rule6 {
        option {
          disable-server-response-inspection no;
        }
        tag [ some_tag ];
        from [ prod-L3 ];
        to [ corp-L3 ];
        source [ any ];
        destination [ any ];
        source-user [ any ];
        category [ any ];
        application [ any ];
        service [ any ];
        hip-profiles [ any ];
        log-start no;
        log-end yes;
        negate-source no;
        negate-destination no;
        action allow;
        log-setting orion_log;
      }
    }
  }
}

I can't figure out how to get that text to display correctly. I wrapped it in pre tags, but the text that was wrapped in brackets got turned into links. Anyway...

It's a lot of nesting. I have thousands of these rules to parse, plus other subsections to parse through. I started with Config::Scoped, hoping it would help, but it seems to struggle with nested blocks:

GetOptions (
    "pancfg=s"  => \$pan_cfg,
    "outfile=s" => \$outfile
) or die("Error in command line arguments. Arguments:\n\t--pancfg = Pan config file to read.\n\t--outfile = Output file for parsed data.");

#Open the config file and attempt to parse it.
$cs = Config::Scoped->new(
    file     => $pan_cfg,
    warnings => $warnings,
);
$cfg_hash = $cs->parse;

That code block errors with:

Invalid decl item: Was expecting parameter or macro or comment or warning but found "mgt-config { " instead at

That "mgt-config { " is one of the subsections that occurs in the config file. I'm struggling to find a decent way to do this. Can anyone offer advice or a better solution?

Replies are listed 'Best First'.
Re: Parsing a config file with braces and nested braces
by BrowserUk (Patriarch) on Jan 07, 2015 at 02:18 UTC

    If you're not pathologically averse to a non-CPAN solution:

    #! perl -sl
    use strict;
    use Data::Dump qw[ pp ];

    sub parseConfig {
        my( $source, %config ) = shift;
        $source =~ s<(\S+)\s*(\{)\s*(.+)\}|(\S+)(\s+)([^;]+);>
                    <
                        my( $key, $f, $rest ) = ( $1 || $4, $2 || $5, $3 || $6 );
                        $config{ $key } = $f eq "{" ? parseConfig( $rest ) : $rest;
                    >seg;
        return \%config;
    }

    my $config = parseConfig( do{ local $/; <DATA> } );
    pp $config;
    __DATA__
    ## data per your OP

    Produces:

    C:\test>junk70
    {
      bob => {
        ed => {
          larry => {
            rule5 => {
              action => "allow",
              application => "<a href=\"?node=%20any%20\"> any </a>",
              category => "<a href=\"?node=%20any%20\"> any </a>",
              destination => "<a href=\"?node=%20any%20\"> any </a>",
              from => "<a href=\"?node=%20prod-L3%20\"> prod-L3 </a>",
              "hip-profiles" => "<a href=\"?node=%20any%20\"> any </a>",
              "log-end" => "yes",
              "log-setting" => "orion_log",
              "log-start" => "no",
              "negate-destination" => "no",
              "negate-source" => "no",
              option => { "disable-server-response-inspection" => "no" },
              service => "<a href=\"?node=%20any%20\"> any </a>",
              source => "<a href=\"?node=%20any%20\"> any </a>",
              "source-user" => "<a href=\"?node=%20any%20\"> any </a>",
              tag => "<a href=\"?node=%20some_tag%20\"> some_tag </a>",
              to => "<a href=\"?node=%20corp-L3%20\"> corp-L3 </a>",
            },
          },
        },
      },
    }

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      This seems to work pretty well, except that it seems to get stuck when it parses into the next rule. I realized that I could have expanded my data example a bit. The rules I am trying to parse are sequential in the larry{} section. While your code works awesome, it seems to trip up just a little when encountering the next rule. Output example with my updated example above...
      {
        bob => {
          ed => {
            larry => {
              rule5 => {
                "action" => "allow",
                "application" => "[ any ]",
                "category" => "[ any ]",
                "destination" => "[ any ]",
                "from" => "[ prod-L3 ]",
                "hip-profiles" => "[ any ]",
                "log-end" => "yes",
                "log-setting" => "orion_log",
                "log-start" => "no",
                "negate-destination" => "no",
                "negate-source" => "no",
                "option" => {
                  "action" => "allow",
                  "application" => "[ any ]",
                  "category" => "[ any ]",
                  "destination" => "[ any ]",
                  "disable-server-response-inspection" => "no",
                  "from" => "[ prod-L3 ]",
                  "hip-profiles" => "[ any ]",
                  "log-end" => "yes",
                  "log-setting" => "orion_log",
                  "log-start" => "no",
                  "negate-destination" => "no",
                  "negate-source" => "no",
                  "service" => "[ any ]",
                  "source" => "[ any ]",
                  "source-user" => "[ any ]",
                  "to" => "[ corp-L3 ]",
                  "}" => "rule6 { \n option { \n disable-server-response-inspection no",
                },
                "service" => "[ any ]",
                "source" => "[ any ]",
                "source-user" => "[ any ]",
                "tag" => "[ some_tag ]",
                "to" => "[ corp-L3 ]",
              },
            },
          },
        },
      }

        > except that it seems to get stuck when it parses into the next rule.

        Yes. I see the problem. But I do not see an immediate solution and I'm rather involved in something else at the moment. (Sorry!)

        If you have other avenues of attack, pursue them rather than wait for this; as it might be a while.
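
        A likely cause, not confirmed anywhere in the thread: the `(.+)\}` part of the pattern is greedy, so when a section holds sibling blocks it runs on to the last closing brace instead of stopping at the one that matches the opening brace. The sketch below uses made-up sample text and only the block-matching half of the regex above to show the effect.

```perl
use strict;
use warnings;

# Hypothetical sample: two sibling blocks at the same nesting level.
my $text = 'rule5 { action allow; } rule6 { action deny; }';

# Same shape as the first alternative of the substitution regex above.
my ( $name, $body ) = $text =~ /(\S+)\s*\{\s*(.+)\}/s;

print "name: $name\n";   # rule5
print "body: $body\n";   # the body also swallows "} rule6 { ... "
```

        Here `$body` comes back as `action allow; } rule6 { action deny; `, which matches the symptom of a stray "}" key holding the start of rule6.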


Re: Parsing a config file with braces and nested braces
by RonW (Parson) on Jan 07, 2015 at 01:07 UTC

    1000s of rules in 1 (or a few) huge files, or many small files?

    If many small to medium files, you could try Text::Balanced to help you extract brace-delimited blocks of text.
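
    A minimal sketch of how Text::Balanced might be applied, assuming the text being split fits in memory. `extract_bracketed` pulls one balanced `{ ... }` block off the front of a string; the sample text, the `%blocks` hash, and the prefix pattern are illustrative assumptions, not something from the original post.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical sample: two sibling blocks, the first with a nested block.
my $text = 'rule5 { option { x no; } action allow; } rule6 { action deny; }';

my %blocks;
while ( $text =~ /\S/ ) {
    # Everything before the first '{' is treated as the prefix (the block name).
    my ( $block, $rest, $prefix ) = extract_bracketed( $text, '{}', '[^{]*' );
    last unless defined $block && length $block;   # no balanced block left

    ( my $name = $prefix ) =~ s/^\s+|\s+$//g;      # trim the name
    $blocks{$name} = $block;                       # block text, braces included
    $text = $rest;                                 # continue with the remainder
}

print "$_ => $blocks{$_}\n" for sort keys %blocks;
```

    Each extracted block still has to be parsed in turn (for example by stripping the outer braces and repeating the process), so this only handles the "find the matching brace" part of the problem.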

    If the files are too big to read into memory, you will need to parse them a line at a time. When you encounter a {, push the preceding section name onto an @array, add it to a %hash (as a key), and create another hash to hold the following entries as you parse them. When you encounter a }, pop the most recent name off the @array. As you process key/value pairs, add them to the %hash for the current (sub)section. When done, you will have a hash of hashes of hashes of ..., nested like the config data.
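
    The line-at-a-time approach could be sketched roughly as follows. This variant keeps a stack of hash references rather than a stack of section names, which is a small departure from the description above, and it assumes that every "name {", "}" and "key value;" sits on its own line as in the sample config. The function name `parse_lines` and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Parse brace-nested config text one line at a time.
sub parse_lines {
    my ($fh) = @_;
    my %config;
    my @stack = ( \%config );      # the innermost open section is $stack[-1]

    while ( my $line = <$fh> ) {
        if ( $line =~ /^\s*(\S+)\s*\{\s*$/ ) {             # "name {" opens a section
            my $section = {};
            $stack[-1]{$1} = $section;
            push @stack, $section;
        }
        elsif ( $line =~ /^\s*\}\s*$/ ) {                  # "}" closes the current one
            die "Unbalanced '}' at line $.\n" if @stack == 1;
            pop @stack;
        }
        elsif ( $line =~ /^\s*(\S+)\s+(.*?)\s*;\s*$/ ) {   # "key value;"
            $stack[-1]{$1} = $2;
        }
    }
    die "Missing '}' at end of input\n" if @stack > 1;
    return \%config;
}

# Hypothetical sample input, read through an in-memory filehandle.
my $sample = <<'END';
larry {
  rule5 {
    option {
      disable-server-response-inspection no;
    }
    tag [ some_tag ];
    action allow;
  }
  rule6 {
    action deny;
  }
}
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $config = parse_lines($fh);
close $fh;

print $config->{larry}{rule5}{tag}, "\n";   # prints "[ some_tag ]"
```

    Because only one line is held at a time, memory use is bounded by the size of the resulting hash rather than by the raw file.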

Re: Parsing a config file with braces and nested braces ( like json/yaml)
by Anonymous Monk on Jan 07, 2015 at 00:41 UTC
Re: Parsing a config file with braces and nested braces
by LanX (Saint) on Jan 07, 2015 at 00:43 UTC
    > I can't figure out how to get that test to display correctly. 

    Use <code> tags.

    Cheers Rolf

    (addicted to the Perl Programming Language and ☆☆☆☆ :)

Re: Parsing a config file with braces and nested braces
by choroba (Cardinal) on Jan 08, 2015 at 15:34 UTC
    I tried to implement a Marpa::R2 solution:
    #!/usr/bin/perl
    use warnings;
    use strict;

    use Marpa::R2;
    use Data::Dumper;

    my $input = do { local $/; <DATA> };

    my $grammar = << '__GRAMMAR__';

    lexeme default = latm => 1

    :start ::= List
    :default ::= action => ::first

    List      ::= Hash+                 action => list
    Hash      ::= String '{' Pairs '}'  action => hash
    Pairs     ::= Pair+                 action => list
    Pair      ::= String Value ';'      action => pair
                | Hash
    Value     ::= Simple
                | Bracketed
    Bracketed ::= '[' String ']'        action => second
    Simple    ::= String

    String     ~ [-a-zA-Z_0-9]+
    whitespace ~ [\s] +
    :discard   ~ whitespace

    __GRAMMAR__

    sub hash   { +{ $_[1] => $_[3] } }
    sub pair   { +{ $_[1] => $_[2] } }
    sub second { [ @_[ 2 .. $#_-1 ] ] }
    sub list   { shift; \@_ }

    my $parser = 'Marpa::R2::Scanless::G'->new({ source => \$grammar });

    print Dumper $parser->parse(\$input, 'main', { trace_terminals => 1 });

    __DATA__
    bob {
      ed {
        larry {
    ...

    Does the output satisfy you?

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Parsing a config file with braces and nested braces
by Anonymous Monk on Jan 08, 2015 at 02:16 UTC

    If there's an existing module that can handle your configuration file format, then it is always preferable to use that over DIY. But since I haven't written a grammar in a while I thought it would be a nice exercise. (That also means I'm a bit rusty and it may not be coded optimally.) I used Regexp::Grammars, but there are a few other similar modules out there, a classic being Parse::RecDescent.

    The $MATCH lines control the data that is returned and reduce it from what Regexp::Grammars normally produces (to see what that is, remove the $MATCH lines). The data structure is still rather complex and deeply nested, because I thought it might be better to initially keep the ordering of all items intact. If your config blocks don't contain duplicate keys and the order doesn't matter, then of course they can be reduced to regular hashes. I also took the liberty of allowing the config values surrounded by brackets to be interpreted as whitespace-separated lists; I'm not sure whether that's how your config format works.
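
    Reducing the ordered structure to regular hashes might look something like the sketch below. The input shape is an assumption (an array ref of single-key hash refs, where a value is a string, an array ref of strings for a bracketed list, or another array ref of single-key hash refs for a nested block), and the `collapse` function and sample data are hypothetical.

```perl
use strict;
use warnings;

# Merge an ordered list of single-key hashes into one hash, recursing into
# nested blocks. Later duplicates of a key overwrite earlier ones.
sub collapse {
    my ($items) = @_;
    my %hash;
    for my $item (@$items) {
        my ( $key, $value ) = %$item;
        # A nested block is an array ref whose elements are hash refs;
        # a bracketed list is an array ref of plain strings.
        my $is_block = ref($value) eq 'ARRAY'
                    && @$value
                    && ref( $value->[0] ) eq 'HASH';
        $hash{$key} = $is_block ? collapse($value) : $value;
    }
    return \%hash;
}

# Hypothetical ordered structure, written by hand for illustration.
my $ordered = [
    { larry => [
        { rule5 => [
            { option => [ { 'disable-server-response-inspection' => 'no' } ] },
            { tag    => ['some_tag'] },
            { action => 'allow' },
        ] },
    ] },
];

my $flat = collapse($ordered);
print $flat->{larry}{rule5}{action}, "\n";   # prints "allow"
```

    The trade-off is the one mentioned above: item order and duplicate keys are lost once everything is folded into plain hashes.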

    use warnings;
    use strict;
    use 5.010;  # required for Regexp::Grammars

    my $parser = do {
        use Regexp::Grammars;
        qr{
            <[Block]>*

            <rule: Block>
                <BlockName=Word> \{ <[BlockItem]>* \}
                (?{ $MATCH = { $MATCH{BlockName} => $MATCH{BlockItem} } })

            <rule: BlockItem>
                (?: <Block> | <KeyValue> )
                (?{ $MATCH = $MATCH{KeyValue} || $MATCH{Block} })

            <rule: KeyValue>
                <Key=Word>
                (?: <Value=Word> | \[ <[Value=Word]>* % \s+ \] ) \;
                (?{ $MATCH = { $MATCH{Key} => $MATCH{Value} } })

            <token: Word>
                [\w-]+
        }
    };

    my $input = do {
        open my $fh, '<', '1112435.txt' or die $!;
        local $/;
        <$fh>
    };

    $input =~ $parser or die "Failed to parse input";

    use Data::Dump 'pp';
    say pp $/{Block};

    And finally, the output for your current example:

      Hmm, that failed for me with
      $ mversion Regexp::Grammars 1.033
      but not with Regexp-Grammars-1.038

        Ah, interesting, thanks!

        Adding the /x modifier to the regex fixes it under 1.033; apparently Regexp::Grammars started adding /x automatically in version 1.035.

        I guess the module isn't perfect, looks like it has issues under Perl 5.18 (and even 5.10?)...

Re: Parsing a config file with braces and nested braces
by Anonymous Monk on Jan 08, 2015 at 01:04 UTC
    Speaking of Re: Parsing a config file with braces and nested braces ( like json/yaml) and Re: Parsing a config file with braces and nested braces, this works:

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump qw/ dd /;
    use 5.010; #~ use v5.10.0;

    my $raw = '
    bob {
      ed {
        larry {
          rule5 {
            option {
              disable-server-response-inspection no;
            }
            tag [ some_tag ];
            from [ prod-L3 ];
            to [ corp-L3 ];
            source [ any ];
            destination [ any ];
            source-user [ any ];
            category [ any ];
            application [ any ];
            service [ any ];
            hip-profiles [ any ];
            log-start no;
            log-end yes;
            negate-source no;
            negate-destination no;
            action allow;
            log-setting orion_log;
          }
          rule6 {
            option {
              disable-server-response-inspection no;
            }
            tag [ some_tag ];
            from [ prod-L3 ];
            to [ corp-L3 ];
            source [ any ];
            destination [ any ];
            source-user [ any ];
            category [ any ];
            application [ any ];
            service [ any ];
            hip-profiles [ any ];
            log-start no;
            log-end yes;
            negate-source no;
            negate-destination no;
            action allow;
            log-setting orion_log;
          }
        }
      }
    }
    ';

    my $FROM_CONFIG = qr{
        (?<OBJECT_OPEN>
            ^ \s* (?<NAME> \w+ ) \s* \{ \s* [\r\n]+
        )
      | (?<OBJECT_CLOSE>
            ^ \s* \} \s* [\r\n]+
        )
      | (?<KEYVAL>
            ^ \s* (?<KEY> [\w\-]+ ) \s+ (?<VAL> [^\r\n\{;]+ ) ; \s* [\r\n]+
        )
      | (?<UHOH> . )
    }xms;

    my @stack = {};
    while( $raw =~ m{$FROM_CONFIG}g ){
        ## dd( \%+ );
        ## push @stack, { %+ };
        my $freeze = { %+ };
        if( $freeze->{OBJECT_OPEN} ){
            my $new = {};
            $stack[-1]->{ $freeze->{NAME} } = $new;
            push @stack, $new;
        }elsif( $freeze->{OBJECT_CLOSE} ){
            pop @stack;
        }elsif( $freeze->{KEYVAL} ){
            $stack[-1]->{ $freeze->{KEY} } = $freeze->{VAL};
        }
    }
    dd( \@stack );
    __END__