perlpal has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks,

I need to parse command-line syntax strings whose format varies from command to command, and build a hash describing the syntax of each.

An example string is below (there are other variations with different syntax):

dfpm dataset add -D -N <node-name> <data-set-name-or-id> { <volume-name-or-id> | <qtree-name-or-id> | <ossv-dir-name-or-id> | <storage-system-name-or-id> | <host-name-or-id> } ...

A brief key to the notation used in the string:

[] - optional parameters

<> - mandatory parameters

{|} - or parameters

The hash I need to build from the string above is:

$dataset_add = {
    'mandatory' => [
        { 'param1' => '$data-set-name-or-id', }
    ],
    'optional' => [
        { 'switch' => '-D', 'value' => '' },
        { 'switch' => '-N', 'value' => '$node-name' },
        { 'switch' => '',
          'value'  => [$volume-name-or-id,$volume-name-or-id,$ossv-dir-name-or-id,$storage-system-name-or-id,$host-name-or-id ] },
    ]
}

Looking for direction to accomplish the same.

Thanks in Advance!

Replies are listed 'Best First'.
Re: Parsing command string into a hash
by ikegami (Patriarch) on Jul 16, 2009 at 14:28 UTC
    Well, you'll need a parser. The grammar is quite straightforward, so it shouldn't be too hard to write one using Parse::RecDescent. What's your question?
      I did go through Parse::RecDescent. I couldn't quite understand how to formulate the grammar.

      Any help with that would be greatly appreciated!

        I couldn't quite understand how to formulate the grammar.

        Odd, since it's very similar to what you need to parse. What did you try? At first, focus on the parsing and don't worry about your output. Add that at the end.


        There are problems with the design of your output structure.

        • Why is a non-switch parameter called "param1" in one place and "value" in another?

        • Why would a property of mandatory parameters be called "param1"?

          # Bad - Redundant
          'mandatory' => [
             { 'param1' => '...', },
             { 'param2' => '...', },
          ],

          # Bad - Hash as an array
          'mandatory' => {
             'param1' => '...',
             'param2' => '...',
          },

          # Ok
          'mandatory' => [
             { 'param' => '...', ... },
             { 'param' => '...', ... },
          ],

          # Ok
          'mandatory' => [
             '...',
             '...',
          ],

        • What if an alternation contains something other than just a <...> or just a [<...>]? For example, could you have { update | insert }? Your output format doesn't support that.

          By the way, it works out that if any of the terms of an alternation are optional, they all are.

          { ... | [...] | ... }
          is equivalent to
          [ { ... | ... | ... } ]

          That's good, because your design relies on it.
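
The equivalence above can be checked on a toy model. The following is only a minimal sketch in core Perl, not part of the parser discussed in this thread: the term representation (`name`, `optional`) and both helper functions are hypothetical, and each term is assumed to be a single parameter. It lists what each form accepts, with the empty string standing for "nothing supplied", and compares the two lists.

```perl
use strict;
use warnings;

# Hypothetical mini-model: a term is { name => ..., optional => 0|1 }.
# An alternation picks exactly one of its terms.

# Inputs accepted by { t1 | t2 | ... }; '' means "nothing supplied".
sub accepted_by_alternation {
    my (@terms) = @_;
    my %seen;
    for my $term (@terms) {
        $seen{ $term->{name} } = 1;
        $seen{''} = 1 if $term->{optional};   # an optional term may match nothing
    }
    return sort keys %seen;
}

# Inputs accepted by [ { n1 | n2 | ... } ], where no term is optional.
sub accepted_by_optional_alternation {
    my (@names) = @_;
    my %seen = map { ($_ => 1) } @names, '';  # the outer [ ] allows "nothing"
    return sort keys %seen;
}

# { <volume> | [<qtree>] | <host> }
my @mixed = accepted_by_alternation(
    { name => 'volume', optional => 0 },
    { name => 'qtree',  optional => 1 },
    { name => 'host',   optional => 0 },
);

# [ { <volume> | <qtree> | <host> } ]
my @hoisted = accepted_by_optional_alternation(qw( volume qtree host ));

print "both forms accept the same inputs\n" if "@mixed" eq "@hoisted";
```

One optional term is enough to make "nothing supplied" acceptable for the whole alternation, which is why the optional marker can be hoisted outside the braces.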

Re: Parsing command string into a hash
by jrsimmon (Hermit) on Jul 16, 2009 at 14:55 UTC
    Getopt::ExPar is a current package that appears, on the surface, to be quite powerful. I haven't used it myself, but it may be worth your time.
Re: Parsing command string into a hash
by ikegami (Patriarch) on Jul 16, 2009 at 18:04 UTC
    The following is a solution. The output format differs from the one requested, since the requested format wasn't sufficient, as per my earlier comment. It may be possible to simplify the output format if certain circumstances will never occur, but you said you needed a more general solution than one that could only parse the presented statement.
    #!/usr/bin/perl
    # make_parser.pl

    use strict;
    use warnings;

    use Parse::RecDescent qw( );

    my $grammar = <<'__END_OF_GRAMMAR__';

       {
          use strict;
          use warnings;

          sub optimise_option {
             my ($option) = @_;

             for (;;) {
                return $option if @{ $option->{children} } != 1;

                my ($child) = @{ $option->{children} };
                return $option if $child->{type} ne 'option';

                $option = $child;
             }
          }

          sub optimise_alternation {
             my ($alternation) = @_;
             my $choices = $alternation->{choices};

             my $optional = 0;
             for my $choice (@$choices) {
                next if grep { $_->{type} ne 'option' } @$choice;
                $optional = 1;
                last;
             }

             return $alternation if !$optional;

             for my $choice (@$choices) {
                next if @$choice != 1;

                my ($child) = @$choice;
                next if $child->{type} ne 'option';

                @$choice = @{ $child->{children} };
             }

             return {
                type     => 'option',
                children => [ $alternation ],
             };
          }
       }

       parse       : command param(s?) /\Z/
                     { [ $item[1], @{$item[2]} ] }

       command     : IDENT

       param       : option
                   | switch
                   | variable
                   | alternation
                   | literal

       option      : '[' param(s?) ']'
                     {
                        optimise_option({
                           type     => $item[0],
                           children => $item[2],
                        })
                     }

       switch      : DASHED variable(?)
                     {
                        +{
                           type  => $item[0],
                           name  => $item[1],
                           value => @{$item[2]} ? $item[2][0]{name} : undef,
                        }
                     }

       variable    : '<' <skip:''> IDENT '>'
                     {
                        +{
                           type => $item[0],
                           name => $item[3],
                        }
                     }

       alternation : '{' altern_body '}'
                     {
                        optimise_alternation({
                           type    => $item[0],
                           choices => $item[2],
                        })
                     }

       altern_body : <leftop: param(s) '|' param(s) >

       literal     : IDENT
                     {
                        +{
                           type  => $item[0],
                           value => $item[1],
                        }
                     }

       IDENT       : /[a-zA-Z][a-zA-Z0-9-]*/

       DASHED      : /-[a-zA-Z][a-zA-Z0-9]*/

    __END_OF_GRAMMAR__

    Parse::RecDescent->Precompile($grammar, 'Grammar')
       or die("Bad grammar\n");

    #!/usr/bin/perl
    # test.pl

    use strict;
    use warnings;

    use Data::Dumper qw( );
    use Grammar      qw( );

    {
       my %keys_lkup = (
          option      => [qw( children )],
          switch      => [qw( name value )],
          variable    => [qw( name )],
          alternation => [qw( choices )],
          literal     => [qw( value )],
       );

       sub keys_by_type {
          my ($h) = @_;
          if (exists($h->{type})) {
             return [ qw( type ), @{ $keys_lkup{ $h->{type} } } ];
          } else {
             return [ sort keys %$h ];
          }
       }
    }

    {
       my $parser = Grammar->new();
       while (<DATA>) {
          chomp;
          my $params = $parser->parse($_)
             or do {
                warn("Bad data at line $.\n");
                next;
             };

          print(">> $_\n");
          print Data::Dumper
             ->new([ $params ], [qw( $params )])
             ->Indent(1)
             ->Sortkeys(\&keys_by_type)
             ->Dump();
          print("\n");
       }
    }

    __DATA__
    dfpm dataset add [-D] [-N <node-name>] <data-set-name-or-id> { [<volume-name-or-id>] | [<qtree-name-or-id>] | [<ossv-dir-name-or-id>] | [<storage-system-name-or-id>] | [<host-name-or-id>] }
    nested-square [ [ param ] ]
    multi-child { foo <foo> | bar <bar> }
    partially-optimisable { <foo> | [<bar>] | [<cat>] [<dog>] }

    $ perl make_parser.pl && perl test.pl
    >> dfpm dataset add [-D] [-N <node-name>] <data-set-name-or-id> { [<volume-name-or-id>] | [<qtree-name-or-id>] | [<ossv-dir-name-or-id>] | [<storage-system-name-or-id>] | [<host-name-or-id>] }
    $params = [
      'dfpm',
      {
        'type' => 'literal',
        'value' => 'dataset'
      },
      {
        'type' => 'literal',
        'value' => 'add'
      },
      {
        'type' => 'option',
        'children' => [
          {
            'type' => 'switch',
            'name' => '-D',
            'value' => undef
          }
        ]
      },
      {
        'type' => 'option',
        'children' => [
          {
            'type' => 'switch',
            'name' => '-N',
            'value' => 'node-name'
          }
        ]
      },
      {
        'type' => 'variable',
        'name' => 'data-set-name-or-id'
      },
      {
        'type' => 'option',
        'children' => [
          {
            'type' => 'alternation',
            'choices' => [
              [
                {
                  'type' => 'variable',
                  'name' => 'volume-name-or-id'
                }
              ],
              [
                {
                  'type' => 'variable',
                  'name' => 'qtree-name-or-id'
                }
              ],
              [
                {
                  'type' => 'variable',
                  'name' => 'ossv-dir-name-or-id'
                }
              ],
              [
                {
                  'type' => 'variable',
                  'name' => 'storage-system-name-or-id'
                }
              ],
              [
                {
                  'type' => 'variable',
                  'name' => 'host-name-or-id'
                }
              ]
            ]
          }
        ]
      }
    ];

    >> nested-square [ [ param ] ]
    $params = [
      'nested-square',
      {
        'type' => 'option',
        'children' => [
          {
            'type' => 'literal',
            'value' => 'param'
          }
        ]
      }
    ];

    >> multi-child { foo <foo> | bar <bar> }
    $params = [
      'multi-child',
      {
        'type' => 'alternation',
        'choices' => [
          [
            {
              'type' => 'literal',
              'value' => 'foo'
            },
            {
              'type' => 'variable',
              'name' => 'foo'
            }
          ],
          [
            {
              'type' => 'literal',
              'value' => 'bar'
            },
            {
              'type' => 'variable',
              'name' => 'bar'
            }
          ]
        ]
      }
    ];

    >> partially-optimisable { <foo> | [<bar>] | [<cat>] [<dog>] }
    $params = [
      'partially-optimisable',
      {
        'type' => 'option',
        'children' => [
          {
            'type' => 'alternation',
            'choices' => [
              [
                {
                  'type' => 'variable',
                  'name' => 'foo'
                }
              ],
              [
                {
                  'type' => 'variable',
                  'name' => 'bar'
                }
              ],
              [
                {
                  'type' => 'option',
                  'children' => [
                    {
                      'type' => 'variable',
                      'name' => 'cat'
                    }
                  ]
                },
                {
                  'type' => 'option',
                  'children' => [
                    {
                      'type' => 'variable',
                      'name' => 'dog'
                    }
                  ]
                }
              ]
            ]
          }
        ]
      }
    ];
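
For anyone wondering how to consume a structure like the one dumped above, here is a minimal sketch in core Perl that walks the tree and renders it back into usage syntax. It is not part of the posted solution: `render_node`, `render_list` and `render_usage` are hypothetical helper names, and the sample `$params` is a hand-written, shortened copy of the first dump rather than live parser output.

```perl
use strict;
use warnings;

# Render a sequence of nodes, separated by spaces.
sub render_list { return join ' ', map { render_node($_) } @_ }

# Render one node, dispatching on the 'type' key used in the dumps above.
sub render_node {
    my ($node) = @_;
    my $type = $node->{type};

    return $node->{value}    if $type eq 'literal';
    return "<$node->{name}>" if $type eq 'variable';

    if ($type eq 'switch') {
        # A switch's value is undef when it takes no argument.
        return defined $node->{value}
            ? "$node->{name} <$node->{value}>"
            : $node->{name};
    }
    if ($type eq 'option') {
        return '[' . render_list(@{ $node->{children} }) . ']';
    }
    if ($type eq 'alternation') {
        my @choices = map { render_list(@$_) } @{ $node->{choices} };
        return '{ ' . join(' | ', @choices) . ' }';
    }
    die "Unknown node type '$type'\n";
}

# The first element is the command name; the rest are nodes.
sub render_usage {
    my ($params) = @_;
    my ($command, @nodes) = @$params;
    return join ' ', $command, map { render_node($_) } @nodes;
}

# Hand-written sample, shortened to two alternation choices.
my $params = [
    'dfpm',
    { type => 'literal', value => 'dataset' },
    { type => 'literal', value => 'add' },
    { type => 'option', children => [
        { type => 'switch', name => '-D', value => undef } ] },
    { type => 'option', children => [
        { type => 'switch', name => '-N', value => 'node-name' } ] },
    { type => 'variable', name => 'data-set-name-or-id' },
    { type => 'option', children => [
        { type => 'alternation', choices => [
            [ { type => 'variable', name => 'volume-name-or-id' } ],
            [ { type => 'variable', name => 'qtree-name-or-id' } ],
        ] },
    ] },
];

my $usage = render_usage($params);
print "$usage\n";
```

Because every node carries a `type` key, the walker only needs one dispatch per node type. The rendered string shows the optimised form, with the optional marker hoisted outside the alternation, so it will not match the input text character for character.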