proxie has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks. I work at a company where I don't normally do scripting. I tend to use Perl for like 3 days for some one off scripting need, and then promptly forget everything I learned. I was hoping someone could do this reg exp faster. Say I have a an automatically dumped command line like:
execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -sub_args + +debug_dir=./ -constraint parity_en,random_en -sub_args '"' ruck=1 ' +"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -sub_args te +st -seed 1 -tex_args- -opt 1 -tag 2
Since it's difficult to parse what's really going on when it's all a single line, I instead would like to create a script that would take this command line as input and effectively print out something like the below in a kind of tree format:
execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -sub_args +debug_dir=./ -constraint parity_en,random_en -sub_args '"' ruck=1 '"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -sub_args test -seed 1 -tex_args- -opt 1 -tag
Can someone help me get started on the most efficient reg exps to do this? There are certain anchors I can see in which to key off of in terms of knowing when to tab, but it's not consistent. For instance, can't always use the "-" reliably to know when to tab over.

Replies are listed 'Best First'.
Re: Help with parsing command line to make more readable
by tybalt89 (Monsignor) on Aug 19, 2018 at 18:11 UTC

    Close enough?

    #!/usr/bin/perl # https://perlmonks.org/?node_id=1220646 use strict; use warnings; $_ = q(execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -s +ub_args +debug_dir=./ -constraint parity_en,random_en -sub_args '"' r +uck=1 '"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -sub_ +args test -seed 1 -tex_args- -opt 1 -tag 2); print "$_\n\n"; my $answer = ''; $answer .= s/^-/\t-/gmr . "\n" for /\S.*?(?= +-)/g; $answer =~ s/^(\s*)(-\w+)\n\K((?:.*\n)*?)(?=\1\2-\n)/ $3 =~ s#^#\t#gmr + /gem; print $answer;

    Outputs:

    execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -sub_args + +debug_dir=./ -constraint parity_en,random_en -sub_args '"' ruck=1 ' +"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -sub_args te +st -seed 1 -tex_args- -opt 1 -tag 2 execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -sub_args +debug_dir=./ -constraint parity_en,random_en -sub_args '"' ruck=1 '"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -sub_args test -seed 1 -tex_args- -opt 1
      $answer .= s/^-/\t-/gmr . "\n" for /\S.*?(?= +-)/g;

      This leaves out the  -tag 2 at the very end of the c.l. string. Changing the look-ahead to  (?=(?: +-|$)) fixes this:

      c:\@Work\Perl\monks>perl -wMstrict -le "$_ = q(execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ - +sub_args +debug_dir=./ -constraint parity_en,random_en -sub_args '\"' + ruck=1 '\"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -s +ub_args test -seed 1 -tex_args- -opt 1 -tag 2); ;; print qq{$_ \n}; ;; my $answer = ''; $answer .= s/^-/\t-/gmr . \"\n\" for /\S.*?(?=(?: +-|$))/g; $answer =~ s{^(\s*)(-\w+)\n\K((?:.*\n)*?)(?=\1\2-\n)} { $3 =~ s{^}{\t}gmr }gem; ;; print $answer; " execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -sub_args + +debug_dir=./ -constraint par ity_en,random_en -sub_args '"' ruck=1 '"' -constraint dual_en -sub_arg +s -cd -sub_args 2596.slow -sub_args test -seed 1 -tex_args- -opt 1 -t +ag 2 execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -sub_args +debug_dir=./ -constraint parity_en,random_en -sub_args '"' ruck=1 '"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -sub_args test -seed 1 -tex_args- -opt 1 -tag 2
      I still don't like that the  -cd is separated from its preceding  -sub_args argument. Have to think about that...


      Give a man a fish:  <%-{-{-{-<

      Ok, this groups  -sub_args -cd together:

      c:\@Work\Perl\monks>perl -wMstrict -le "$_ = q(execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ - +sub_args +debug_dir=./ -constraint parity_en,random_en -sub_args '\"' + ruck=1 '\"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -s +ub_args test -seed 1 -tex_args- -opt 1 -tag 2); ;; print qq{>$_< \n}; ;; # my $sw = qr{ (?: -sub_args \s+ \S+ | \S) .*? }xms; my $chunk = qr{ (?: -sub_args \s+ \S+ | \S) .*? (?= \s+ - | \Z) }xms; my $answer; $answer .= s/^-/\t-/gmr . qq{\n} for /$chunk/g; $answer =~ s{^(\s*)(-\w+)\n\K((?:.*\n)*?)(?=\1\2-\n)} { $3 =~ s{^}{\t}gmr }gem; ;; print qq{>>>$answer<<< \n}; " >execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -sub_arg +s +debug_dir=./ -constraint parity_en,random_en -sub_args '"' ruck=1 +'"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -sub_args t +est -seed 1 -tex_args- -opt 1 -tag 2< >>>execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -sub_args +debug_dir=./ -constraint parity_en,random_en -sub_args '"' ruck=1 '"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -sub_args test -seed 1 -tex_args- -opt 1 -tag 2 <<<

      Update: The
          my $sw = qr{ (?: -sub_args \s+ \S+ | \S) .*? }xms;
      statement above is superfluous.


      Give a man a fish:  <%-{-{-{-<

Re: Help with parsing command line to make more readable
by LanX (Saint) on Aug 19, 2018 at 19:05 UTC
    that's a complicated grammar

    > Can someone help me get started

    you can regex it to become Perl code of nested calls and eval it

    like

    -tex_args -sub_args +debug_dir=./ -sub_args +debug_dir=./ -tex_args-

    becoming

    minus::tex_args minus::sub_args plus::debug_dir '=./' , minus::sub_args plus::debug_dir '=./' , minus::tex_args '-'

    now you can populate the packages minus and plus with appropriate functions dealing with the input.

    The subs are called from right to left.

    start with all of the like sub text_args { return ["-text_args", @_] } and then you'll have a start to improve your grammar.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Re: Help with parsing command line to make more readable (updated)
by AnomalousMonk (Archbishop) on Aug 20, 2018 at 06:52 UTC

    Here's another step along the way. Based on tybalt89's beginning. Handles arbritrary nesting levels (I think). (Update: Does not handle arbitrary nesting levels of identical switches. E.g., change  -fizz_boff ... -fizz_boff- to  -tex_args ... -tex_args- in the example code below. (sigh)) Lots of assumptions about syntax of the c.l. This needs more testing, and I'm sure proxie will be back with lots of updated and expanded specifications.

    Output:
    c:\@Work\Perl\monks\proxie>perl prettyprint_cl_1.pl >execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -fizz_bo +ff -sub_args +debug_dir=./ -constraint parity_en,random_en -sub_args +'"' ruck=1 '"' -bar -xyzzy -bar- -constraint dual_en -sub_args +-cd -sub_args 2596.slow -fizz_boff- -sub_args test -seed 1 -tex_args- + -opt 1 -foo -x y -zz 42 -foo- -tag 2 -last 1< "execute test_number_1\n -tex\n -tex_args\n -sub_args +debug_ +dir=./\n -fizz_boff\n -s ub_args +debug_dir=./\n -constraint parity_en,random_en\n -sub_a +rgs '\"' ruck=1 '\"'\n -bar \n -xyzzy\n -bar-\n -constraint dual_en\n -sub_args -cd\n + -sub_args 2596.slow\n -f izz_boff-\n -sub_args test\n -seed 1\n -tex_args-\n -opt 1 +\n -foo\n -x y\n -zz 42\n -foo-\n -tag 2\n -last 1\n" >>>execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -fizz_boff -sub_args +debug_dir=./ -constraint parity_en,random_en -sub_args '"' ruck=1 '"' -bar -xyzzy -bar- -constraint dual_en -sub_args -cd -sub_args 2596.slow -fizz_boff- -sub_args test -seed 1 -tex_args- -opt 1 -foo -x y -zz 42 -foo- -tag 2 -last 1 <<<
    Sorry about any wraparound.


    Give a man a fish:  <%-{-{-{-<

      This is good enough to get ideas, thank!
Re: Help with parsing command line to make more readable
by kcott (Archbishop) on Aug 20, 2018 at 12:19 UTC

    G'day proxie,

    Welcome to the Monastery.

    "Can someone help me get started on the most efficient reg exps to do this?"

    When Perl's string handling functions can do the job, you'll generally find that they'll be measurably, more efficient than regular expressions. Here's a solution that doesn't use regular expressions. Use Benchmark to compare the efficiency of this code against other posted solutions.

    #!/usr/bin/env perl use strict; use warnings; my $cmd = q{execute test_number_1 -tex -tex_args -sub_args +debug_dir= +./ -sub_args +debug_dir=./ -constraint parity_en,random_en -sub_args +'"' ruck=1 '"' -constraint dual_en -sub_args -cd -sub_args 2596.slow +-sub_args test -seed 1 -tex_args- -opt 1 -tag 2}; my $TAB = ' ' x 4; my $depth = 0; my @tokens = split ' ', $cmd; my @parts; cmd_format(\@tokens, \@parts, $depth); print "$_\n" for @parts; sub cmd_format { my ($tokens, $parts, $depth) = @_; for (my $i = 0; $i <= $#$tokens; ++$i) { my $token = $tokens->[$i]; if (-1 == index $token, '-', 0) { my @starts = ($token); while (-1 == index $tokens[$i + 1], '-', 0) { push @starts, $tokens[++$i]; } push @$parts, $TAB x $depth . "@starts"; } elsif ($token eq '-tex') { push @$parts, $TAB x ($depth + 1) . $token; cmd_format([@{$tokens}[$i + 1 .. $#$tokens]], $parts, $dep +th + 2); last; } elsif ($token eq '-tex_args') { push @$parts, $TAB x $depth . $token; my $args = []; while ($tokens->[$i + 1] ne '-tex_args-') { push @$args, $tokens->[++$i]; } cmd_format($args, $parts, $depth + 1); push @$parts, $TAB x $depth . $tokens->[++$i]; } else { my @opts = ($token, $tokens->[++$i]); for ($i + 1 .. $#$tokens) { last unless -1 == index $tokens->[$i + 1], '-', 0; push @opts, $tokens->[++$i]; } push @$parts, $TAB x $depth . "@opts"; } } }

    Output:

    execute test_number_1 -tex -tex_args -sub_args +debug_dir=./ -sub_args +debug_dir=./ -constraint parity_en,random_en -sub_args '"' ruck=1 '"' -constraint dual_en -sub_args -cd -sub_args 2596.slow -sub_args test -seed 1 -tex_args- -opt 1 -tag 2

    — Ken

      Thanks for the suggestion!