mifflin has asked for the wisdom of the Perl Monks concerning the following question:

I need to parse command line arguments coming in from a ksh script.
Arguments come in with the following format...
key=value
The key may not have spaces or an equal sign in it.
The value may have any characters in it
An example would be something like...
vchtyp=I jrnltyp=D lexwhere="'in('DIS', 'DIM', 'DIR')'" title="'this is a title'"
Note that the lexwhere and title parameters have spaces in them.
I need the results to be put into a hash like...
$hash{vchtyp} = 'I';
$hash{jrnltyp} = 'D';
$hash{lexwhere} = "'in('DIS', 'DIM', 'DIR')'";
$hash{title} = "'this is a title'";
There must be an easier way to parse this than what I've come up with.
Here is my mess.
sub parseArgs {
    my (@args) = @_;
    my @chars  = split(//, join(' ', @args));
    my %params = ();
    my $label  = '';
    my $value  = '';
    my $inLabel = 0;
    my $inValue = 0;
    my $doubleQuotes = 0;
    my $singleQuotes = 0;
    my $isSpace = 0;
    my $isEquals = 0;
    my $isDoubleQuote = 0;
    my $isSingleQuote = 0;
    for my $char (@chars) {
        $isSpace       = $char =~ /\s/;
        $isEquals      = $char eq '=';
        $isDoubleQuote = $char eq '"';
        $isSingleQuote = $char eq "'";
        if ($inLabel) {
            if ($isEquals) {
                $inLabel = 0;
                $inValue = 1;
            }
            else {
                $label .= $char;
            }
        }
        elsif ($inValue) {
            $doubleQuotes++ if $isDoubleQuote;
            $singleQuotes++ if $isSingleQuote;
            if ($isSpace) {
                if (($singleQuotes % 2) == 0 && ($doubleQuotes % 2) == 0) {
                    $inValue = 0;
                    $params{$label} = $value;
                    $label = '';
                    $value = '';
                }
                else {
                    $value .= $char;
                }
            }
            else {
                $value .= $char;
            }
        }
        elsif (!$isSpace) {
            $inLabel = 1;
            $label .= $char;
        }
    }
    if ($inValue && $label && $value) {
        $params{$label} = $value;
    }
    return %params;
}
Does anyone have any suggestions?


Re: parsing arguments
by BUU (Prior) on Oct 07, 2003 at 22:56 UTC
    Perhaps I'm totally off base, but if you're just looking for a simple hash ($hash{$key} = $value for each option), this should be fairly easy to do.

    Since all of the args are automatically split by the shell, the hardest part is out of your way. All you should have to do is this:
    my %hash;
    for (@ARGV) {
        my ($key, $val) = split /=/, $_, 2;
        $hash{$key} = $val;
    }

    use Data::Dumper;
    print Dumper(\%hash);

    __END__
    Output:
    $VAR1 = {
              'vchtyp' => 'I',
              'jrnltyp' => 'D',
              'title' => '\'this is a title\'',
              'lexwhere' => '\'in(\'DIS\', \'DIM\', \'DIR\')\''
            };
    (Data::Dumper is inserting those extraneous backslashes, ignore them)
    Now if you wanted to parse some values into more complicated data structures than just scalars, it gets slightly more complicated. For example, the lexwhere key. You might be able to get away with something like:
    my @vals = $hash{lexwhere} =~ /'([^']+)'/g;
    But that's really fragile and requires having strict controls over what gets passed to your script. (In fact, it's so fragile it doesn't even work on the data as presented; you'll need to munge off the beginning and ending single quotes.)
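    For what it's worth, one way to do that munging might look like the sketch below. It's only a sketch, and it still assumes the value keeps exactly the 'in(...)' shape shown above:
    # Strip the outer single quotes first, then pull out the quoted items.
    ( my $lex = $hash{lexwhere} ) =~ s/^'|'$//g;   # in('DIS', 'DIM', 'DIR')
    my @vals = $lex =~ /'([^']+)'/g;               # ('DIS', 'DIM', 'DIR')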
      You're right!
      It looks like if you quote all the values correctly, @ARGV seems to get it right.
      I didn't know perl was so smart :-)
      thanks
        It's actually the shell that is so smart. Pretty much any language (at least, any with some form of ARGV-like construct) is going to parse it correctly. You see, the shell doesn't pass your program the character string from the command line; rather, the shell passes your program an array of parameter strings.

        This is why, if you look at system(...) or exec(...), you see that they take a list as their argument (or, again, whatever the parallel construct is in any other language). You also see that if you pass only one item in that list, and it contains characters which the shell will know how to handle, perl actually invokes the shell to handle converting that string into the @ARGV of the sub-process.
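        As a minimal illustration (the file names here are made up, not from the thread), compare the LIST and single-string forms of system():
        # LIST form: no shell involved, so the space stays inside one argument.
        system( 'grep', 'foo', 'my file.txt' );        # argv is: grep, foo, 'my file.txt'

        # Single-string form: because the string contains shell metacharacters,
        # perl hands it to the shell, which re-parses the quoting and the redirect.
        system( "grep foo 'my file.txt' > out.txt" );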


        ------------
        :Wq
        Not an editor command: Wq
      I'm not sure why this is true, but on my system (Sun Solaris using ksh), I must also escape the spaces. If I don't, @ARGV gets broken up along those spaces too. This is probably because I am executing my perl script from a ksh script, not from the command line. Somehow running my perl script from a ksh script changes things?
        This is because the parsing of your command-line string is done by the shell, not by perl. Different shells have slightly different rules for parsing the command line into argv. (More correctly, different shells have slightly different rules for how they expect the user to represent argv as a string.) It's just convention that most shells use very similar rules for this. (Typically, parameters are separated by spaces which are not enclosed in apostrophes or quotes, and which are not preceded by a single backslash.)

        Try this

        perl -MData::Dumper -e "print Dumper(\@ARGV).qq{\n}" a b c 'a b c' a\ b\ c "a b c" "'a b c'" *
        under different shells and you may get slightly different results. It's really interesting to see the difference under windows/DOS command.com or cmd.exe shells, because the asterisk at the end of the line is just taken as a literal asterisk, whereas under unix shells, the asterisk will, instead, be replaced by the list of all files and directories matching the wildcard. In DOS/windows, programs that might operate on many files interpret the asterisk themselves, whereas in unix, the asterisk is transformed by the shell. For a really simple comparison, just try:
        echo *
        on both systems.

        Anyways, for really detailed information on how your various shells parse the command line into argv, you should read that shell's documentation (man page).


        ------------
        :Wq
        Not an editor command: Wq
Re: parsing arguments
by davido (Cardinal) on Oct 07, 2003 at 22:38 UTC
    Your life may be easier with the core module, Getopt::Long.

    It's well documented, and saves you from the pitfalls of trying to parse a complex command line yourself.

    In particular read the section in Getopt::Long's documentation about "Options with hash values". And the "Summary of Option Specifications".

    UPDATE: Since Getopt::Long prefers command line arguments to look like "--option=value", if using Getopt::Long is deemed enough of an advantage to be worth it, you could consider the additional step of setting up a BEGIN{} block that preprocesses the command line arguments to make them look like options, so that Getopt::Long can still conveniently put them into a hash for you.
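    A rough sketch of that preprocessing idea, just to make it concrete (the option name "arg" and the die message are invented, and this isn't tested against the exact data above):
    use strict;
    use warnings;
    use Getopt::Long;

    # Rewrite each "key=value" word as "--arg=key=value" so that Getopt::Long's
    # hash-valued option can absorb the whole set.
    BEGIN {
        @ARGV = map { "--arg=$_" } @ARGV;
    }

    my %params;
    GetOptions( 'arg=s' => \%params ) or die "Could not parse arguments\n";
    # %params should now hold vchtyp => 'I', jrnltyp => 'D', and so on.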


    Dave


    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein
      I hear ya, but I have to live with this argument style.
      It is the argument style used in Oracle's old 1.1 version of its report writer.
      I'm trying to port our existing ksh scripts to call the new rwservlet instead, which needs the parameters as HTTP GET or POST form parameters.
      For the period of the pilot, however long that will be, I will need to have the same calls execute the old and new versions of the report.
      Um, Getopt::Long handles command lines that look like "--option value" and so on. That really has very little to do with what he's attempting, other than a superficial resemblance.
Re: parsing arguments
by Roger (Parson) on Oct 07, 2003 at 23:06 UTC
    (@_@) You are not using Getopt::Long yet??? I see that many people have pointed you in the direction of Getopt::Long. I will post one of the templates that I use when writing a new application in Perl.

    #!/usr/bin/perl -w
    # My Perl Application
    # $Id$
    # ---
    # $Log$
    use Getopt::Long;
    use Pod::Usage;   # recommended to use together with Getopt
    use strict;

    use vars qw($VERSION);
    $VERSION = sprintf( '%d.%02d', q$Revision$ =~ /(\d+)\.(\d+)/ );

    # Parse command line arguments and assign corresponding variables.
    # Arguments can be put into a hash, an array or a scalar.
    my @file_in  = ();
    my $file_out = undef;
    my %defines  = ();
    my $verbose  = 0;

    GetOptions(
        'i|in=s'   => \@file_in,
        'o|out=s'  => \$file_out,
        'define=s' => \%defines,
        'verbose'  => \$verbose,
        'version'  => sub { die "$VERSION" },
        'quiet'    => sub { $verbose = 0 },
    );

    unless ( @file_in && defined $file_out ) {
        pod2usage( -exitval => 1, -output => \*STDERR );
    }

    # Beginning of application code ....

    __END__

    =pod

    =head1 NAME

    application.pl

    =head1 SYNOPSIS

    application.pl [options]

    =head1 ARGUMENTS

    Arguments to this script are mandatory.

    =over 4

    =item B<-i|--in [file]>

    Specify an input data file for report extraction.

    =item B<-o|--out [file]>

    Specify the output file name for the report.

    =back

    =cut
    In the above example, you can pass in multiple values for the -i option with -i f1 -i f2 -i f3, etc.

    You can also pass in, say, -o "o1,o2,o3" and then do

    @output_files = split /,/, $file_out;
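    So, for example, a hypothetical run like this (the file names are invented):

    application.pl -i jan.dat -i feb.dat --define region=EMEA -o report.txt

    would leave @file_in holding ('jan.dat', 'feb.dat'), $file_out holding 'report.txt', and %defines holding ( region => 'EMEA' ).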


      Yes, this looks nice, but I don't have the option of changing the original ksh scripts that would pass the arguments into the new perl program. Maybe I didn't understand your example enough, but I still don't see how I can use it. Thanks though. It was a great example of how to use the Getopt::Long module :)
        Ok, you can use a one liner to transform the arguments into key=value pairs in a hash table.
        #!/usr/bin/perl -w
        use strict;
        use Data::Dumper;

        my %arg = map { my ($key, $val) = split /=/, $_, 2; $key => $val } @ARGV;
        print Dumper(\%arg);
        And when running this with the command-line parameters in your example, I got the following result:
        $VAR1 = {
                  'lexwhere' => '\'in(\'DIS\', \'DIM\', \'DIR\')\'',
                  'title' => '\'this is a title\'',
                  'vchtyp' => 'I',
                  'jrnltyp' => 'D'
                };
Re: parsing arguments
by pg (Canon) on Oct 08, 2003 at 00:45 UTC

    The solution provided in this reply does not fit your needs exactly, but it is still worth mentioning, as it works for data that is slightly more restricted than yours.

    The idea is to quickly form an XML string and simply use XML::Simple (or any other XML parser) to parse it. That way, you don't need to duplicate what others have already done for you.

    use strict;
    use warnings;
    use XML::Simple;
    use Data::Dumper;

    my $str = "lexwhere=\"'in('DIS', 'DIM', 'DIR')'\" title=\"'this is a title'\"";
    my $str_xml = "<str " . $str . " />";
    print $str_xml, "\n";

    my $ref = XMLin($str_xml);
    print Dumper($ref);
      I was going to suggest Config::IniFiles myself...
      use Config::IniFiles;

      my $config = Config::IniFiles->new( -file => 'asdf.ini' );
      my $someParam = $config->val( 'section', 'param' );
      where asdf.ini would look like
      [section] param=value


      ----
      Zak

        I'm not too familiar with Config::IniFiles, but from your example, it doesn't seem to fit mifflin's needs, since (s)he wants to parse command line arguments, and Config::IniFiles works with, *tada*, ini files :-)

        Since Getopt::Long was already mentioned, I'd like to add "AppConfig". I have been using that myself recently and must say it works nicely. It parses command line arguments as well as configuration files.

        --
        B10m

        Edit: English can be hard sometimes :-(
Re: parsing arguments
by ChrisR (Hermit) on Oct 08, 2003 at 19:49 UTC
    Just to add my two cents worth:
    my %hash;
    foreach (@ARGV) {
        $_ =~ m/^(.*?)=(.*)$/;
        $hash{$1} = $2;
    }
    Well, maybe it was only worth one cent. Here's another cent's worth:
    my %hash = map { m/^(.*?)=(.*)$/; $1 => $2 } @ARGV;