in reply to Parsing issue

Use a zero-width positive lookahead assertion to stop at the start of the next parameter or at the end of the line, e.g.:
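#!/usr/bin/perl
use strict;
use warnings;

my $string = q/allow:test1,"@test 2 " deny:test3,test4 password:"123 + 456"/;

# Remove one "name:params" chunk per pass. The lazy (.*?) stops either at
# the end of the string or just before the next "word:", which the
# lookahead checks for without consuming it.
while ($string =~ s/(\w+):(.*?)($|(?=\w+:))//) {
    my ($name, $params) = ($1, $2);
    print "Argument: $name\n";
    my @params = split /,/, $params;
    print " Param: $_\n" foreach (@params);
}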
Note that this simple example splits the parameters on commas, so it will break on something like
test:"This, contains, commas",foo,bar
But it should get you started.
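If quoted parameters may contain commas, one possible workaround is to split only on commas that are followed by an even number of double quotes, i.e. commas outside any quoted string. This is a sketch that assumes quotes are always balanced and never backslash-escaped; the sub name split_params is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split a parameter list on commas that are NOT inside
# double quotes. The lookahead only succeeds if the rest of the string holds
# an even number of '"' characters, meaning the comma is outside quotes.
# Assumes quotes are balanced and never backslash-escaped.
sub split_params {
    my ($list) = @_;
    return split /,(?=(?:[^"]*"[^"]*")*[^"]*\z)/, $list;
}

my @params = split_params(q/"This, contains, commas",foo,bar/);
print " Param: $_\n" foreach @params;
```

This gives three parameters: the whole quoted string, foo and bar.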

JJ

Re: Re: Parsing issue
by zigdon (Deacon) on Oct 08, 2002 at 12:26 UTC
    Wouldn't this also break on something like this?
    $string = q/allow:"bad param:doh!" deny:test2/;

    Not sure how I'd go about parsing this, but perhaps you could preprocess the string, replacing all the quoted text with placeholders, then splitting on spaces?
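A rough sketch of the placeholder idea, assuming double quotes are balanced and never escaped, and that the input never contains the NUL character used to build the (made-up) placeholder markers. The sub name split_with_placeholders is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: hide each double-quoted run behind a placeholder,
# split on whitespace, then put the quoted text back.
# Assumes balanced, unescaped quotes and no "\0" characters in the input.
sub split_with_placeholders {
    my ($string) = @_;
    my @saved;
    # Replace each quoted string with "\0<index>\0" and remember the original.
    $string =~ s/("[^"]*")/push @saved, $1; "\0" . $#saved . "\0"/ge;
    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    my @tokens = split ' ', $string;
    # Restore the quoted text inside each token (for aliases each element).
    s/\0(\d+)\0/$saved[$1]/g for @tokens;
    return @tokens;
}

for my $token (split_with_placeholders(q/allow:"bad param:doh!" deny:test2/)) {
    # Split on the first colon only, so colons inside quotes survive.
    my ($name, $value) = split /:/, $token, 2;
    print "Argument: $name\n Value: $value\n";
}
```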

    -- Dan

      Try this:
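      #!/usr/bin/perl
      use strict;
      use warnings;

      my $string = q/allow:test1,"@test, 2 " deny:test3,test4 password:"123 + 456doh:"/;

      # A parameter is a bare word or a double-quoted string; "[^"]*" allows
      # any character except '"' inside the quotes (commas, colons, '+', ...).
      while ($string =~ s/(\w+):((\w+|"[^"]*")(,\s*(\w+|"[^"]*"))*)\s*($|(?=\w+:))//) {
          print "Argument: $1\n";
          my $paramlist = $2;
          # Peel parameters off the front of the list one at a time.
          while ($paramlist =~ s/(\w+|"[^"]*")\s*,*\s*//) {
              print " Param: $1\n";
          }
      }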
      However, the regexp is starting to get complicated; Text::ParseWords looks like a neater solution.
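A sketch of the Text::ParseWords route: its parse_line function splits on a delimiter regexp while respecting quotes. It is used twice here, first on whitespace keeping the quote marks, then on commas with the quotes stripped. The sub name parse_args and the [name, [params]] result layout are just choices made for this example.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: returns a list of [name, [params...]] pairs.
sub parse_args {
    my ($string) = @_;
    my @result;
    # Split on whitespace outside quotes; $keep = 1 keeps the quote marks,
    # so commas inside them are still protected in the next step.
    for my $token (parse_line('\s+', 1, $string)) {
        next unless defined $token;
        # Everything up to the first colon is the argument name.
        my ($name, $rest) = $token =~ /^(\w+):(.*)\z/s or next;
        # Split the parameters on commas outside quotes; $keep = 0 strips quotes.
        push @result, [ $name, [ parse_line(',', 0, $rest) ] ];
    }
    return @result;
}

my $string = q/allow:test1,"@test, 2 " deny:test3,test4 password:"123 + 456doh:"/;
for my $arg (parse_args($string)) {
    print "Argument: $arg->[0]\n";
    print " Param: $_\n" foreach @{ $arg->[1] };
}
```

Because the name is taken only up to the first colon of each whitespace-separated token, this should also cope with the allow:"bad param:doh!" case.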

      JJ

Re: Re: Parsing issue
by hotshot (Prior) on Oct 08, 2002 at 14:06 UTC
    Thanks for your answer; it's good enough for me, since no spaces are allowed in argument names and no commas appear in quoted strings. But I have a small question, since I've never used regexps with lookahead assertions: what is the '$|' symbol in the regexp (just before the assertion)?

    Thanks again

    Hotshot
      The $ symbol matches at the end of the string (or just before a final newline), and the | symbol means 'or'.

      So this part of the regexp

      ($|(?=\w+:))
      translates to "match if either the end of the string has been reached, OR the text that follows (checked by the lookahead, without being consumed) is one or more word characters (letters, digits or underscores) followed by a colon".
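A small comparison (with a made-up sample string) shows why the lookahead being zero-width matters: the (?=\w+:) version leaves the next "name:" in the string for the next pass, whereas matching \w+: directly consumes it, so the next argument name is lost.

```perl
use strict;
use warnings;

my $s = 'allow:a,b deny:c';

# Lookahead: checks that "deny:" comes next but does not remove it.
(my $with_lookahead = $s) =~ s/(\w+):(.*?)(?=\w+:)//;

# Plain match: "deny:" is part of the match, so it is removed too.
(my $without_lookahead = $s) =~ s/(\w+):(.*?)\w+://;

print "with lookahead:    '$with_lookahead'\n";    # 'deny:c'
print "without lookahead: '$without_lookahead'\n"; # 'c'
```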

      Without the end-of-string check, the last argument would always be missed (the regexp would only match an argument that is followed by another one).
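To illustrate, this sketch runs the same loop with and without the $ alternative on a sample string; the helper name names_found is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the argument names found by a given pattern,
# removing each match from a private copy of the string.
sub names_found {
    my ($string, $re) = @_;
    my @names;
    push @names, $1 while $string =~ s/$re//;
    return @names;
}

my $s = 'allow:test1 deny:test3';

my @with_eol    = names_found($s, qr/(\w+):(.*?)($|(?=\w+:))/);
my @without_eol = names_found($s, qr/(\w+):(.*?)(?=\w+:)/);

print "with \$:    @with_eol\n";     # allow deny
print "without \$: @without_eol\n";  # allow
```

Without $, "deny:test3" is never matched because nothing of the form "word:" follows it.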

      JJ