JohnRuf has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse a very general list of parameters. The parameters may be real numbers, quoted text strings, or bracketed sets of such parameters, in any order. Parameters are separated by a comma and optional whitespace. Text::Balanced's extract_multiple looked like what I wanted, and it mostly works, but it fails when it reaches a real-number field or an empty field. It seems that the regexp match is not working as I expected and is causing extract_multiple to think it's done. Here's the code and results:
    print "ps before=$param_string\n";
    my @ef = (extract_multiple($param_string,
        [
            sub { extract_delimited($_[0]) },
            sub { extract_bracketed($_[0]) },
            sub { extract_variable($_[0]) },
            qr/([^,]+)(.*)/,
        ],
        undef, 1));
    print "extracted=@ef\n\n";

Results:

    ps before=0.0, 0.0, 0.0, @inches, 0.0, 0.0, [0.0,0.0,'PI$pxtr19i']
    extracted=0.0

    ps before=0.0,0.0,0.0,@inches,0.0,0.0,[0.0,0.0,'PI$pxtr19i']
    extracted=0.0

    ps before= "TERMINAL_DRILL_SIZE", "", , @scale , , [0.034, 0.0]
    extracted="TERMINAL_DRILL_SIZE" ""

If I remove the regexp match like this:

    my @ef = (extract_multiple($param_string,
        [
            sub { extract_delimited($_[0]) },
            sub { extract_bracketed($_[0]) },
            sub { extract_variable($_[0]) },
            #qr/([^,]+)(.*)/,
        ],
        undef, 1));

then the results are:

    ps before=0.0, 0.0, 0.0, @inches, 0.0, 0.0, [0.0,0.0,'PI$pxtr19i']
    extracted=@inches [0.0,0.0,'PI$pxtr19i']

    ps before=0.0,0.0,0.0,@inches,0.0,0.0,[0.0,0.0,'PI$pxtr19i']
    extracted=@inches [0.0,0.0,'PI$pxtr19i']

    ps before= "TERMINAL_DRILL_SIZE", "", , @scale , , [0.034, 0.0]
    extracted="TERMINAL_DRILL_SIZE" "" @scale [0.034, 0.0]

I've patterned this after the example in the docs of parsing CSV text. Any clues why the regexp portion does not seem to do what the docs say?
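
One way to probe the question, offered as a sketch rather than a confirmed diagnosis: the pattern can be run by hand against a sample string to see how much text a single match covers. The snippet assumes that extract_multiple applies a regex extractor anchored with \G under /gc, keeps $1 as the field, and resumes after the whole match; that is an assumption about the module's internals, not something shown in this thread. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample input in the same style as the parameter strings above.
my $text = "0.0, 0.0, [0.0,0.0]";

# Apply the extractor pattern once, anchored at the current position.
# ASSUMPTION: this mimics how a regex extractor is applied internally.
pos($text) = 0;
my ($field, $rest);
if ($text =~ /\G([^,]+)(.*)/gc) {
    ($field, $rest) = ($1, $2);
}
my $pos_after = pos($text);

print "field=[$field]\n";   # first capture: text up to the first comma
print "rest=[$rest]\n";     # second capture: everything after it
print "pos=$pos_after of ", length($text), "\n";
```

The trailing (.*) makes one match span the rest of the line, so pos() ends up at the end of the string. If the field is taken from $1 and scanning resumes at pos(), nothing is left to extract, which would be consistent with the single-field results above. Under the same assumption, dropping the (.*) group, for example qr/([^,]+)/, would leave the remainder for later iterations.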

Replies are listed 'Best First'.
Re: Text::Balanced extract_multiple does not return all values
by JohnRuf (Initiate) on Aug 29, 2003 at 22:23 UTC
    I'm still looking for wisdom on why the regexp match terminates extract_multiple. In the meantime, I have created a workaround for my initial goal of parsing a generalized list of parameters. I offer it here for your amusement:
        my @ef = (extract_multiple($param_string,
            [
                sub { extract_delimited($_[0]) },
                sub { extract_bracketed($_[0]) },
                sub { extract_variable($_[0]) },
            ],
            undef, 0));

        my ($extracted, @fields, $field, @subfields);
        foreach $extracted (@ef) {
            if ($extracted ne ",") {
                if ($extracted =~ /^["'`\[\(\$\@\%]/ ) {
                    push(@fields, $extracted);
                } else {
                    # If this is not the first field, remove any leading
                    # spaces and a comma. If it was the very first field,
                    # then the first comma is significant.
                    if (@fields) {
                        ($extracted) = ($extracted =~ /\s*,?(.*)/);
                    }
                    @subfields = split(/,/, $extracted);    # Split on comma
                    foreach $field (@subfields) {
                        ($field) = ($field =~ /([^\s]+)/);  # Keep non-space
                        push(@fields, $field);
                    }
                }
            }
        }

    Note the use of extract_multiple to first separate out all the delimited (quoted, etc), bracketed, and variable-like fields. Also, note that I set the last parameter to 0 so that it will return all the unmatched substrings. These substrings need to be further processed to split them up.

    At this point I loop through the extracted fields, pushing them directly onto the final @fields list if they are "balanced" text. If they are unmatched substrings, I split them on commas and ignore any whitespace.

    To eliminate an extra null param at the beginning of an unmatched substring, I first eat a leading comma (and optional space), but only if this is not the very first parameter in the whole list (you need that one in case there really was a leading null param).
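
The second pass described above can be sketched in isolation, without Text::Balanced. The helper name split_unmatched and its $is_first flag are hypothetical, introduced only for illustration. Returning an empty string for a whitespace-only piece is a choice made for this sketch; the loop above would store undef in that case.

```perl
use strict;
use warnings;

# Hypothetical helper: split one unmatched substring into parameters.
# $is_first is true when nothing has been collected yet, so that a
# leading comma still counts as marking an empty first parameter.
sub split_unmatched {
    my ($chunk, $is_first) = @_;

    # Eat optional whitespace and one leading comma, except at the start.
    $chunk =~ s/^\s*,// unless $is_first;

    my @out;
    for my $piece (split /,/, $chunk) {
        # Keep the first run of non-space characters, if there is one.
        my ($value) = $piece =~ /(\S+)/;
        # ASSUMPTION: whitespace-only pieces become '' instead of undef.
        push @out, defined $value ? $value : '';
    }
    return @out;
}

my @first = split_unmatched('0.0, 0.0', 1);
my @later = split_unmatched(', 0.5,1.5', 0);

print join('|', @first), "\n";
print join('|', @later), "\n";
```

Keeping the comma handling in one small routine makes the "first field is special" rule easy to exercise separately from the balanced-text extraction.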