bengo has asked for the wisdom of the Perl Monks concerning the following question:

% perl
use warnings;
use strict;
use Data::Dumper;

sub f {
    my @tmp = "list $_[0]\n" =~ /^list\s+(\w+)(?:,(\w+))*$/m;
    print(Dumper(\@tmp));
}
f("foo");
f("foo,bar");
# Ctrl-D
$VAR1 = [
          'foo',
          undef
        ];
$VAR1 = [
          'foo',
          'bar'
        ];

Dear Enlightened Monks,

I was somewhat surprised to find this "extra" undef returned here, when I expected/wanted only "foo".
Can you explain why this happens? (preferably without reference to use re 'debug' ;)
Is there any way to tweak my regex to avoid it?

I'm assuming that this is a feature, not a bug™, and there isn't an easy way to make Perl itself less surprising in the general case...?

Thanks!

Replies are listed 'Best First'.
Re: Surprising capture of undef with zero repetitions
by Eily (Monsignor) on Jun 22, 2016 at 17:27 UTC

    The feature is that perl returns one element for each capture group to preserve numbering (so $tmp[0] is the same as $1, $tmp[1] is $2 etc ...).
    For example, with my ($_a, $_b, $_c) = "ac" =~ /(a)?(b)?(c)?/;, you'll have $_a = 'a', $_c = 'c', and $_b undef.
    Removing the extra undef at the end of the list, or even in the middle for some special cases, would have been possible, but it is fairly easy to do yourself: @tmp = grep defined, /(a)?(b)?(c)?/;. So the decision was to return the result with all the available information (an undef at the end indicates that some groups did not match, which may be useful to know) and let the user decide what to do with it, rather than choose on the user's behalf what is useful.

    Edit: so in your case (without the implicit match on $_), the solution would be: my @tmp = grep defined, "list $_[0]\n" =~ /^list\s+(\w+)(?:,(\w+))*$/m;
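
The behaviour described above can be sketched as a short script. This is only an illustration: the helper name list_items is hypothetical and not part of the original post. It also shows one caveat that follows from how Perl handles a capture group inside a repeated group: only the last repetition is kept, so with three or more items the middle ones are lost.

```perl
use strict;
use warnings;

# One list element per capture group, whether or not the group took part
# in the match, so positions line up with $1, $2, $3.
my @all = "ac" =~ /(a)?(b)?(c)?/;       # ('a', undef, 'c')

# Dropping the undefs loses the positional information but gives a dense list.
my @dense = grep { defined } @all;      # ('a', 'c')

# Hypothetical helper wrapping the pattern from the question.
sub list_items {
    my ($in) = @_;
    return grep { defined } "list $in\n" =~ /^list\s+(\w+)(?:,(\w+))*$/m;
}

print join(",", list_items("foo")), "\n";          # foo
print join(",", list_items("foo,bar")), "\n";      # foo,bar
print join(",", list_items("foo,bar,baz")), "\n";  # foo,baz (group 2 keeps only its last repetition)
```

So grep defined removes the surprise for one or two items, but a pattern of this shape cannot return every item of a longer list.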

Re: Surprising capture of undef with zero repetitions
by Marshall (Canon) on Jun 22, 2016 at 17:36 UTC
    Update: I should have thought a bit more before pressing "submit". The "list" at the beginning of the line is part of the problem statement. I would think about removing that, then doing a global match on what is left, as shown below. /^list\s+(\w+)/g won't work: it only gets the foo, not the bar as well. I think there is some way to get a global match to back up and keep capturing (\w+)'s in a single regex, but at the moment I don't know how to do that. Sorry.

    Update 2: now another solution occurred to me, that appears to work:
    (updated code to add: f("foo,bar,bar2,bar3"); - no regex change, just another data example)

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    sub f {
        my $in = shift;
        my (@tmp) = "list $in" =~ /(?:^list\s+)?(\w+)/g;
        print(Dumper(\@tmp));
    }
    f("foo");
    f("foo,bar");
    f("foo,bar,bar2,bar3");
    __END__
    $VAR1 = [
              'foo'
            ];
    $VAR1 = [
              'foo',
              'bar'
            ];
    $VAR1 = [
              'foo',
              'bar',
              'bar2',
              'bar3'
            ];

    One way is to use match global:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    sub f {
        my $in = shift;
        my (@tmp) = $in =~ /(\w+)/g;
        print(Dumper(\@tmp));
    }
    f("foo");
    f("foo,bar");
    __END__
    $VAR1 = [
              'foo'
            ];
    $VAR1 = [
              'foo',
              'bar'
            ];

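
The two-step idea mentioned in the first update (remove the leading "list", then match globally on what is left) might look roughly like this. It is a sketch under assumptions: the function name items_after_list is hypothetical, and like the /g examples above it does not validate the separators between items.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the "list " prefix, then collect every word.
sub items_after_list {
    my ($line) = @_;

    # s/// returns false when nothing was substituted, i.e. the line
    # does not start with "list" followed by whitespace.
    (my $rest = $line) =~ s/^list\s+// or return;

    # In list context, /g returns every capture from every match.
    return $rest =~ /(\w+)/g;
}

print join(",", items_after_list("list foo")), "\n";
print join(",", items_after_list("list foo,bar,bar2,bar3")), "\n";
```

A line without the prefix yields an empty list, which distinguishes "not a list line" from a list line with items.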
Re: Surprising capture of undef with zero repetitions
by Marshall (Canon) on Jun 22, 2016 at 20:49 UTC
    As another comment re: your comment "(preferably without reference to use re 'debug' ;)".

    Here is another way to figure out what a regex means. Yes, I understand that in the context of your question, this wouldn't have been helpful. Eily nailed it. But this is often better than "use re 'debug'". This tip may be useful later.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use YAPE::Regex::Explain;

    my $REx = '/^list\s+(\w+)(?:,(\w+))*$/m';
    my $exp = YAPE::Regex::Explain->new($REx)->explain;
    print $exp;
    __END__
    The regular expression:

    (?-imsx:/^list\s+(\w+)(?:,(\w+))*$/m)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      /                        '/'
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      list                     'list'
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                                 more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        ,                        ','
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          \w+                      word characters (a-z, A-Z, 0-9, _) (1
                                   or more times (matching the most
                                   amount possible))
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
      )*                       end of grouping
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
      /m                       '/m'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    Process completed successfully

      It's important to note that YAPE::Regex::Explain does not support regex constructs introduced after Perl version 5.6, and in particular the extensions of version 5.10.


      Give a man a fish:  <%-{-{-{-<

Re: Surprising capture of undef with zero repetitions
by Anonymous Monk on Jun 22, 2016 at 19:39 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1166285
    use strict;
    use warnings;
    use Data::Dumper;

    sub f {
        my @tmp = "list $_[0]\n" =~ /\G(?:^list\s+|,)(\w+)(?=(?:,\w+)*$)/mg;
        print(Dumper(\@tmp));
    }
    f("foo");
    f("foo,bar");
    f("foo,bar ");    # should fail on this
    f("foo, bar");    # should fail on this

    produces

    $VAR1 = [
              'foo'
            ];
    $VAR1 = [
              'foo',
              'bar'
            ];
    $VAR1 = [];
    $VAR1 = [];
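
To make the pattern above easier to read, here is the same regex spread out with /x and commented piece by piece. The wrapper name list_items_strict is hypothetical, and the comments are one reading of the pattern, not the poster's own explanation.

```perl
use strict;
use warnings;

# Hypothetical wrapper; the pattern is the one from the post above.
sub list_items_strict {
    my ($in) = @_;
    return "list $in\n" =~ /
        \G                  # continue exactly where the previous match ended
        (?: ^list\s+ | , )  # the first item follows "list ", later ones a comma
        (\w+)               # the item itself
        (?= (?:,\w+)* $ )   # the rest of the line may only be more ",item" groups
    /mgx;
}

print join(",", list_items_strict("foo,bar,bar2")), "\n";
print scalar(() = list_items_strict("foo, bar")), "\n";   # number of items returned
```

Because \G pins every attempt to the end of the previous match, the regex cannot skip over malformed input, and the lookahead rejects the whole line as soon as anything other than ",item" groups follows. That is why the two malformed inputs produce empty lists instead of partial results.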