USP45 has asked for the wisdom of the Perl Monks concerning the following question:

This seems so simple but I must be missing something:
my $pet_list = 'dog:boston.terrier;cat:orange.tabby';
To extract boston.terrier out why wouldn't this work?
@{$pets} = $1 if $pet_list =~ /$some_variable\:(.+);??/;
Doesn't the ;?? mean match semi-colon 0 or 1 times and be non-greedy?

Replies are listed 'Best First'.
Re: Pattern Match n00b
by pc88mxer (Vicar) on Mar 14, 2008 at 20:43 UTC
    Doesn't the ;?? mean match semi-colon 0 or 1 times and be non-greedy?

    Yes, it does, but that's not what you want. You need the .+ to be non-greedy or else disallow semi-colons in your values:

    @{$pets} = $1 if $pet_list =~ /$some_variable\:(.*?)(;|\z)/;
    # or not allow ';' in values:
    @{$pets} = $1 if $pet_list =~ /$some_variable\:([^;]*)(;|\z)/;

    To make the matching more robust, you'll want to do something like this:
    @{$pets} = $1 if $pet_list =~ /(?:\A|;)\Q$some_variable\E\:([^;]*)(;|\z)/;
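
A minimal sketch of what the extra anchoring and \Q...\E buy you. The data string and the lookup_loose/lookup_strict helpers are hypothetical, made up for illustration, and the terminator group is written as non-capturing here only to keep the capture list short:

```perl
use strict;
use warnings;

# Hypothetical data: one key that merely ends in "dog", and one key
# that contains a regex metacharacter ('.').
my $pet_list = 'hotdog:mustard;dog:boston.terrier;axb:plain;a.b:dotted';

# Unanchored and unquoted: the key may match inside a longer key,
# and '.' in the key is treated as "any character".
sub lookup_loose {
    my ($key) = @_;
    my ($value) = $pet_list =~ /$key\:([^;]*)(?:;|\z)/;
    return $value;
}

# Anchored at the start of a record and quoted with \Q...\E, so the
# key has to match a whole record key literally.
sub lookup_strict {
    my ($key) = @_;
    my ($value) = $pet_list =~ /(?:\A|;)\Q$key\E\:([^;]*)(?:;|\z)/;
    return $value;
}

print lookup_loose('dog'),  "\n";   # mustard (matched inside "hotdog")
print lookup_strict('dog'), "\n";   # boston.terrier
print lookup_loose('a.b'),  "\n";   # plain (the '.' matched 'x')
print lookup_strict('a.b'), "\n";   # dotted
```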
Re: Pattern Match n00b
by BrowserUk (Patriarch) on Mar 14, 2008 at 21:02 UTC
    Doesn't the ;?? mean match semi-colon 0 or 1 times and be non-greedy?

    It does, but it only makes that part of the match non-greedy. The (.+) remains greedy.

    But if you make that part non-greedy as well, you are asking it to capture as little as possible after the ':', optionally followed by a ';'. That means it will capture just a single character.
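
The three behaviours can be compared side by side in a short sketch. The value 'dog' for $some_variable is an assumption, since the question does not show it:

```perl
use strict;
use warnings;

my $pet_list      = 'dog:boston.terrier;cat:orange.tabby';
my $some_variable = 'dog';   # assumed value; not shown in the question

# Original pattern: (.+) is greedy and ;?? is happy to match nothing,
# so the capture runs to the end of the string.
my ($greedy) = $pet_list =~ /$some_variable\:(.+);??/;

# Non-greedy capture followed by an optional ';': one character is
# enough to satisfy the pattern, so that is all it takes.
my ($lazy) = $pet_list =~ /$some_variable\:(.+?);?/;

# Non-greedy capture with a required terminator (';' or end of string).
my ($fixed) = $pet_list =~ /$some_variable\:(.+?)(?:;|$)/;

print "greedy: $greedy\n";   # boston.terrier;cat:orange.tabby
print "lazy:   $lazy\n";     # b
print "fixed:  $fixed\n";    # boston.terrier
```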

    A couple of ways to approach the problem:

    1. Make capture non-greedy and have a definite terminating condition:
      /$some_variable\:(.+?)(?:;|$)/

      Where (?:;|$) requires a semicolon or the end-of-string (preferring the former) to terminate the capture.

    2. Or, as you know the bit you want to capture cannot contain a ';', reduce the scope of the capture using [^;] in place of '.' and omit the terminator.
      /$some_variable\:([^;]+)/

    Either works:

    $pet_list = 'dog:boston.terrier;cat:orange.tabby';;

    $pets=[]; $pet_list =~ /$_\:(.+?)(?:;|$)/ and push @{$pets},$1 for qw[cat dog]; print for @{$pets};;
    orange.tabby
    boston.terrier

    $pets=[]; $pet_list =~ /$_\:([^;]+)/ and push @{$pets},$1 for qw[cat dog]; print for @{$pets};;
    orange.tabby
    boston.terrier

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Pattern Match n00b
by skirnir (Monk) on Mar 14, 2008 at 20:59 UTC
    No, to make the .+ part non-greedy you'd have to place the '?' directly after the "+":

    @{$pets} = $1 if $pet_list =~ /$some_variable\:(.+?);?/;

    But as the ';' is optional, the non-greedy match would stop after the first non-newline character. In your example you would get "b" in $1. It's best not to use "." unless you really mean "any non-newline character". In your case I think [^;] would be the right thing to use, assuming I understood properly what you want to do.
Re: Pattern Match n00b
by jeroenes (Priest) on Mar 14, 2008 at 21:27 UTC
    You could go splitting instead. Nested splits on ; and : could work. Or you could do it all in one statement with SuperSplit:

    $array=supersplit(':',';',$mystring);

    provided that the (semi)colons are true delimiters.
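
For comparison, the nested-split idea can be sketched with core Perl only (this does not use SuperSplit). The hash name %breed_for is made up for illustration, and the sketch assumes every record contains exactly one key and one value separated by ':':

```perl
use strict;
use warnings;

my $pet_list = 'dog:boston.terrier;cat:orange.tabby';

# The outer split on ';' yields "key:value" records; the inner split
# on ':' (limit 2) separates each key from its value. The flattened
# key/value pairs then populate the hash.
my %breed_for = map { split /:/, $_, 2 } split /;/, $pet_list;

print "$breed_for{dog}\n";   # boston.terrier
print "$breed_for{cat}\n";   # orange.tabby
```

A record without a ':' would yield a single element and misalign the pairs, so this only holds if the delimiters are reliable.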

    Cheers,
    Jeroen

    Couldn't resist the temptation after all those years ;-)