bgu has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to extract "1", "2", "17" and "3":

$_ = "1.2.17.3 Glückwunschschreiben 12 und";
if ( /(?:(\d+)\.)+(\d+)/i ) {
    print "$1 $2 $3 $4\n";
}

but it only returns the last two, "17" and "3"....

Re: What's wrong with this regex
by moritz (Cardinal) on Mar 01, 2012 at 15:22 UTC

    Since you have written only two pairs of (capturing) parentheses, you only get two match variables. A quantified capture stores only the string of the last match.
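    To make the point about quantified captures concrete, here is a minimal sketch. The helper name last_captures is made up for illustration; the pattern is the one from the question. Group 1 is overwritten on each repetition of the outer group, so only its final value survives.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the numbered
# captures that the pattern from the question leaves behind.
sub last_captures {
    my ($text) = @_;
    # Group 1 sits inside the quantified (?:...)+ group, so each
    # repetition overwrites it; only the last repetition is kept.
    return $text =~ /(?:(\d+)\.)+(\d+)/ ? ($1, $2) : ();
}

my @got = last_captures("1.2.17.3 foo");
print "@got\n";    # prints "17 3"
```

    The values "1" and "2" were matched along the way, but no capture variable holds them once the match completes.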

    So either make the outer parentheses capture, and use split to extract the numbers from $1, or write out four pairs of capturing parentheses.

    Note that this is more convenient in Perl 6: a quantified capture records a list of all matches.

      Thanks for the quick answer guys.

      Unfortunately, four groups are not going to help: it should match anything from 1.2 to 1.2.3.4.5.6 etc., or anything that follows this structure.

      Looking for a substring matching [\d+\.]+ is also not an option, as there may be other matches in the string, such as a simple number.

        Sorry, I don't understand your answer. What exactly is wrong with the approach that uses split?

        $_ = '1.2.3.4.5.6 foo bar';
        # $1 holds the leading "1.2.3.4.5." and $2 the final number
        if (/((?:\d+\.)+)(\d+)/) {
            my @matches = (split(/\./, $1), $2);
        }

        I'm thinking you don't need a beefy regex if that's all you want to do. How about something like this instead:
        use Data::Dumper;

        my $string  = '12.34.56.78 asdfasdfasdfasdfasdf';
        my $numbers = shift @{[split ' ', $string]};
        my @numbers = split '\.', $numbers;
        print Dumper(@numbers) if $numbers =~ /^[\d.]+$/;

        EDIT: Whoops, sorry moritz - I confess to only skimming your answer and didn't notice that you had already suggested using split. Apologies!
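        Putting the split idea together with the requirement that a plain number such as "12" must not match, one possible sketch follows. The helper name dotted_numbers is hypothetical, and it assumes only the first dotted run in the string is wanted. Requiring at least one ".digits" part is what keeps a lone number from matching.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the numbers of the
# first dotted run such as "1.2" or "1.2.3.4.5.6", or an empty list.
sub dotted_numbers {
    my ($text) = @_;
    # \d+ followed by one or more ".digits" parts, so a plain number
    # like "12" cannot match on its own.
    return unless $text =~ /(\d+(?:\.\d+)+)/;
    return split /\./, $1;
}

my @found = dotted_numbers("12 items, section 1.2.17.3 und");
print "@found\n";    # prints "1 2 17 3"
```

        If every dotted run in a string is needed rather than just the first, the same pattern could be applied with the /g modifier in a loop; that variant is not shown here.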

Re: What's wrong with this regex
by toolic (Bishop) on Mar 01, 2012 at 15:21 UTC
    Tip #9 from the Basic debugging checklist: Demystify regular expressions by installing and using the CPAN module YAPE::Regex::Explain
    (?i-msx:(?:(\d+)\.)+(\d+))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?i-msx:                 group, but do not capture (case-insensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (1 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (                        group and capture to \1:
    ----------------------------------------------------------------------
          \d+                      digits (0-9) (1 or more times
                                   (matching the most amount possible))
    ----------------------------------------------------------------------
        )                        end of \1
    ----------------------------------------------------------------------
        \.                       '.'
    ----------------------------------------------------------------------
      )+                       end of grouping
    ----------------------------------------------------------------------
      (                        group and capture to \2:
    ----------------------------------------------------------------------
        \d+                      digits (0-9) (1 or more times (matching
                                 the most amount possible))
    ----------------------------------------------------------------------
      )                        end of \2
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    Here is one way to get what you want:

    use warnings;
    use strict;

    $_ = "1.2.17.3 Glückwunschschreiben 12 und";
    if ( /(\d+)\.(\d+)\.(\d+)\.(\d+)/ ) {
        print "$1 $2 $3 $4\n";
    }

    __END__
    1 2 17 3

Re: What's wrong with this regex
by JavaFan (Canon) on Mar 01, 2012 at 16:34 UTC
    use feature 'say';

    $_ = "1.2.17.3 Glückwunschschreiben 12 und";
    say for /\G([0-9]+)[. ]/g;

    __END__
    1
    2
    17
    3

      good stuff, it works!