FFSparky has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to pull a group of 6 to 8 digits out of a string. Yet when I run my sample code below I only get the last 6 digits, "345678", where I would expect to get all 8, "12345678".

$String = "sdalfkjsdlfjasl;kjsdflkj12345678dskajhjkhddsfkjhh";
print "\n\n\$String = ".$String."\n\n";
($Regex1) = ($String =~ /.*(\d{6,8}).*/);
print "\$Regex1 = ".$Regex1."\n";<br>
print "\nJust to check:\n\$1 = ".$1."\n";
print "\$2 = ".$2."\n";

Replies are listed 'Best First'.
Re: Regex Quantifiers
by kennethk (Abbot) on Jul 15, 2009 at 19:39 UTC
    Your problem is your initial greedy .* - it grabs as much as it can, leaving the minimum match for the next term. I don't follow why you would want .* (particularly since you don't use either a ^ or $ anchor). You will get your expected results with either

    ($Regex1) = ($String =~ /.*?(\d{6,8}).*/);

    or

    ($Regex1) = ($String =~ /(\d{6,8})/);

    See perlre or perlretut for details, in particular Matching repetitions.
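
A minimal sketch comparing the original pattern with the two suggested ones, run against the string from the question. The variable names ($greedy, $lazy, $bare) are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

my $String = "sdalfkjsdlfjasl;kjsdflkj12345678dskajhjkhddsfkjhh";

# Greedy leading .* backtracks only until 6 digits fit after it.
my ($greedy) = $String =~ /.*(\d{6,8}).*/;

# Non-greedy leading .*? stops at the first place the digits can match.
my ($lazy) = $String =~ /.*?(\d{6,8}).*/;

# No leading .* at all: the engine finds the first match by itself.
my ($bare) = $String =~ /(\d{6,8})/;

print "greedy = $greedy\n";   # 345678
print "lazy   = $lazy\n";     # 12345678
print "bare   = $bare\n";     # 12345678
```

The third form is usually the simplest choice here, since an unanchored match already scans the string from left to right.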

    As a side note, please wrap code in <code> tags - see Markup in the Monastery for details.

Re: Regex Quantifiers
by toolic (Bishop) on Jul 15, 2009 at 19:40 UTC
    ($Regex1) = ($String =~ /.*(\d{6,8}).*/);
    I suspect your first .* is being too greedy and eating up the first 2 digits from your \d{6,8} range. This gives you all 8 digits:
    ($Regex1) = ($String =~ /(\d{6,8})/);
Re: Regex Quantifiers
by NetWallah (Canon) on Jul 15, 2009 at 19:44 UTC
    The initial ".*" in your regex is greedy, grabbing as much as it can.

    You can make it non-greedy by adding "?" - this works:

    my ($Regex1) = ($String =~ /.*?(\d{6,8}).*/);
    Update: Multiple collisions on this reply!

         Potentia vobiscum ! (Si hoc legere scis nimium eruditionis habes)

Re: Regex Quantifiers
by AnomalousMonk (Archbishop) on Jul 15, 2009 at 19:45 UTC
    The first .* in the regex /.*(\d{6,8}).*/ immediately consumes all characters in the string. The regex then starts backtracking, giving back one character at a time from the end, to try to find a match for 6 to 8 digits. The first position at which \d{6,8} can succeed leaves only 6 digits to its right, so it matches the 6 digits of '345678'. The second .* in the regex then tries to match zero or more of anything and, as one would expect, finds a match. The overall successful match then terminates.
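
One way to see the backtracking result directly is to capture what each .* ended up holding. This is only an illustrative sketch: the two extra capture groups and the names $before, $digits and $after are additions, not part of the original regex.

```perl
use strict;
use warnings;

my $String = "sdalfkjsdlfjasl;kjsdflkj12345678dskajhjkhddsfkjhh";

# Extra capture groups around both .* terms expose what each one consumed.
my ($before, $digits, $after) = $String =~ /(.*)(\d{6,8})(.*)/;

print "before = '$before'\n";  # ends in "12": the greedy .* kept two digits
print "digits = '$digits'\n";  # 345678
print "after  = '$after'\n";   # dskajhjkhddsfkjhh
```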
Re: Regex Quantifiers
by Anonymous Monk on Jul 15, 2009 at 19:52 UTC
    If you're wondering what a regex means, try YAPE::Regex::Explain
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( qr/.*(\d{6,8}).*/ )->explain,"\n";
    __END__
    The regular expression:

    (?-imsx:.*(\d{6,8}).*)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      .*                       any character except \n (0 or more times
                               (matching the most amount possible))
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        \d{6,8}                  digits (0-9) (between 6 and 8 times
                                 (matching the most amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      .*                       any character except \n (0 or more times
                               (matching the most amount possible))
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    To see what is happening, try

    use re 'debug';
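
A small sketch of how the pragma might be applied to the regex from the question. It assumes a reasonably recent perl, where use re 'debug' is lexically scoped, so the bare block limits the trace to that one match. The name $match is made up for illustration.

```perl
use strict;
use warnings;

my $String = "sdalfkjsdlfjasl;kjsdflkj12345678dskajhjkhddsfkjhh";
my $match;

{
    # Only the regex inside this block is traced; the compile and match
    # trace is diagnostic output, separate from the print below.
    use re 'debug';
    ($match) = $String =~ /.*(\d{6,8}).*/;
}

print "match = $match\n";
```

The trace can be long even for a short pattern, so it helps to keep the traced block as small as possible.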

Re: Regex Quantifiers
by FFSparky (Acolyte) on Jul 15, 2009 at 20:23 UTC
    Everyone, Thank You for your Wisdom!

    I was being greedy trying to pattern the entire string as opposed to just what I wanted.

    And also for the additional reference links and debugging methods!

    Very much appreciated!
Re: Regex Quantifiers
by biohisham (Priest) on Jul 15, 2009 at 20:36 UTC
    Did you know that you don't need the concatenation operator here? Perl interpolates scalar variables inside double-quoted strings, so you could well write this
    print "\n\n\$String = ".$String."\n\n";
    as this:
    print "\n\n\$String = $String\n\n";
    and the same thing goes for
    print "\$Regex1 = ".$Regex1."\n";
    and
    print "\nJust to check:\n\$1 = ".$1."\n"
    So try to be more organized in writing your code. I also see an HTML line break (<br>) in the body of your code!
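
A minimal sketch showing that the two styles produce the same string, using the string from the question. The names $concatenated and $interpolated are made up for illustration.

```perl
use strict;
use warnings;

my $String = "sdalfkjsdlfjasl;kjsdflkj12345678dskajhjkhddsfkjhh";

# Concatenation and interpolation build the same string.
my $concatenated = "\$String = " . $String . "\n";
my $interpolated = "\$String = $String\n";
print "identical\n" if $concatenated eq $interpolated;

# Capture variables such as $1 interpolate too, but check that the match
# succeeded first, or $1 may hold a value left over from an earlier match.
if ($String =~ /(\d{6,8})/) {
    print "\$1 = $1\n";
}
```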
    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind