in reply to Matching numbers by regex.

* is greedy: it matches as many characters as it can, but it can also match none at all. In (\d+).*(\d*\d+) the \d* is redundant (the following \d+ already matches at least one digit and as many more as it can), and the .* before it matches as many characters as it can, which includes every digit of the second number except the last (the \d+ only needs one digit, so one digit is all that .* gives back). One way to fix the problem is:

use strict;
use warnings;

my $data = "Exlief 4 page : 1 /10";
my $match = qr/pag\w+\s*:\s*(\d+)[^\d]*(\d+)/;

print "Pages : $1 / $2\n" if $data =~ $match;

$data = "Exlief 4 page : 1 / 5";
print "Pages : $1 / $2\n" if $data =~ $match;

Prints:

Pages : 1 / 10
Pages : 1 / 5

Note that a precompiled regex (qr//) is used so the pattern does not have to be retyped (perhaps differently) for each match, that 'match any character' has been replaced by 'match any character except a digit', and that the redundant digit match has been removed.
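
To see the greedy behaviour described above, here is a minimal sketch that runs the original (\d+).*(\d*\d+) fragment against the same sample string. The pag\w+\s*:\s* prefix is an assumption borrowed from the fixed regex so that the first capture lands on the page number; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample input taken from the example in this reply.
my $data = "Exlief 4 page : 1 /10";

# Original capture fragment, with the greedy .* between the two numbers.
# The "pag\w+\s*:\s*" prefix is assumed here, copied from the fixed regex.
my ($first, $second) = $data =~ /pag\w+\s*:\s*(\d+).*(\d*\d+)/;

# .* swallows " /1" and gives back only the final "0" to \d+.
print "first = $first, second = $second\n";    # first = 1, second = 0
```

The second capture is 0 rather than 10 because .* backtracks only as far as it must for the rest of the pattern to match.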


DWIM is Perl's answer to Gödel

Re^2: Matching numbers by regex. (remember \D)
by grinder (Bishop) on Apr 19, 2006 at 10:26 UTC

    In the above code (and in the other replies in the thread),

    [^\d]*

    may be represented with

    \D*

    and will be more efficient as well, since it avoids calls to utf8::IsDigit internally.
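
As a quick check of the substitution suggested above, this sketch compares the captures from the [^\d]* and \D* versions of the pattern on the two sample strings from the parent node. It only checks that the results agree; it does not measure the efficiency difference, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Sample strings from the parent node.
my @samples = ("Exlief 4 page : 1 /10", "Exlief 4 page : 1 / 5");

my $with_class = qr/pag\w+\s*:\s*(\d+)[^\d]*(\d+)/;    # negated character class
my $with_D     = qr/pag\w+\s*:\s*(\d+)\D*(\d+)/;       # \D shorthand

my $same = 1;
for my $data (@samples) {
    my @a = $data =~ $with_class;
    my @b = $data =~ $with_D;
    $same = 0 unless "@a" eq "@b";    # compare the captured values
    print "Pages : $b[0] / $b[1]\n";
}
print $same ? "Both patterns capture the same values\n" : "Patterns differ\n";
```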

    • another intruder with the mooring in the heart of the Perl

      Heh, good point! I do tend to forget the uppercase versions of the character set match flags such as \D \W \S. Thanks for the reminder.

