Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to play with the sample code for lookahead assertions in Mastering Regular Expressions. On pg. 230 neither of the regexes given for what looks to be matching integers works in all cases.
my $numbers="1 1.0 3.547 92.34 343.2234";
while($numbers=~/(\d+)(?![\d.])/g){
    print "$1 is an integer\n";
}
print "-"x20,"\n";
while($numbers=~/(\d+)(?=[^.\d])/g){
    print "$1 is an integer\n";
}
Results:
1 is an integer
0 is an integer
547 is an integer
34 is an integer
2234 is an integer
--------------------
1 is an integer
0 is an integer
547 is an integer
34 is an integer

Clearly I don't want to capture things that follow a decimal point. I tried prepending (^|[^.\d]) to the front of each of these, but it didn't help; it only made the results more bizarre. Any suggestions?

Replies are listed 'Best First'.
Re: How can I match all the integers in a string?
by danger (Priest) on Mar 04, 2001 at 12:50 UTC

    Well, MRE doesn't say that those regexen are to be used to find integers -- in fact, it states: 'We don't know what these might be used for ...'.

    However, if you do want to find (and extract) just integers, you could employ both negative look-ahead and negative look-behind (look-behind wasn't available when MRE was written):

    my $numbers="1 1.0 3.547 123 92.34 343.2234";
    while($numbers=~/(?<![\d.])(\d+)(?![\d.])/g){
        print "$1 is an integer\n";
    }

    That says: match a run of digits that is not preceded by a digit or a dot and is not followed by a digit or a dot. (There are other exceptions that I'm not considering at the moment, such as grabbing any leading + or - sign if present, but this gives you something to play with.)
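The leading-sign exception mentioned above could be handled roughly as follows. This is only a sketch: the `[+-]?` inside the capture, the sample string and the `@integers` array are illustrative assumptions, not part of the original reply.

```perl
use strict;
use warnings;

# Sample data (an assumption for illustration): signed and unsigned values.
my $numbers = "1 -2 +3 1.0 -4.5 123";

my @integers;
# Same look-behind and look-ahead as above, with an optional sign
# captured together with the digits.
while ($numbers =~ /(?<![\d.])([+-]?\d+)(?![\d.])/g) {
    push @integers, $1;
}

print "$_ is an integer\n" for @integers;
```

With this sample string the loop should report 1, -2, +3 and 123, and skip 1.0 and -4.5.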

    Update: Ahh, and an important exception I neglected is caught by mirod's version: namely, an integer could occur at the end of a sentence and be legitimately followed by a dot, thus we may want to allow for a trailing dot as long as it isn't followed by digits by changing the negative look-ahead above to the one mirod uses: (?!\d*\.\d).
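To see the difference the trailing-dot exception makes, the two look-aheads can be compared side by side. The sentence and the variable names below are made up for illustration; only the two patterns come from the thread.

```perl
use strict;
use warnings;

# Illustrative sentence: "7" ends the sentence and is followed by a dot.
my $text = "I bought 12 apples for 3.50 and ate 7.";

# Strict look-ahead: any following digit or dot disqualifies the match.
my @strict  = $text =~ /(?<![\d.])(\d+)(?![\d.])/g;

# Relaxed look-ahead: only a dot that is followed by a digit disqualifies it.
my @relaxed = $text =~ /(?<![\d.])(\d+)(?!\d*\.\d)/g;

print "strict:  @strict\n";    # strict:  12
print "relaxed: @relaxed\n";   # relaxed: 12 7
```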

Re: How can I match all the integers in a string?
by mirod (Canon) on Mar 04, 2001 at 13:06 UTC

    Here is my take (my takes actually):

    #!/bin/perl -w
    use strict;

    my $numbers="1 1.0 3.547 92.34 12 .25 23.00 23.01 343.2234";

    # I guess that's what you were looking for
    while($numbers=~/(?<![\.\d])   # first no digits or .
                     (\d+)         # digits
                     (?!\d*\.\d)   # then no digits, . and digits
                    /gx)
      { print "$1 is an integer\n"; }

    print '-' x 20, "\n";

    # the same thing slightly simpler
    while($numbers=~/(?:\A|[^\d\.]) # catches the character before a number or the start of the string
                     (\d+)          # digits
                     (?!\d*\.\d)    # then no digits, . and digits
                    /gx)
      { print "$1 is an integer\n"; }

    print '-' x 20, "\n";

    # much simpler I think
    while($numbers=~/([\d\.]+)/g)               # catch all numbers
      { if( $1=~ /^(\d+)$/)                     # keep only integer ones
          { print "$1 is an integer\n"; }       # this $1 from the if regexp
      }

    print '-' x 20, "\n";

    # now maybe 1.0 is considered an integer
    while($numbers=~/(\d+(?:\.(\d+))?)/g)       # get all numbers
      { my $nb= $1;
        print "$nb is an integer\n"
          if( !$2 || ($2=~ /^0+$/) );           # keep those with no decimal part
                                                # or with a decimal part made only of 0's
      }

    gives:

    1 is an integer
    12 is an integer
    --------------------
    1 is an integer
    12 is an integer
    --------------------
    1 is an integer
    12 is an integer
    --------------------
    1 is an integer
    1.0 is an integer
    12 is an integer
    25 is an integer
    23.00 is an integer

Re: How can I match all the integers in a string?
by dvergin (Monsignor) on Mar 04, 2001 at 12:23 UTC
    You were on the right track. Try this:
    my $numbers="1 1.0 3.547 92.34 343.2234";
    while($numbers =~ /(^|[^.\d])(\d+)/g) {
        print "$2 is an integer\n";
    }
    Which prints out:
    1 is an integer
    1 is an integer
    3 is an integer
    92 is an integer
    343 is an integer
    For the data you give,    /(^| )(\d+)/g    also works fine.
    Sorry... no look-ahead.  %-(
Re: How can I match all the integers in a string?
by japhy (Canon) on Mar 04, 2001 at 23:06 UTC
    I'd use the cut operator here. The cut operator stops backtracking from occurring:
    $_ = "1 1.0 3.547 92.34 343.2234";
    while (/(?<![\d.])((?>\d+))(?!\.\d)/g) {
        print "$1\n";
    }
    The OGRE would say (I'm paraphrasing): "make sure we're not preceded by a digit or a period, then, saving to $1, match (without backtracking) one or more digits, and then make sure we can't match a period and a digit." If we removed the cut operator, that \d+ could backtrack to say that in "123.456", the "12" part is found as an integer, which would be a false positive.
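The false positive described above can be reproduced with a small comparison. The test string and the array names are assumptions chosen for illustration; the two patterns differ only in the `(?>...)` around `\d+`.

```perl
use strict;
use warnings;

my $text = "123.456 789";

# Without the cut, \d+ may give back the "3" so that the look-ahead
# (?!\.\d) is tested in front of "3" instead of in front of ".4".
my @no_cut = $text =~ /(?<![\d.])(\d+)(?!\.\d)/g;

# With (?>...), \d+ keeps every digit it matched, so "123" is rejected whole.
my @cut    = $text =~ /(?<![\d.])((?>\d+))(?!\.\d)/g;

print "without cut: @no_cut\n";   # without cut: 12 789
print "with cut:    @cut\n";      # with cut:    789
```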

    If you want, like mirod did, to match 1.0 as an integer, you would change this to:

    /(?<![\d.])((?>\d+(?:\.0+)?))(?!\.\d)/g
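One edge case may be worth checking with this variant: once the cut has locked in "1.0" from a value such as "1.05", the next character is "5" rather than a dot, so `(?!\.\d)` is satisfied. The sketch below compares the pattern as posted with a hypothetical tweak, `(?!\.?\d)`, that also refuses a directly following digit. The tweak, the test string and the array names are assumptions, not something proposed in the thread.

```perl
use strict;
use warnings;

my $text = "1.0 1.05 23.00 7";

# Pattern as posted: "1.0" is also reported out of "1.05".
my @posted  = $text =~ /(?<![\d.])((?>\d+(?:\.0+)?))(?!\.\d)/g;

# Hypothetical tweak: the look-ahead rejects a following digit
# as well as a following dot-plus-digit.
my @tweaked = $text =~ /(?<![\d.])((?>\d+(?:\.0+)?))(?!\.?\d)/g;

print "as posted: @posted\n";    # as posted: 1.0 1.0 23.00 7
print "tweaked:   @tweaked\n";   # tweaked:   1.0 23.00 7
```

The tweaked look-ahead still allows a sentence-ending dot, since a dot with no digit after it does not satisfy `\.?\d`.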


    japhy -- Perl and Regex Hacker