in reply to combined into a single regex

Not entirely sure I understand your question, but assuming you are trying to eliminate the char_test stuff, try this...

my @data = (-123.004,-.008,0,-0,.0987,1.0,12345,'d','test');
foreach my $value (@data) {
    if ( $value =~ /^-?(\d?|\d+)\.?(\d?|\d+)$/ ) {
        print "true\n";
    }
    else {
        print "false\n";
    }
}

Re^2: combined into a single regex
by Skeeve (Parson) on Dec 27, 2005 at 06:46 UTC

    I think this would be equivalent:
    /^-?(\d*)\.?(\d*)$/
    OTOH: I wouldn't consider "-." a legal number.

    So maybe this is better?
    /^-?((\d+\.\d*)|(\.?\d+))$/
    This wouldn't capture like yours does, but the captured parts aren't used. So you might want to consider using (?:...)...
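To see how the two patterns in this reply differ, here is a minimal sketch that runs both against a few strings. The sample strings and the sub names loose_match and strict_match are made up for illustration; the strict pattern is the one above with its groups made non-capturing via (?:...).

```perl
use strict;
use warnings;

# Loose pattern from above: every part is optional, so "", "-" and "-." match too.
sub loose_match  { return $_[0] =~ /^-?(\d*)\.?(\d*)$/ ? 1 : 0 }

# Stricter pattern with non-capturing groups: at least one digit is required.
sub strict_match { return $_[0] =~ /^-?(?:\d+\.\d*|\.?\d+)$/ ? 1 : 0 }

# Hypothetical sample strings; quoted so Perl does not renumber them first.
for my $value ('-123.004', '-.008', '0', '1.', '12345', '-.', '', 'test') {
    printf "%-10s loose=%d strict=%d\n",
        "'$value'", loose_match($value), strict_match($value);
}
```

The strings are quoted on purpose: an unquoted literal such as -.008 is turned into a number first and then stringified, so the regex would not see the text as it was typed.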


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
Re^2: combined into a single regex
by doctor_moron (Scribe) on Dec 27, 2005 at 09:37 UTC

    I made a little change to your regexp and tried to analyze it:

    my @data = (-123.004,-.008,0,-0,.0987,1.1,12345,'d','test');
    foreach my $value (@data) {
        if ( $value =~ /(^-?)(\d?|\d+)(\.?)(\d?|\d+)$/ ) {
            print "$1 - $2 - $3 - $4 true\n";
        }
        else {
            print "false\n";
        }
    }

    For this analysis I ignored the processing of $1, the first group. I think I understand that part, so we go straight to the second group.

    For $value = -123.004 (ignoring the leading "-"):
    1. Move on to the second group and pick the first alternative, (\d?).
    2. "123" doesn't match (\d?), since (\d?) means "match a digit 1 or 0 times".
    3. Backtrack 1 character.
    4. Pick the second alternative in the second group, (\d+), which means "match a digit 1 or more times". It matches, so we get "123" for $2.
    5. Move on to the third group. (\.?) matches "." 1 or 0 times, so we get "." for $3.
    6. Move on to the fourth group. The first alternative doesn't match, so backtrack 1 character, try the second alternative, and match "004". So we get "004" for $4.

    For $value = 12345:
    1. Move on to the second group and pick the first alternative, (\d?).
    2. "12345" doesn't match (\d?).
    3. Backtrack 1 character.
    4. Pick the second alternative in the second group, (\d+), which means "match a digit 1 or more times", and we get "1" for $2. I am not sure about this; I keep thinking that we should get "12345" for $2. (Is it something to do with the third group, (\.?)?)
    5. Move on to the third group. "2345" doesn't match (\.?).
    6. Move on to the fourth group. "2345" doesn't match (\d?); (\d?) only matches the "2" in "2345".
    7. Backtrack 1 character and try the second alternative, (\d+). "2345" matches (\d+), which means "match a digit 1 or more times".

    Sorry for my English, zak

      One thing you are missing is that (\d?) doesn't fail and backtrack as you describe. It succeeds in matching the "1" in both cases, and the rest of the regex, (\.?)(\d?|\d+)$, is then applied to what comes after the "1". Only if the rest of the regex fails does the engine go back: first it has \d? match 0 digits and applies the rest of the regex to the whole string, and then it tries the \d+ alternative, matching as many digits as possible at first, then successively fewer, until the end of the regex matches.

      But for "12345", it does almost no backtracking; \d? matches the "1", \.? doesn't match once but succeeds at matching 0 times, the second \d? matches the "2", but the $ doesn't match so \d? tries matching 0 times, $ still doesn't match, so the second \d+ is tried, matching "2345", and then $ matches the end of string.

      (\d?|\d+) is a very strange construct; it says to try matching in this order: 1 digit, 0 digits, N digits, N-1 digits, N-2 digits, ..., 2 digits. I can't believe that's really what you want. Do you want something as simple as: /^(-?)(\d+)(\.?)(\d*)$/
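A small sketch to compare what the two patterns actually capture. The helper name captures, the variable names, and the choice of the two sample strings are only for illustration.

```perl
use strict;
use warnings;

# Return the captured groups as a list, or an empty list when there is no match.
sub captures {
    my ($re, $value) = @_;
    my @groups = $value =~ $re;
    return @groups;
}

my $odd    = qr/(^-?)(\d?|\d+)(\.?)(\d?|\d+)$/;   # pattern under discussion
my $simple = qr/^(-?)(\d+)(\.?)(\d*)$/;           # suggested replacement

for my $value ('12345', '-123.004') {
    for my $pair ([odd => $odd], [simple => $simple]) {
        my ($name, $re) = @$pair;
        print "$value $name: ",
            join(' | ', map { "'$_'" } captures($re, $value)), "\n";
    }
}
```

For "12345" the odd pattern splits the digits into "1" and "2345", as described above, while the simple pattern keeps "12345" together in the second group. For "-123.004" both patterns give "-", "123", "." and "004".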

        Do you want something as simple as: /^(-?)(\d+)(\.?)(\d*)$/

        That's true, I agree. I am just curious about what perl does when it tries to match this regexp:

        /(^-?)(\d?|\d+)(\.?)(\d?|\d+)$/ # on -123.004

        Because of my poor English, I am afraid I have misinterpreted your explanation (again).

        So I asked ID-PERL for help and got answers from Jacinta and pope. (pope introduced me to re 'debug'; I just need more time to understand it.)

        I saved the conversation between me and jarich in my pad as a note for myself.

        Here we go,

        /(^-?)(\d?|\d+)(\.?)(\d?|\d+)$/ # on -123.004

        STEP BY STEP ANALYSIS

        1. Match "-" with (^-?), so we set $1 to "-".

        2. Move on to the 2nd group and pick the 1st alternative, which I think behaves as \d (NOT \d?) at first. \d then matches the digit 1. In other words, I can say that:
        \d? : we try to match a single digit; if we can't, we try to match no digits

        3. Move on to the 3rd group, \.?. Try to match a dot, fail to find one, and so choose the zero-dots option. So $3 is the empty string.

        4. Move on to the 4th group. First check it with \d: we match 2 with \d and move on to the next requirement, which says we must be at the end of the string. Since we're *not* at the end of the string, we have to use a different option. We have 2 options remaining before further backtracking:
        - try matching *no* digits
        - try matching one or more digits

        5. We are not yet going to the 2nd alternative; we are still in \d?, which now tries matching no digits. Too bad: the next character, 2, is not the end of the string, so the 1st alternative in the 4th group fails.

        6. Now we go to the 2nd alternative in the 4th group, \d+, and match 2 followed by 3. Too bad: it's still not at the end of the string. So we go back to step 2, or rather to the 2nd alternative in the 2nd group.

        Strictly speaking, this is the 3rd option that is tried, although it is the second part of the alternation.

        7. Move on to the 2nd group and try the 2nd alternative, \d+. Match "1" followed by "2" followed by "3", and set $2 to 123.

        Actually, first it'll try matching no digits, then the optional dot, then (\d?|\d+), and then find it can't get to the end of the string. After a while it'll come back, try the \d+ at the start, and finally get somewhere.

        So we set $2 to 123. I'll stop here; I think $3 and $4 are easier to understand.
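For anyone who wants to watch these steps instead of tracing them by hand, the re 'debug' pragma mentioned above prints the compiled program and each match attempt to STDERR. This is only a minimal sketch: the sub name trace_match is made up, and the trace itself differs between perl versions, so none is reproduced here.

```perl
use strict;
use warnings;

sub trace_match {
    my ($value) = @_;
    # Lexically scoped in modern perls (5.10 and later), so only the
    # regex inside this sub is traced. The trace goes to STDERR.
    use re 'debug';
    my @groups = $value =~ /(^-?)(\d?|\d+)(\.?)(\d?|\d+)$/;
    return @groups;
}

my @groups = trace_match('-123.004');
print "captured: @groups\n";
```

Redirecting STDERR to a file (for example, adding 2> trace.txt on the command line) keeps the trace separate from the captured values printed on STDOUT.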

        Thanks, zak

Re^2: combined into a single regex
by arcnon (Monk) on Dec 27, 2005 at 04:19 UTC
    Thanks, that's it.