
One thing you are missing is that (\d?) doesn't fail and backtrack as you describe; it succeeds in matching the "1" in both cases and then applies the rest of the regex, (\.?)(\d?|\d+)$, to what comes after the "1". Only if the rest of the regex fails does the engine backtrack: first it has \d? match 0 digits and applies the rest of the regex to the whole remaining string, and then it tries the \d+ alternative, matching as many digits as possible first and then successively fewer until the rest of the regex matches.
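One way to watch this happen, as a sketch: the snippet below adds a (?{ ... }) code block right after the second group, so it runs every time the engine moves past that group, and records the value $2 held at that moment. The helper name group2_attempts, the @seen array and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each value group 2 held when the engine moved
# past it, in the order those attempts happened.
sub group2_attempts {
    my ($str) = @_;
    our @seen = ();    # package variable: sidesteps (?{ }) closure quirks on older perls
    # Same pattern as in the thread, plus a code block after group 2.
    $str =~ /(^-?)(\d?|\d+)(?{ push @seen, $2 })(\.?)(\d?|\d+)$/
        or warn "no match for '$str'\n";
    return @seen;
}

for my $str ("12345", "1.5", "-123.004") {
    my @tries = group2_attempts($str);
    print "$str: ", join(", ", map { "'$_'" } @tries), "\n";
}
```

For "12345" and "1.5" group 2 should be tried only once, as '1', because the rest of the pattern succeeds; only "-123.004" should force it through '1', then '' and finally the \d+ alternative's '123'.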

But for "12345" it does almost no backtracking: \d? matches the "1"; \.? fails to match a dot but succeeds by matching 0 times; the second group's \d? matches the "2", but $ doesn't match there, so \d? tries matching 0 times; $ still doesn't match, so the second group's \d+ alternative is tried, matching "2345", and then $ matches at the end of the string.
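The same trick applied to the fourth group shows that short sequence for "12345". A minimal sketch; the @g4 log is a hypothetical name, and the code block sits just before the final $ so it records every candidate value of group 4 that reaches the end-of-string test.

```perl
use strict;
use warnings;

# Hypothetical log: every value group 4 takes before the final $ is tested.
our @g4;

my @caps = ("12345" =~ /(^-?)(\d?|\d+)(\.?)(\d?|\d+)(?{ push @g4, $4 })$/);

print "group 4 tried: ", join(", ", map { "'$_'" } @g4), "\n";
print "captures: ", join(", ", map { "'$_'" } @caps), "\n";
```

Group 4 should be tried as '2', then '', then '2345', and the captures come out as '', '1', '', '2345': $2 keeps just the single digit.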

(\d?|\d+) is a very strange construct; it says to try matching in this order: 1 digit, 0 digits, N digits, N-1 digits, N-2 digits, ..., 2 digits. I can't believe that's really what you want. Do you want something as simple as: /^(-?)(\d+)(\.?)(\d*)$/
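To see that order directly, one can make the engine reject every candidate with (*FAIL) (available since Perl 5.10) and log how many digits the group held each time. This is a sketch on a made-up input, "1234"; @lengths is a hypothetical name. The log should also show that \d+ ends with a 1-digit attempt, repeating one that \d? already made, since the two alternatives know nothing about each other.

```perl
use strict;
use warnings;
use 5.010;    # (*FAIL) needs Perl 5.10 or later

our @lengths;    # hypothetical log of how many digits group 1 held on each try

# (*FAIL) never matches, so the engine backtracks through every way
# (\d?|\d+) can match at the start of "1234", logging each length.
my $matched = "1234" =~ /^(\d?|\d+)(?{ push @lengths, length $1 })(*FAIL)/;

print "lengths tried: @lengths\n";    # $matched is always false here
```

This should print "lengths tried: 1 0 4 3 2 1".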

Re^4: combined into a single regex
by doctor_moron (Scribe) on Dec 30, 2005 at 13:00 UTC

    Do you want something as simple as: /^(-?)(\d+)(\.?)(\d*)$/

    That's true, I agree. I am just curious what Perl does when it tries to match the regexp:

    /(^-?)(\d?|\d+)(\.?)(\d?|\d+)$/   # on -123.004
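For reference, the end result Perl reaches with this pattern, next to the simpler pattern suggested above, can be checked with a short script. A sketch only: the %patterns and %result names are made up for illustration, and each result is shown as the captures wrapped in brackets.

```perl
use strict;
use warnings;

# The pattern from this post, and the simpler one suggested above.
my %patterns = (
    original => qr/(^-?)(\d?|\d+)(\.?)(\d?|\d+)$/,
    simpler  => qr/^(-?)(\d+)(\.?)(\d*)$/,
);

my %result;    # hypothetical: $result{$name}{$input} = "[$1][$2][$3][$4]"
for my $name (sort keys %patterns) {
    for my $input ("-123.004", "12345") {
        my @caps = ($input =~ $patterns{$name});
        $result{$name}{$input} = join "", map { "[$_]" } @caps;
        print "$name on $input: $result{$name}{$input}\n";
    }
}
```

Both should give [-][123][.][004] on -123.004, but on 12345 the original splits the digits as [][1][][2345] while the simpler one gives [][12345][][].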

    I'm afraid that, because of my poor English, I misinterpreted your explanation (again).

    So I asked for help on ID-PERL and got answers from Jacinta and pope (pope introduced me to use re 'debug'; I just need more time to understand it).
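A minimal sketch of pope's suggestion: the re pragma's 'debug' mode makes Perl print how it compiled the pattern and then each step of the match to STDERR, which is one way to check a step-by-step analysis like this one. The output is long and its exact format varies between Perl versions, so it is not reproduced here; @caps is just an illustrative name.

```perl
use strict;
use warnings;

my @caps;
{
    # Lexically scoped: only regexes compiled inside this block are traced.
    # The compile dump and the step-by-step match trace go to STDERR.
    use re 'debug';
    @caps = ("-123.004" =~ /(^-?)(\d?|\d+)(\.?)(\d?|\d+)$/);
}

# The pragma only adds diagnostics; the captures are unchanged.
print join(", ", @caps), "\n";
```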

    I saved the conversation between jarich and me in my pad as a note to myself.

    Here we go:

    /(^-?)(\d?|\d+)(\.?)(\d?|\d+)$/   # on -123.004

    STEP BY STEP ANALYSIS

    1. Match "-" with (^-?), so $1 is set to "-".

    2. Move on to the 2nd group and pick its 1st alternative, \d?. Because ? is greedy, I think it first behaves like \d (NOT like "maybe nothing"), so it matches the digit 1. In other words, I can say that:
    \d?: we try to match a single digit; if we can't (or if the rest of the regex later fails), we try to match no digits.

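    3. Move on to the 3rd group, (\.?), which has no alternatives: it tries to match a dot, fails to find one (the next character is 2), and so takes the zero-dots option. So at this point group 3 has matched the empty string ($3 would be "", not undefined).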

    4. Move on to the 4th group. Its 1st alternative, \d?, first tries one digit and matches the 2; the next requirement, $, says we must be at the end of the string. Since we're *not* at the end of the string, we have to try something else. We have 2 options remaining before backtracking further:
    - try matching *no* digits
    - try matching one or more digits

    5. We are not going to the 2nd alternative yet; first \d? tries matching *no* digits. That leaves us just before the 2, which is still not the end of the string, so the 1st alternative of the 4th group fails.

    6. Now we go to the 2nd alternative in the 4th group, \d+, which matches 2 followed by 3; too bad, that's still not the end of the string (the dot comes next), and giving back the 3 doesn't help either. So we go back to step 2, to the 2nd alternative in the 2nd group.

    Strictly speaking, this is the 3rd alternative that is tried, although it is the second part of the alternation.

    7. Move on to the 2nd group and try its 2nd alternative, \d+, which matches "1" followed by "2" followed by "3", so $2 is set to 123.

    Actually, first it'll try matching no digits, then the optional dot, then (\d?|\d+), and find it can't get to the end of the string... After a while it'll come back, try the \d+ at the start, and finally get somewhere.

    So $2 is set to 123. I'll stop here; I think $3 and $4 are easier to understand.
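The whole walkthrough, including jarich's corrections, can be checked the same way by logging $2 and $4 each time the engine reaches the final $. A sketch under the same assumptions as any (?{ }) tracing (the @tried name is hypothetical); each entry is "group 2:group 4" for one attempt on -123.004.

```perl
use strict;
use warnings;

our @tried;    # hypothetical log: one "group2:group4" entry per attempt at the final $

my @caps = ("-123.004" =~
    /(^-?)(\d?|\d+)(\.?)(\d?|\d+)(?{ push @tried, join ':', $2, $4 })$/);

print "$_\n" for @tried;
print "final: ", join(", ", @caps), "\n";
```

The log should read 1:2, 1:, 1:23, 1:2, then :1, :, :123, :12, :1, then 123:0, 123:, 123:004. The first four entries are steps 4 to 6 (with \d+ also retrying just the "2" before giving up), the next five are the "no digits" detour jarich describes, and the last three are step 7 followed by group 4 settling on 004, giving the final captures -, 123, ., 004.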

    Thanks, zak