in reply to bug in regexp engine?

You must remember that (?=...) is a zero-width assertion. Thus (\d{3})(?=(\d{3})+)$ first tries to match three digits, then checks whether the text following its current position matches (\d{3})+ without moving further in the string, and finally requires an end of string (or a newline that ends the string) at that same position. The lookahead wants more digits there while $ wants the end of the string, so both cannot hold at once.
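To make the point concrete, here is a minimal sketch comparing the pattern quoted above with a variant that moves the $ inside the lookahead. The sample string "123456" and the helper name verdict are assumptions chosen for illustration; they do not come from the original question.

```perl
use strict;
use warnings;

# Pattern with the anchor outside the lookahead, as quoted above.
my $outside = qr/(\d{3})(?=(\d{3})+)$/;
# Same pattern with the anchor moved inside the lookahead.
my $inside  = qr/(\d{3})(?=(\d{3})+$)/;

my $text = "123456";   # hypothetical sample input

# Hypothetical helper: reports whether a pattern matches $text.
sub verdict {
    my ($re) = @_;
    return $text =~ $re ? "match" : "no match";
}

print "anchor outside: ", verdict($outside), "\n";
print "anchor inside:  ", verdict($inside), "\n";
```

With the anchor outside, the engine must find both digits and the end of the string at the same position, so the first pattern should fail on this input, while the second should succeed.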

Re: Re: bug in regexp engine?
by stefp (Vicar) on Sep 29, 2001 at 22:44 UTC
    wog is right. My mistake was to put the right anchor outside the zero-width lookahead assertion. The correct substitution code is:

     s/(\d{1,3}?)(?=(\d{3})+$)/$1_/g;

    The lookahead makes sure that the number of digits after each underscore we insert, up to the end of the string, is a multiple of 3.

    The lookahead: (?=(\d{3})+$)
    I needed an extra set of parentheses to fool Perl, because the regexp parser barks when there are two quantifiers in a row (as in \d{3}+), even though that is perfectly legitimate here.
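As a rough illustration of the substitution at work, the sketch below wraps it in a small function and runs it on a few digit strings. The name group_digits and the sample inputs are hypothetical. The replacement is written as ${1}_ rather than $1_ purely to make the variable boundary explicit; the behaviour is intended to be the same.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the post.
sub group_digits {
    my ($n) = @_;
    # Insert "_" after a run of 1 to 3 digits whenever the rest of
    # the string, up to its end, is one or more groups of 3 digits.
    $n =~ s/(\d{1,3}?)(?=(\d{3})+$)/${1}_/g;
    return $n;
}

# Sample inputs are assumptions for illustration only.
print group_digits($_), "\n" for qw(123 1234 123456 1234567);
```

This sketch assumes the input consists of digits only. Because the lookahead is anchored to the end of the string, a match is only possible where nothing but whole groups of three digits remains.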

    There is a general lesson to be learned here: unchecked idioms are for idiots. So much for me :)

    When dealing with new material (here, regexp assertions that I have not used much), one must learn to reassess idioms that may not work in a new, larger context. Here, I used the idiom: to force the match to the end of the string, add a $ at the very end of the regexp. It did not work here because I wanted the lookahead itself to match to the end of the string.

    Compare the previous code with the easy way

    -- stefp