stefp has asked for the wisdom of the Perl Monks concerning the following question:

I wanted to extend my answer to "Splitting every 3 digits?" so that it does everything in one substitution statement.

I restate the problem (which differs from the original one): write an integer in blocks of three digits separated by underscores. The custom is to group from the right, because the goal is to make thousands, millions and so on visible.

The easy way:

$_=1113333444455; $_=reverse; s/(\d{3})/$1_/g; $_=reverse; s/^_//; print;
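Here is a minimal sketch of the easy way wrapped in a sub and run on a few input lengths. The sub name `commify_reverse` is hypothetical and does not come from the thread. It works on a lexical copy instead of `$_` and writes `${1}_` so the variable name is explicit. The final `s/^_//` only matters when the digit count is a multiple of three, because only then does the reversed string end with an underscore.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the "easy way": reverse, group by three
# from the left, reverse back, then drop a possible leading underscore.
sub commify_reverse {
    my ($n) = @_;
    my $s = reverse $n;        # scalar context: reverses the string
    $s =~ s/(\d{3})/${1}_/g;   # underscore after every full group of three
    $s = reverse $s;
    $s =~ s/^_//;              # only fires when the length is a multiple of 3
    return $s;
}

print commify_reverse($_), "\n" for qw(1 12 123 1234 123456 1113333444455);
```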
I thought this would do it in a single substitution: s/(\d{3})(?=(\d{3})+)$/$1_/g; No way: nothing gets substituted.

So I simplified the statement and instrumented it with a zero-width code assertion:

  DB<17> $_=1113333444455; m/(\d{3})(?{ print $1 })(?=\d{3})$/
455
It acts as if the right anchor were placed just after the (\d{3}). Weird. Or maybe I just need some sleep.
I am using perl 5.6.1

-- stefp

Re: bug in regexp engine?
by wog (Curate) on Sep 29, 2001 at 06:22 UTC
    Remember that (?=...) is a zero-width assertion. Thus (\d{3})(?=(\d{3})+)$ first matches three digits, then checks whether the text after the current position matches (\d{3})+ without moving further along the string, and then checks whether it can match the end of the string (or just before a trailing newline) at that same position. Since the lookahead needs at least three more digits at that position, the two conditions can never both hold, and the pattern never matches.
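The following is a quick sketch that checks this explanation on the number from the question. With `$` outside the lookahead, the pattern needs three or more further digits and the end of the string at the same position, so it cannot match. With `$` moved inside the lookahead, it does match. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $s = "1113333444455";

# $ outside the lookahead: after (\d{3}) the lookahead demands three or
# more further digits, yet $ demands the end of the string at that same
# position, so no match is possible.
my $outside = $s =~ /(\d{3})(?=(\d{3})+)$/ ? 1 : 0;

# $ inside the lookahead: the digits after the captured group must run,
# in whole groups of three, all the way to the end of the string.
my $inside = $s =~ /(\d{3})(?=(\d{3})+$)/ ? 1 : 0;

print "outside: $outside, inside: $inside\n";   # outside: 0, inside: 1
```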
      wog is right. My mistake was to put the right anchor outside the zero-width lookahead assertion. The correct substitution is:

       s/(\d{1,3}?)(?=(\d{3})+$)/$1_/g;

      The lookahead makes sure that the number of digits after each underscore we insert is a nonzero multiple of 3, which is exactly what grouping from the right requires.
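A minimal sketch that wraps the one-statement substitution in a sub and runs it on the same lengths as the easy way. The name `commify_lookahead` is hypothetical. Short numbers are left untouched because `(\d{3})+` needs at least one full group after the inserted underscore.

```perl
use strict;
use warnings;

# Hypothetical helper around the one-statement substitution.
sub commify_lookahead {
    my ($n) = @_;
    # Insert "_" after 1-3 digits only where the digits that follow
    # form one or more complete groups of three up to the end.
    $n =~ s/(\d{1,3}?)(?=(\d{3})+$)/${1}_/g;
    return $n;
}

print commify_lookahead($_), "\n" for qw(1 12 123 1234 123456 1113333444455);
```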

      The lookahead: (?=(\d{3})+$)
      I needed an extra set of parentheses so that the + applies to the whole \d{3} group: the regexp parser barks if there are two quantifiers in a row (\d{3}+), even though the intent is clear here.
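The sketch below assumes Perl 5.10 or later, while the thread used 5.6.1. A non-capturing group `(?:\d{3})+` quantifies the group without creating an extra capture. The bare `\d{3}+` that 5.6 rejected now compiles, but as a possessive quantifier. That means exactly three digits, not one or more groups of three, so the parentheses remain necessary.

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers such as {3}+ need Perl 5.10+

my $digits = "123456";

# One or more groups of three digits: the group is quantified, not \d{3}.
my $grouped = $digits =~ /^(?:\d{3})+$/ ? 1 : 0;

# On 5.10+, {3}+ is the possessive form of {3}: exactly three digits,
# so the anchored match against six digits fails.
my $possessive = $digits =~ /^\d{3}+$/ ? 1 : 0;

print "grouped: $grouped, possessive: $possessive\n";  # grouped: 1, possessive: 0

# The same substitution with a non-capturing group in the lookahead.
(my $n = "1113333444455") =~ s/(\d{1,3}?)(?=(?:\d{3})+$)/${1}_/g;
print "$n\n";    # 1_113_333_444_455
```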

      There is a general lesson to be learned here: unchecked idiotisms are for idiots. So much for me :)

      When dealing with new material (here, regexp assertions, which I have not used much), one must learn to reassess idiotisms that may not hold in a new, larger context. Here I used the idiotism "to force the match to the end of the string, add a $ at the very end of the regexp". It did not work because it was the lookahead, not the main match, that I wanted to extend to the end of the string.

      Compare this one-statement substitution with the easy way.

      -- stefp