rovf has asked for the wisdom of the Perl Monks concerning the following question:

Recently, someone posted in a Perl forum a problem which essentially could be simplified to this:

We have a string consisting of digits and dots. The first and last characters are guaranteed to be digits, and there are at least two digits between every two dots. Example: '123.45.678.9'. Problem: Turn this string into a list of decimal numbers, each number having exactly one digit after the decimal point. In our example, this would be the list (123.4, 5.6, 78.9).

The fellow tried to solve this using split and failed. I suggested a solution using a pattern //g, which is IMO the more natural way to solve the problem.
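
For reference, a //g solution could look roughly like the sketch below. The post does not show the pattern that was actually suggested, so the regex /(\d+\.\d)/ and the variable names here are assumptions, not the original answer.

```perl
use strict;
use warnings;

my $input = '123.45.678.9';

# In list context, //g returns every capture: one or more digits,
# a dot, and exactly one digit after the dot.
my @numbers = $input =~ /(\d+\.\d)/g;

print join(', ', @numbers), "\n";   # 123.4, 5.6, 78.9
```

Each match consumes the digit after the dot, so the next match starts with whatever digits remain before the following dot.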

However, I found the task interesting enough to wonder whether a solution involving split also exists. In this case, we don't have separators at which to split the string - or, to be more precise, the split points are zero-length. Hence, I thought, this could be solved using a negative look-ahead assertion. So I tried this:

perl -lwe 'use strict; print(join("/",split(/(?!\.\d)/,"123.456.78.1")))'
The idea is that a split point is any point in the string which is preceded by a dot followed by a digit. To my surprise, this resulted in
1/2/3./4/5/6./7/8./1
to be printed. Could someone kindly explain this result?

-- 
Ronald Fischer <ynnor@mm.st>

Replies are listed 'Best First'.
Re: split on zero-length pattern
by tinita (Parson) on Nov 26, 2010 at 10:59 UTC
    perl -lwe 'use strict; print(join("/",split(/(?!\.\d)/,"123.456.78.1")))'
    The idea is that a split point is any point in the string which is preceded by a dot followed by a digit.
    In your code you have a negative look-ahead assertion (?!\.\d).
    In the explanation below it you talk about "a point in the string which is preceded by". For something like that you need a positive look-behind assertion.
    Maybe you just mixed them up? Using (?<=\.\d) works as expected for me.
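
    A minimal sketch of the look-behind variant, using the two sample strings from the question (the variable names are only illustrative):

```perl
use strict;
use warnings;

# (?<=\.\d) is zero-width: it matches at any position that is
# immediately preceded by a dot and one digit, so split cuts there.
my @from_problem  = split /(?<=\.\d)/, '123.45.678.9';
my @from_oneliner = split /(?<=\.\d)/, '123.456.78.1';

print join('/', @from_problem),  "\n";   # 123.4/5.6/78.9
print join('/', @from_oneliner), "\n";   # 123.4/56.7/8.1
```

    The pattern also matches at the very end of the string, but split drops the resulting empty trailing field unless a negative LIMIT is given.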
      Hmmm... a negative look-ahead looks backwards, doesn't it? So I thought it should be the correct one. For example, in the string '12345.6789.0', the first split point should be after the 6, i.e. giving 12345.6 as the first element. Hence my idea goes like this: to the *left* of the split point there must be a period followed by a digit. The regexp engine needs to look back, so I thought it was a negative look-ahead. Did I misunderstand the explanations in perlre here?

      -- 
      Ronald Fischer <ynnor@mm.st>
        You are searching for "something preceded by".
        First, that is something *positive*. Why do you want to use a negative look-around? You are searching for something that is preceded by, not for something that is *not* preceded by.
        Second, a look-ahead looks *ahead* for the specified pattern (update: maybe better put: it checks whether the specified pattern is ahead of the look-ahead assertion). For every look-around in perlre there is a short example given. The example for negative look-ahead is /foo(?!bar)/, saying "match a foo that is *not* *followed* by bar". So in your code you effectively said "match any position that is not followed by a dot and a digit".
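
        To make that concrete, here is a small sketch that runs the perlre example and then the split from the question. The test strings "foobaz" and "foobar" and the variable names are my own additions for illustration.

```perl
use strict;
use warnings;

# perlre's example: "foo" that is not followed by "bar".
my $matches_foobaz = ('foobaz' =~ /foo(?!bar)/) ? 1 : 0;   # 1: "baz" follows
my $matches_foobar = ('foobar' =~ /foo(?!bar)/) ? 1 : 0;   # 0: "bar" follows

# The pattern from the question matches the empty string at every
# position whose NEXT two characters are not a dot plus a digit,
# so split cuts almost everywhere except directly in front of ".4", ".7", ".1".
my @pieces = split /(?!\.\d)/, '123.456.78.1';

print "$matches_foobaz $matches_foobar\n";
print join('/', @pieces), "\n";   # 1/2/3./4/5/6./7/8./1
```

        The only positions where the split is suppressed are those directly before a dot, which is why each dot stays glued to the digit in front of it.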