in reply to Can you do "conjunctive" (overlapping) conditions in a single regexp?

perlre describes two experimental features for embedding code in a pattern that can do this, albeit in an ugly way:
@x = split /(?{
    # This populates $^R
    pos()
  })some pattern(??{
    # This returns an impossible pattern for long matches
    (pos()-$^R < 5) ? "" : "no\\bmatch"
  })/, $string;
Be warned, though, that this will backtrack and try to match again. For instance, if you want a match to cover no more than 4 commas, then against a run of 6 commas it will match the first 4, and on the next attempt it will match the remaining 2. So instead of preserving the 6 commas you'll eat them up and get a blank field.
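To make that concrete, here is a minimal sketch of the comma case. The sub name split_capped and the pattern ,+ (standing in for "some pattern") are my own choices for illustration, and I am assuming a reasonably modern perl where code blocks in a literal pattern need no extra pragma.

```perl
use strict;
use warnings;

# Hypothetical helper: split on runs of commas, but let one match
# cover at most 4 characters, as in the snippet above.
sub split_capped {
    my ($string) = @_;
    return split /(?{
        # Remember where this match attempt started; the value lands in $^R
        pos()
      }),+(??{
        # Short enough: match the empty pattern. Too long: hand back a
        # pattern that can never match, which forces a backtrack.
        (pos() - $^R < 5) ? "" : "no\\bmatch"
      })/, $string;
}

my @fields = split_capped("a,,,,,,b");    # six commas in a row
print join("|", map { "[$_]" } @fields), "\n";
```

With six commas the engine backs off to a run of 4, then matches the remaining 2 separately, so this should print [a]|[]|[b] with the blank field in the middle.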

If this is not what you want then you have to do a lot more work. You have to make sure that it fails on all of the backtracks as well. This means that when you fail you need to remember the failure and fail every time you see that position. Like this:

# Be sure that these start clean!
my %bad_r;
my %bad_pos;
@x = split /(?{
    # This populates $^R
    pos()
  })some pattern(??{
    # This returns an impossible pattern for long matches
    if (pos()-$^R < 3 and not $bad_r{$^R} and not $bad_pos{pos()} ) {
      "";
    }
    else {
      $bad_r{$^R}++;
      $bad_pos{ pos() }++;
      "no\\bmatch"
    }
  })/, $string;
That is ugly! But it should work.

Update: Well it should if I had not missed the closing paren on the pos(). Thanks to ikegami for catching that.
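
As a rough illustration of the stricter version, here is a sketch that wraps it in a sub so the two hashes start clean on every call. The name split_strict and the pattern ,+ are placeholders of mine, and I am assuming a perl recent enough (5.18 or later) that lexicals inside regex code blocks behave as proper closures.

```perl
use strict;
use warnings;

# Hypothetical helper: split on runs of fewer than 3 commas and leave
# longer runs completely untouched.
sub split_strict {
    my ($string) = @_;

    # Fresh bookkeeping on every call
    my (%bad_r, %bad_pos);

    return split /(?{
        # Start of this match attempt, kept in $^R
        pos()
      }),+(??{
        if (pos() - $^R < 3 and not $bad_r{$^R} and not $bad_pos{ pos() }) {
            "";
        }
        else {
            # Remember this start and this end so that every later
            # backtrack through the same run fails as well
            $bad_r{$^R}++;
            $bad_pos{ pos() }++;
            "no\\bmatch";
        }
      })/, $string;
}

my @fields = split_strict("a,,b,,,,,,c");
print join("|", map { "[$_]" } @fields), "\n";
```

Here the run of 2 commas splits as usual, while the run of 6 is rejected at every start and end position, so I would expect [a]|[b,,,,,,c].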

Re^2: Can you do "conjunctive" (overlapping) conditions in a single regexp?
by ikegami (Patriarch) on Dec 18, 2008 at 03:08 UTC

    Nice. The first snippet is exactly what I would have done, except I'd replace
    (??{ (pos()-$^R < 5) ? "" : "no\\bmatch" })
    with
    (?(?{ pos()-$^R >= 5 })(?!))
    to avoid compiling a pattern every time the block is evaluated.

      I missed that possibility in the documentation. That would be more efficient.
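
For comparison, a sketch of the first snippet rewritten with the conditional form suggested above. The sub name split_capped_cond and the pattern ,+ are placeholders of mine. As far as I understand perlre, a code block used as a condition is only tested for truth and is not stored in $^R, so the start position saved by the first block is still there.

```perl
use strict;
use warnings;

# Hypothetical helper: same length cap as the (??{ }) version, but the
# failure is expressed as a conditional on an always-failing (?!).
sub split_capped_cond {
    my ($string) = @_;
    return split /(?{
        # Start of this match attempt, kept in $^R
        pos()
      }),+(?(?{
        # True when the match has grown to 5 or more characters
        pos() - $^R >= 5
      })(?!))/, $string;
}

my @fields = split_capped_cond("a,,,,,,b");
print join("|", map { "[$_]" } @fields), "\n";
```

It backtracks the same way as the first version, so the six commas should still come out as [a]|[]|[b]; the difference is only that no pattern string has to be compiled at match time.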
Re^2: Can you do "conjunctive" (overlapping) conditions in a single regexp?
by puterboy (Scribe) on Dec 18, 2008 at 03:46 UTC
    Thanks, and very clever (probably too clever for my little brain ;) ). I'm trying, though, to see whether I can use case #1 rather than your much hairier second example.

    I actually want to preserve the field and keep it with the split, so prior to adding the length limitation, I was using:

    split /(?=$regex)/, $string
    And I want to get only the greediest match (that satisfies the length condition). So does that mean I can't use the first alternative?
      You can't use the first version.

      You can use the second version like you did before. Just put the (?=) around the whole thing and insert whatever you want for the pattern in the middle.

      It is ugly, but conceptually it is not that bad. The first code block stores the position of the start of the match in $^R. The second one looks at the current position and the start (which is in $^R) and decides whether or not to make the match fail by interpolating in something that can't match. There are some complications around the logic, but that part doesn't need to change.

        Thanks so much for the help & explanation!!!