diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

I recently tried to write the look-behind expression (?<=(?:199\d|200\d|\D)), but that results in the error "Variable length lookbehind not implemented before HERE mark in regex m/(?<=(?:199\d|200\d|\D)) << HERE /". I got it to work by prepending three dots to the last alternative (...\D), but since I wasn't using any variable quantifiers (question mark, plus, asterisk, etc.) I think it should have worked in the first form. So what's the deal here? Is it that, even though every component of the alternation has a known length, the alternation as a whole is variable length, and so this doesn't work?

Re: Not-really variable length lookbehind
by hv (Prior) on Apr 26, 2003 at 05:13 UTC

    Before lookbehind assertions were invented, there already existed code to calculate the minimum and maximum possible matching lengths of a subpattern, used by the optimiser (eg to bail out of the match before starting when the target string is too short). When support for lookbehind was added, the check was added in the simplest possible way: check that the minimum and maximum possible match lengths are the same.
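
    A minimal sketch of that check, assuming a toy pattern tree rather than perl's real internals: the node types (char, seq, alt) and the function name length_range are hypothetical, chosen only for illustration. It computes the minimum and maximum match length of a subpattern and shows why the alternation in the question is rejected.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Hypothetical toy pattern tree, not perl's real data structures:
#   ['char']              one character (\d, \D, or a literal)
#   [ seq => @nodes ]     concatenation
#   [ alt => @nodes ]     alternation
sub length_range {
    my ($node) = @_;
    my ($type, @kids) = @$node;
    return (1, 1) if $type eq 'char';
    my @ranges = map { [ length_range($_) ] } @kids;
    if ($type eq 'seq') {    # concatenation: lengths add up
        return (sum(0, map { $_->[0] } @ranges), sum(0, map { $_->[1] } @ranges));
    }
    if ($type eq 'alt') {    # alternation: shortest and longest branch
        return (min(map { $_->[0] } @ranges), max(map { $_->[1] } @ranges));
    }
    die "unknown node type '$type'";
}

my $char = ['char'];
my $year = [ seq => $char, $char, $char, $char ];    # 199\d or 200\d
my $alt  = [ alt => $year, $year, $char ];           # 199\d|200\d|\D

my ($min, $max) = length_range($alt);
print "min=$min max=$max: ",
    ($min == $max ? "accepted" : "rejected as variable length"), "\n";
```

    Every branch has a fixed length on its own, but the alternation as a whole ranges from 1 to 4, so a check that only compares the two numbers rejects it.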

    The restriction could be relaxed in several ways given some slightly more intelligent code support. Your example of alternation is one such, and another one is backreferences (since their length is always fixed by the time it is needed):

    "aabbaababb" =~ /(a+).*(?<=\1)b/; # should match "aabbaab"

    The regular expression engine is due for a bit of an overhaul during the development of perl-5.10.0, and we may see some improvements in this area as a part of that if they can be fitted in without slowing down patterns that don't use lookbehinds.

    For your example, prepending dots may be inaccurate if the non-digits could be near the beginning of the string. I'd suggest moving the alternation outwards instead:

    (?:(?<=199\d|200\d)|(?<=\D))
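
    A small sketch comparing the two workarounds. The trailing X and the sample strings are assumptions added only to give the lookbehind something to anchor to; they are not from the original question.

```perl
use strict;
use warnings;

# Workaround from the question: pad \D to four characters with dots.
my $padded = qr/(?<=199\d|200\d|...\D)X/;
# Alternation moved outwards, so each lookbehind has a single fixed length.
my $split  = qr/(?:(?<=199\d|200\d)|(?<=\D))X/;

for my $str ('1995X', 'abcdX', '-X', '12X') {
    printf "%-6s padded=%d split=%d\n", $str,
        ($str =~ $padded ? 1 : 0), ($str =~ $split ? 1 : 0);
}
```

    The two patterns differ on '-X': the padded form needs four characters before the X and only one is available, while the split form sees the single non-digit and matches.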

    Hugo
Re: Not-really variable length lookbehind
by artist (Parson) on Apr 26, 2003 at 05:26 UTC
    It looks like your observation is correct. A lookbehind must contain a constant-length item. If the alternatives have different lengths, the lookbehind treats the whole alternation as a variable-length item.
    For example, if you think that
    A
    /(?<=(?:\d{1,2}))/
    should give you an error, then
    B
    /(?<=(?:\d\d|\d))/;
    will also give you an error, since it is the same as 'A'.
    In other words, it really is a variable-length lookbehind.

    artist

Re: Not-really variable length lookbehind
by Juerd (Abbot) on Apr 26, 2003 at 12:06 UTC

    You probably thought that (?<=199\d|200\d|\D) (note: the (?:) is useless here) would do something like (?:(?<=199\d)|(?<=200\d)|(?<=\D)), but that is not how it works. In pseudo-code, this is what the regex engine does:

    # (?<=199\d|200\d|\D)
    sub lookbehind {
        my $subregex = shift; # qr/199\d|200\d|\D/;
        local pos() = pos() - matchlength($subregex);
        return /$subregex/;
    }
    But here, the imaginary matchlength function can't return a single value, because the subregex can match strings of different lengths. So in your case, the engine doesn't know how many characters to go back before trying the look-behind.
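
    A runnable approximation of that idea, as a sketch only: lookbehind_at is a hypothetical helper, not how perl implements lookbehind, and the caller has to supply the fixed length that the imaginary matchlength function would have returned.

```perl
use strict;
use warnings;

# Hypothetical helper, not perl's internals: emulate a fixed-length
# lookbehind at offset $pos by stepping back exactly $len characters.
sub lookbehind_at {
    my ($str, $pos, $subregex, $len) = @_;
    return 0 if $pos < $len;    # not enough text to step back over
    return substr($str, $pos - $len, $len) =~ /\A(?:$subregex)\z/ ? 1 : 0;
}

my $str = 'in 1995x';
# Is "1995" directly behind offset 7 (the x)?
print lookbehind_at($str, 7, qr/199\d/, 4), "\n";
# Is a single non-digit directly behind offset 7?
print lookbehind_at($str, 7, qr/\D/, 1), "\n";
```

    With one $len per call there is no way to pass qr/199\d|200\d|\D/ in a single step; the caller has to try length 4 and length 1 separately, which is what moving the alternation outside the lookbehind does.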

    Juerd
    - http://juerd.nl/
    - spamcollector_perlmonks@juerd.nl (do not use).