loikiolki has asked for the wisdom of the Perl Monks concerning the following question:

I've got this regex that I'm trying to make work. Maybe I don't properly understand how negative look-behinds function. From what I understand in the docs, the following code (w/ regex):

$file = '_error' unless ($file =~ /\A(?<!compiled)[a-zA-Z0-9]+\z/);

should do the same thing as this:

$file = '_error' if ( substr($file, 0, 8) eq 'compiled' or $file !~ /\A[a-zA-Z0-9]+\z/);

The code should set $file to '_error' if the variable is not all 'a-zA-Z0-9', or if the variable begins with the word 'compiled'. The second example works fine, but my regex doesn't. Any help?

Edited by planetscape - added code tags

Replies are listed 'Best First'.
Re: Failed regex: negative look-behinds
by ikegami (Patriarch) on Feb 07, 2006 at 05:58 UTC

    \A(?<!compiled) means "The start of the string, followed by {the previous 8 characters are not 'compiled'}." The start of the string is never going to be preceded by 'compiled', so that assertion is always going to succeed.

    substr($file, 0, 8) eq 'compiled' or $file !~ /\A[a-zA-Z0-9]+\z/
    can be written as
    $file =~ /\Acompiled/ or $file !~ /\A[a-zA-Z0-9]+\z/
    which can be written as
    $file =~ /\Acompiled/ or $file =~ /[^a-zA-Z0-9]/
    which can be written as
    $file =~ /\Acompiled|[^a-zA-Z0-9]/
    The final line reads as: "It's an error if the filename starts with 'compiled', or if it contains unsafe characters."

    Update: I described \A(?<=compiled) instead of \A(?<!compiled). Fixed.
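
A minimal sketch of the point above, using made-up sample filenames. It shows that the look-behind placed right after \A never rejects anything, so the OP's pattern accepts names that start with 'compiled'. The negative look-ahead variant is not from this thread; it is just one possible way to fold the intended check into a single pattern.

```perl
use strict;
use warnings;

# Sample filenames are hypothetical, chosen only for illustration.
my @names = ('compiledfoo', 'report7', 'bad-name');

my (%behind, %ahead);
for my $file (@names) {
    # OP's pattern: at position 0 nothing precedes the match point,
    # so (?<!compiled) cannot fail there.
    $behind{$file} = $file =~ /\A(?<!compiled)[a-zA-Z0-9]+\z/ ? 1 : 0;

    # Look-ahead instead: fail if the next 8 characters are 'compiled'.
    $ahead{$file} = $file =~ /\A(?!compiled)[a-zA-Z0-9]+\z/ ? 1 : 0;

    printf "%-12s look-behind=%d look-ahead=%d\n",
        $file, $behind{$file}, $ahead{$file};
}
```

With the look-ahead form, the OP's original "unless" wording could stay as it is, since a match then means the name is acceptable.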

Re: Failed regex: negative look-behinds
by graff (Chancellor) on Feb 07, 2006 at 05:58 UTC
    Welcome to the Monastery! First lesson: put "<code>" and "</code>" tags around your perl snippets and data samples.

    As for the question itself, I could suggest a regex that would be more likely to work:

    $file = '_error' if ( $file =~ /(?:^compiled)|[^A-Za-z0-9]/ ); # updated: not "unless"
    That demonstrates the two uses of the caret: as a start-of-string anchor and as an inversion operator within a character class (i.e. match any character not specified within the square brackets). BTW, if you can accept underscores along with digits and letters, use "\w" in place of "A-Za-z0-9_".

    But all that doesn't explain why the negative look-behind didn't work the way you wanted. I'll have to think about that for a bit.

    (Update: as indicated in the comment line above, my initial post had the logic inverted. The OP said to set $file to '_error' if it starts with "compiled" or if it contains any character that is not alphanumeric. Since these are the things my regex tests for, the assignment must happen if there's a match, not the other way around. As for why the look-behind doesn't work, I think ikegami has it: the entire OP regex is anchored at start-of-string, so there's no place to "look-behind".)
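
For anyone who wants to check the single-regex form against the OP's working version, here is a small comparison sketch. The helper names is_error_substr and is_error_regex and the sample filenames are invented for this example.

```perl
use strict;
use warnings;

# OP's original two-part test (hypothetical wrapper name).
sub is_error_substr {
    my ($file) = @_;
    return (substr($file, 0, 8) eq 'compiled'
            or $file !~ /\A[a-zA-Z0-9]+\z/) ? 1 : 0;
}

# Single alternation, as suggested in the replies (hypothetical wrapper name).
sub is_error_regex {
    my ($file) = @_;
    return $file =~ /\Acompiled|[^a-zA-Z0-9]/ ? 1 : 0;
}

# Made-up inputs, including the empty string as an edge case.
for my $file ('compiledfoo', 'report7', 'bad-name', 'xcompiled', '') {
    printf "%-13s substr=%d regex=%d\n",
        "'$file'", is_error_substr($file), is_error_regex($file);
}
```

The two tests agree on every sample except the empty string: the + quantifier in the OP's version rejects it, while the alternation finds nothing to object to. Whether that matters depends on whether $file can ever be empty.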

Re: Failed regex: negative look-behinds
by prasadbabu (Prior) on Feb 07, 2006 at 05:58 UTC

    Here is one way to do it. Also take a look at perlre

    $file = '_error' if (($file !~ /^([a-zA-Z0-9])+$/) or ($file =~ /^compiled/));

    Prasad

Re: Failed regex: negative look-behinds
by chargrill (Parson) on Feb 07, 2006 at 06:00 UTC

    Thus spaketh the doc perlre

    "(?<!pattern)" A zero-width negative look-behind assertion. For example "/(?<!bar)foo/" matches any occurrence of "foo" that does not follow "bar". Works only for fixed-width look-behind.

    Therefore, /\A(?<!compiled)[a-zA-Z0-9]+\z/ matches any occurrence of one or more alphanumerics at the end of a string that does not follow "compiled".

    You might also have an issue with operator associativity and/or precedence... from perlop, or has much lower precedence (and left associativity) than eq...
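
A quick way to check the precedence concern, sketched with made-up filenames: since perlop puts or below both eq and !~, the OP's condition should already group the way explicit parentheses would, and the two forms below should give the same answer.

```perl
use strict;
use warnings;

my @results;
# Sample filenames are hypothetical.
for my $file ('compiledfoo', 'report7', 'bad-name') {
    # As written by the OP, relying on operator precedence.
    my $implicit = (substr($file, 0, 8) eq 'compiled'
                    or $file !~ /\A[a-zA-Z0-9]+\z/) ? 1 : 0;

    # Same test with the grouping spelled out.
    my $explicit = ((substr($file, 0, 8) eq 'compiled')
                    or ($file !~ /\A[a-zA-Z0-9]+\z/)) ? 1 : 0;

    push @results, [$file, $implicit, $explicit];
    print "$file: implicit=$implicit explicit=$explicit\n";
}
```

So precedence does not look like the culprit here; the anchored look-behind is.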



    --chargrill
    $/ = q#(\w)# ; sub sig { print scalar reverse join ' ', @_ } + sig map { s$\$/\$/$\$2\$1$g && $_ } split( ' ', ",erckha rlPe erthnoa stJu +" );
      therefore, /\A(?<!compiled)[a-zA-Z0-9]+\z/ matches any occurrence of one or more alphanumerics at the end of a string that does not follow "compiled"

      Which is exactly what I want to match, but the regex doesn't seem to match "any occurrence of one or more alphanumerics at the end of a string that does not follow 'compiled'". *shrug*

        I think they're right up above, most notably ikegami and graff...

        the entire OP regex is anchored at start-of-string, so there's no place to "look-behind"
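
For contrast, here is a sketch of a look-behind that does have something to look behind at, using the /(?<!bar)foo/ example from the perlre passage quoted earlier in the thread. The test strings are made up.

```perl
use strict;
use warnings;

my %match;
for my $s ('foo', 'barfoo', 'bazfoo') {
    # Unanchored: the engine tries each position where 'foo' could start
    # and checks the three characters before it.
    $match{$s} = $s =~ /(?<!bar)foo/ ? 1 : 0;
    print "$s: $match{$s}\n";
}
```

Only 'barfoo' is rejected, because its one 'foo' sits directly after 'bar'.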



        --chargrill
        $/ = q#(\w)# ; sub sig { print scalar reverse join ' ', @_ } + sig map { s$\$/\$/$\$2\$1$g && $_ } split( ' ', ",erckha rlPe erthnoa stJu +" );