Intrepid has asked for the wisdom of the Perl Monks concerning the following question:

The Perl Regexp Engine of course. I have been working hard on formulating a regular expression with a negative lookbehind, and I cannot get it right. What I have right now is this: /^(?<!XDG_) [_A-Z0-9]+ _PATH$/x

I'll explain in English what I need. There will be desired matches with strings like PKG_CONFIG_PATH but I must not match XDG_SEAT_PATH or XDG_SESSION_PATH. Naturally, it is not as simple as avoiding any match with a string beginning with XDG_ because I want to match XDG_DATA_DIRS, for example. (Yes, these are well-recognized environment variables).

This works for getting only PATH: /^(?<!XDG_)PATH$/, so I can see that the Engine does understand the lookbehind. Any suggestions will be appreciated!

Aug 22, 2025 at 13:49 UTC

A just machine to make big decisions
Programmed by fellows (and gals) with compassion and vision
We'll be clean when their work is done
We'll be eternally free yes, and eternally young
Donald Fagen —> I.G.Y.
(Slightly modified for inclusiveness)

Replies are listed 'Best First'.
Re: Driving the Engine right off the road
by hippo (Archbishop) on Aug 22, 2025 at 18:48 UTC
    I'll explain in English what I need.

    That can be useful but it's fuzzy. Far better to write tests which may or may not work. See How to ask better questions using Test::More and sample data

    If you can negate the logic then (from your fuzzy description) I think it should be quite simple:

    use strict;
    use warnings;
    use Test::More;

    my @good = qw/PKG_CONFIG_PATH XDG_DATA_DIRS/;
    my @bad  = qw/XDG_SEAT_PATH XDG_SESSION_PATH/;
    plan tests => @good + @bad;

    my $re = qr/^XDG.*PATH$/;
    for my $str (@good) {
        unlike $str, $re, "good $str does not match";
    }
    for my $str (@bad) {
        like $str, $re, "bad $str matches";
    }

    🦛

      hippo suggested:
      See How to ask better questions using Test::More and sample data

      Huh, I never thought of anything like that for writing-up questions for PMo. Very good suggestion, I'll try that mode in the future.

          – Soren

      Aug 22, 2025 at 20:41 UTC

Re: Driving the Engine right off the road
by tybalt89 (Monsignor) on Aug 22, 2025 at 20:55 UTC

    /^(?<!XDG_) [_A-Z0-9]+ _PATH$/x does not make sense to me: you are trying to look before the beginning of the string.

    How about:

    #!/usr/bin/perl
    use strict; # https://perlmonks.org/?node_id=11166082
    use warnings;

    my @patterns = qw/PKG_CONFIG_PATH XDG_DATA_DIRS XDG_SEAT_PATH XDG_SESSION_PATH/;
    my $re = qr/^(?!XDG_ [_A-Z0-9]+ _PATH$)/x;
    for ( @patterns ) {
        print /$re/ ? " match" : "no match", " $_\n";
    }

    Outputs:

     match PKG_CONFIG_PATH
     match XDG_DATA_DIRS
    no match XDG_SEAT_PATH
    no match XDG_SESSION_PATH

      That will also match "bob", which doesn't sound desired.

        It correctly performs on all the provided test cases. It's not my problem if an inadequate number of test cases are provided :)
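
The "bob" objection can be seen in a small sketch. This is only an illustration: the variable names are made up here, and the second pattern assumes that a wanted name consists solely of uppercase letters, digits and underscores (the character class from the original question).

```perl
use strict;
use warnings;

# Lookahead only: succeeds for any string that is NOT of the form XDG_..._PATH,
# because nothing else in the pattern constrains the string.
my $lookahead_only = qr/^(?!XDG_ [_A-Z0-9]+ _PATH$)/x;

# Assumption: a valid name is made of uppercase letters, digits and underscores.
# Requiring that shape after the lookahead rejects strings such as "bob".
my $with_shape = qr/^(?!XDG_ [_A-Z0-9]+ _PATH$) [_A-Z0-9]+ $/x;

for my $str (qw/PKG_CONFIG_PATH XDG_DATA_DIRS XDG_SEAT_PATH bob/) {
    printf "%-16s lookahead-only: %-8s  with-shape: %s\n", $str,
        ($str =~ $lookahead_only ? "match" : "no match"),
        ($str =~ $with_shape     ? "match" : "no match");
}
```

Only the "bob" row should differ between the two columns.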

Re: Driving the Engine right off the road
by ysth (Canon) on Aug 24, 2025 at 14:47 UTC
    Lookahead/behind are zero-width assertions; you are asserting that after matching ^ (beginning of string), what comes just before that can't be XDG_, and then continuing to match the rest of your regex. There are two problems here: first, the beginning of the string will never follow XDG_, so your assertion will always pass; and second, you will only match if the string ends with _PATH or _PATH followed by a newline, which doesn't meet your requirements.
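
A quick way to check the first point is to run the pattern from the question with and without the lookbehind. This is just a sketch using the sample names from this thread; the variable names are invented for the illustration.

```perl
use strict;
use warnings;

# Pattern from the original question: the lookbehind sits right after ^.
my $with_lookbehind    = qr/^(?<!XDG_) [_A-Z0-9]+ _PATH$/x;
# The same pattern with the lookbehind removed.
my $without_lookbehind = qr/^ [_A-Z0-9]+ _PATH$/x;

for my $name (qw/PKG_CONFIG_PATH XDG_DATA_DIRS XDG_SEAT_PATH XDG_SESSION_PATH/) {
    my $with    = $name =~ $with_lookbehind    ? 1 : 0;
    my $without = $name =~ $without_lookbehind ? 1 : 0;
    # At position 0 nothing precedes the match point, so (?<!XDG_) always passes
    # and both columns agree for every name.
    printf "%-18s with=%d without=%d\n", $name, $with, $without;
}
```

Both patterns accept the two XDG_*_PATH names and reject XDG_DATA_DIRS, which is the opposite of what was asked for.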

    If perl supported variable length lookbehind, you could use a lookbehind at the end:
    /^[_A-Z0-9]+$(?<!^XDG_.*_PATH)/
    but it doesn't. So use a lookahead instead, to test that the entire string is the expected characters but doesn't match your exclusion at the beginning:
    /^(?!XDG_.*_PATH$)[_A-Z0-9]+$/
    or (and this is almost always the more readable approach), do it in code:
    /^[_A-Z0-9]+$/ && ! /^XDG_.*_PATH$/
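
In the spirit of the Test::More suggestion earlier in the thread, here is a minimal sketch that runs the single-regex and the two-step versions against the sample names. The helper name wanted() is hypothetical, and "bob" is included only because it came up above.

```perl
use strict;
use warnings;
use Test::More;

# Single regex: negative lookahead for the exclusion, then the allowed shape.
my $single = qr/^(?!XDG_.*_PATH$)[_A-Z0-9]+$/;

# Two-step version in plain code; wanted() is a made-up helper name.
sub wanted {
    my ($name) = @_;
    return ($name =~ /^[_A-Z0-9]+$/ && $name !~ /^XDG_.*_PATH$/) ? 1 : 0;
}

my @good = qw/PKG_CONFIG_PATH XDG_DATA_DIRS/;
my @bad  = qw/XDG_SEAT_PATH XDG_SESSION_PATH bob/;

for my $str (@good) {
    like $str, $single, "regex accepts $str";
    ok wanted($str), "code accepts $str";
}
for my $str (@bad) {
    unlike $str, $single, "regex rejects $str";
    ok !wanted($str), "code rejects $str";
}

done_testing;
```
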
      Note that I kept your $ but I hate it, because without /m it is rarely used to intentionally do what it actually does (match either the end of the string or before a newline at the end of a string). If your data has no newlines, use \z (match only at the end of the string). If your data ends with newlines, use \n (or \n\z if your string may have internal newlines that should not match). Then your code doesn't mislead about what the data looks like.
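
The difference between $ and \z shows up as soon as a string carries a trailing newline. A small sketch, with a sample value made up for the illustration:

```perl
use strict;
use warnings;

my $name = "PKG_CONFIG_PATH\n";    # hypothetical value with a trailing newline

# Without /m, $ matches at the end of the string OR just before a final newline.
my $dollar = $name =~ /^[_A-Z0-9]+$/  ? 1 : 0;
# \z matches only at the very end of the string.
my $strict = $name =~ /^[_A-Z0-9]+\z/ ? 1 : 0;

print "dollar=$dollar z=$strict\n";    # prints "dollar=1 z=0"
```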

        It's weird. I lost some good habits when I took my 10-year break from programming, and using \z instead of $ in regexen is one of them. Since you pointed it out, I think my fingers will do the right thing from now on. In this case it wouldn't matter because my strings are all keys from %ENV and would never contain an embedded newline. But regaining (and retaining) good coding habits is important.

            – Soren
