Driving the Engine right off the road

Intrepid has asked for the wisdom of the Perl Monks concerning the following question:

The Perl Regexp Engine of course. I have been working hard on working out how to formulate a regular expression with a negative lookbehind and I cannot get it right. What I have right now is this: /^(?<!XDG_) [_A-Z0-9]+ _PATH$/x

I'll explain in English what I need. There will be desired matches with strings like PKG_CONFIG_PATH but I must not match XDG_SEAT_PATH or XDG_SESSION_PATH. Naturally, it is not as simple as avoiding any match with a string beginning with XDG_ because I want to match XDG_DATA_DIRS, for example. (Yes, these are well-recognized environment variables.)

This works for getting only PATH: /^(?<!XDG_)PATH$/, so I can see that the Engine does understand the lookbehind. Any suggestions will be appreciated!

Aug 22, 2025 at 13:49 UTC

A just machine to make big decisions
Programmed by fellows (and gals) with compassion and vision
We'll be clean when their work is done
We'll be eternally free yes, and eternally young
Donald Fagen —> I.G.Y.
(Slightly modified for inclusiveness)


Replies are listed 'Best First'.
Re: Driving the Engine right off the road by hippo (Archbishop) on Aug 22, 2025 at 18:48 UTC
"I'll explain in English what I need."

That can be useful but it's fuzzy. Far better to write tests which may or may not work. See How to ask better questions using Test::More and sample data.

If you can negate the logic then (from your fuzzy description) I think it should be quite simple:

```perl
use strict;
use warnings;
use Test::More;

my @good = qw/PKG_CONFIG_PATH XDG_DATA_DIRS/;
my @bad  = qw/XDG_SEAT_PATH XDG_SESSION_PATH/;

plan tests => @good + @bad;

my $re = qr/^XDG.*PATH$/;
for my $str (@good) {
    unlike $str, $re, "good $str does not match";
}
for my $str (@bad) {
    like $str, $re, "bad $str matches";
}
```

🦛
Re^2: Driving the Engine right off the road by Intrepid (Curate) on Aug 22, 2025 at 20:42 UTC
hippo suggested: "See How to ask better questions using Test::More and sample data." Huh, I never thought of anything like that for writing up questions for PMo. Very good suggestion, I'll try that mode in the future. – Soren Aug 22, 2025 at 20:41 UTC
Re: Driving the Engine right off the road by tybalt89 (Monsignor) on Aug 22, 2025 at 20:55 UTC
`/^(?<!XDG_) [_A-Z0-9]+ _PATH$/x` does not make sense to me: you are trying to look before the beginning of the string.

How about:

```perl
#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11166082
use warnings;

my @patterns = qw/PKG_CONFIG_PATH XDG_DATA_DIRS XDG_SEAT_PATH XDG_SESSION_PATH/;

my $re = qr/^(?!XDG_ [_A-Z0-9]+ _PATH$)/x;

for ( @patterns )
  {
  print /$re/ ? " match" : "no match", " $_\n";
  }
```

Outputs:

```
 match PKG_CONFIG_PATH
 match XDG_DATA_DIRS
no match XDG_SEAT_PATH
no match XDG_SESSION_PATH
```
Re^2: Driving the Engine right off the road by ysth (Canon) on Aug 24, 2025 at 14:31 UTC
That will also match "bob", which doesn't sound desired.
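A minimal sketch of that objection, using the lookahead-only pattern with its character class written as `[_A-Z0-9]`. The extra inputs (`bob` and the empty string) are illustrative additions, not test cases from the original question. Because the pattern consumes no text and only rejects `XDG_..._PATH` names, everything else passes:

```perl
use strict;
use warnings;

# Lookahead-only pattern: it asserts what the string is NOT,
# but never requires the string to look like a variable name.
my $lookahead_only = qr/^(?!XDG_ [_A-Z0-9]+ _PATH$)/x;

# 'bob' and '' are illustrative extras, not cases from the question.
for my $str (qw/PKG_CONFIG_PATH XDG_SEAT_PATH bob/, '') {
    printf "%-8s '%s'\n",
        ($str =~ $lookahead_only ? 'match' : 'no match'), $str;
}
```

Only `XDG_SEAT_PATH` should be reported as "no match" here; `bob` and the empty string both match.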
Re^3: Driving the Engine right off the road by tybalt89 (Monsignor) on Aug 24, 2025 at 16:47 UTC
It correctly performs on all the provided test cases. It's not my problem if an inadequate number of test cases are provided :)
Re^4: Driving the Engine right off the road by ysth (Canon) on Aug 24, 2025 at 22:09 UTC
Re: Driving the Engine right off the road by ysth (Canon) on Aug 24, 2025 at 14:47 UTC
Lookahead/behind are zero-width assertions; you are asserting that after matching ^ (beginning of string), what comes just before that can't be XDG_, and then continuing to match the rest of your regex. There are two problems here: first, the beginning of the string will never follow XDG_, so your assertion will always pass; and second, you will only match if the string ends with _PATH or _PATH followed by a newline, which doesn't meet your requirements.

If perl supported variable length lookbehind, you could use a lookbehind at the end: `/^[_A-Z0-9]+$(?<!^XDG_.*_PATH)/` but it doesn't. So use a lookahead instead, to test that the entire string is the expected characters but doesn't match your exclusion at the beginning: `/^(?!XDG_.*_PATH$)[_A-Z0-9]+$/` or (and this is almost always the more readable approach), do it in code: `/^[_A-Z0-9]+$/ && ! /^XDG_.*_PATH$/`
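As a rough illustration of both points, the sketch below runs the pattern from the question and the lookahead form (written with `.*` between `XDG_` and `_PATH`) over the names mentioned in the thread. The variable names `$original` and `$fixed` are made up for this example, and `bob` is included only as a string that should not match at all:

```perl
use strict;
use warnings;

# Pattern from the question: the lookbehind sits right after ^,
# where nothing can precede it, so it never rejects anything.
my $original = qr/^(?<!XDG_) [_A-Z0-9]+ _PATH$/x;

# Lookahead form: reject XDG_..._PATH, then require the whole
# string to consist of the expected characters.
my $fixed = qr/^(?!XDG_.*_PATH$)[_A-Z0-9]+$/;

my @samples = qw/PKG_CONFIG_PATH XDG_DATA_DIRS XDG_SEAT_PATH XDG_SESSION_PATH bob/;

for my $name (@samples) {
    printf "%-18s original=%d fixed=%d\n", $name,
        ($name =~ $original ? 1 : 0),
        ($name =~ $fixed    ? 1 : 0);
}
```

The original pattern accepts `XDG_SEAT_PATH` and `XDG_SESSION_PATH` and rejects `XDG_DATA_DIRS`, which is the opposite of what was asked for; the lookahead form accepts `PKG_CONFIG_PATH` and `XDG_DATA_DIRS` and rejects the other three.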
Re^2: Driving the Engine right off the road by swl (Prior) on Aug 24, 2025 at 22:53 UTC
"If perl supported variable length lookbehind, you could use a lookbehind at the end:"

Just a data point, but Perl has had limited variable-length lookbehind as an experimental feature since version 5.30, with support for up to 255 characters.

https://perldoc.perl.org/perlexperiment#(Limited)-Variable-length-look-behind
Re^3: Driving the Engine right off the road by ysth (Canon) on Aug 25, 2025 at 22:29 UTC
See also https://perldoc.perl.org/perl5360delta#Variable-length-lookbehind-is-mostly-no-longer-considered-experimental

I do wish it could work in even more cases, like where the beginning point is known (e.g. `(?<=\A...)` or with `\G`) or something like `(?<=...\1...)` where the length is known when running that part of the regex even if not when compiling it.
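A hedged sketch of what a bounded lookbehind at the end of the string might look like on a perl that has the limited feature. The `{0,200}` bound is an arbitrary choice made for this example to stay under the 255-character limit, and the pattern is compiled in a string `eval` so that older perls report the failure at run time instead of refusing to compile the script:

```perl
use strict;
use warnings;

# Compiled via string eval: perls without the feature fail here
# and leave $vlb undefined. "no warnings" silences the
# "experimental" warning that some versions emit.
# The {0,200} bound is an assumption chosen to stay under 255.
my $vlb = eval q{ no warnings; qr/^[_A-Z0-9]+\z(?<!^XDG_.{0,200}_PATH)/ };

if (defined $vlb) {
    for my $name (qw/PKG_CONFIG_PATH XDG_DATA_DIRS XDG_SEAT_PATH/) {
        printf "%-8s %s\n", ($name =~ $vlb ? 'match' : 'no match'), $name;
    }
}
else {
    print "this perl cannot compile the lookbehind: $@";
}
```

An unbounded `.*` inside the lookbehind is still rejected, which is why the bound is needed; for names of arbitrary length the lookahead form remains the simpler choice.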
Re^2: Driving the Engine right off the road by ysth (Canon) on Aug 24, 2025 at 14:58 UTC
Note that I kept your `$` but I hate it, because without `/m` it is rarely used to intentionally do what it actually does (match either the end of the string or before a newline at the end of a string). If your data has no newlines, use `\z` (match only at the end of the string). If your data ends with newlines, use `\n` (or `\n\z` if your string may have internal newlines that should not match). Then your code doesn't mislead about what the data looks like.
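A small sketch of the difference, using a made-up value with a trailing newline (as might come from an unchomped line of input):

```perl
use strict;
use warnings;

# Hypothetical value with a trailing newline.
my $with_newline = "PKG_CONFIG_PATH\n";

# $ matches at the end of the string OR just before a final newline.
print "dollar matches\n" if $with_newline =~ /_PATH$/;

# \z matches only at the very end, so the newline makes this fail.
print "\\z matches\n" if $with_newline =~ /_PATH\z/;
```

Only the first `print` fires, which shows how `$` can quietly accept data that is not quite what the pattern suggests.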
Re^3: Driving the Engine right off the road by Intrepid (Curate) on Aug 24, 2025 at 21:55 UTC
It's weird. I lost some good habits when I took my 10-year break from programming, and using `\z` instead of `$` in regexen is one of them. Since you pointed it out, I think my fingers will do the right thing from now on. In this case it wouldn't matter because my strings are all keys from `%ENV` and would never contain an embedded newline. But regaining (and retaining) good coding habits is important. – Soren