orange has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
I have a problem with this regex:
X[^X]*?H[^H]*?D
Applied to RXVXCHHHZHDT, it matches XCHHHZHD within that string. I intend it not to match such a string: I want an X, then no X's until the first H, then no H's until D.
That is, it must match the string RXVXCHBNDT but not RXVXCHHHZHDT.
Can someone tell me what is wrong with my regex?
Regards
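A minimal script that reproduces the behaviour described above. The two strings and the pattern come from the post; the loop, the %found hash and the output format are only illustrative:

```perl
use strict;
use warnings;

# The original pattern from the question, compiled once.
my $re = qr/X[^X]*?H[^H]*?D/;

my %found;    # string => text the pattern matched inside it
for my $str (qw(RXVXCHBNDT RXVXCHHHZHDT)) {
    if ($str =~ /($re)/) {
        $found{$str} = $1;
        print "$str: matched '$1'\n";
    }
    else {
        print "$str: no match\n";
    }
}
```

It reports XCHBND for RXVXCHBNDT and, as described, XCHHHZHD for RXVXCHHHZHDT.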

Re: Regex logic
by Corion (Patriarch) on May 05, 2008 at 09:08 UTC

    So, when you say "X and then no X's until the first H", you actually mean "X and then no X's and no H's until the first H"?

    X[^XH]*?H[^H]*?D

    You didn't supply a program that I can use to test your cases, so I wrote one myself:

    perl -lne 'print $1 if /(X[^XH]*?H[^H]*?D)/'

    ... and for the three test cases, it seems to work.
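The same reading of the requirement can be spelled out without a regex, which makes "the first H" explicit. This is a minimal sketch that is not from the thread; the sub name match_xhd is hypothetical. Like the regex, it tries each X from left to right:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the X...H...D substring the corrected
# regex X[^XH]*?H[^H]*?D would match, or undef if there is none.
sub match_xhd {
    my ($s) = @_;
    # Try every X as a starting point, leftmost first.
    for my $x (grep { substr($s, $_, 1) eq 'X' } 0 .. length($s) - 1) {
        my $h = index($s, 'H', $x + 1);    # first H after this X
        next if $h < 0;
        # Reject if another X appears before that first H.
        next if index(substr($s, $x + 1, $h - $x - 1), 'X') >= 0;
        my $d = index($s, 'D', $h + 1);    # first D after that H
        next if $d < 0;
        # Reject if any H appears between the first H and the D.
        next if index(substr($s, $h + 1, $d - $h - 1), 'H') >= 0;
        return substr($s, $x, $d - $x + 1);
    }
    return;
}

for my $str (qw(RXVXCHBNDT RXVXCHHHZHDT)) {
    my $m = match_xhd($str);
    print "$str: ", (defined $m ? $m : 'no match'), "\n";
}
```

For the two strings in the question it gives XCHBND and no match, the same as the corrected regex.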

    (Perl) regular expressions have no implicit notion of a "first element". If one element does not match, the engine backtracks and tries other alternatives until it finds a combination that matches.
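To watch that backtracking happen, one can record every candidate the lazy [^X]*? offers while the original pattern runs against RXVXCHHHZHDT. This sketch relies on Perl's (?{ ... }) code blocks and the $^N variable, both documented in perlre; the variable names are illustrative. The exact list of candidates may depend on the regex optimizer, so no output is shown here, but the last one recorded is the one that led to the match, 'CHHHZ', which runs past the first H up to the last one.

```perl
use strict;
use warnings;

my $str = "RXVXCHHHZHDT";
my @attempts;    # every text the lazy [^X]*? (group 2) was tried with

# (?{ ... }) runs Perl code each time the engine reaches that point;
# $^N is the text of the most recently closed group, here group 2.
# Side effects are not undone on backtracking, so every try is kept.
my $matched =
    $str =~ /(X([^X]*?)(?{ push @attempts, $^N })H[^H]*?D)/ ? $1 : undef;

print "matched: $matched\n";
print "[^X]*? tried: ", join(", ", map { "'$_'" } @attempts), "\n";
```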

Re: Regex logic
by ikegami (Patriarch) on May 05, 2008 at 09:09 UTC
    You did "until an H" instead of "until the first H". Fix:
    X[^XH]*H[^HD]*D

    By the way, the non-greedy modifier is deceptively complex, and you'll have better luck avoiding it.
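Once each character class excludes the character that ends its run, a quantified run can only stop in one place, so greedy and non-greedy forms should give the same answer. A quick comparison of ikegami's pattern with a non-greedy variant of it (the variant is added here only for comparison):

```perl
use strict;
use warnings;

# ikegami's greedy pattern and the same classes with lazy quantifiers.
my $greedy = qr/X[^XH]*H[^HD]*D/;
my $lazy   = qr/X[^XH]*?H[^HD]*?D/;

my %result;    # string => [greedy match, lazy match]
for my $str (qw(RXVXCHBNDT RXVXCHHHZHDT)) {
    my ($g) = $str =~ /($greedy)/;    # undef when there is no match
    my ($l) = $str =~ /($lazy)/;
    $result{$str} = [ $g, $l ];
    print "$str: greedy=", ($g // 'no match'), " lazy=", ($l // 'no match'), "\n";
}
```

Both columns agree: XCHBND for RXVXCHBNDT and no match for RXVXCHHHZHDT.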

Re: Regex logic
by moritz (Cardinal) on May 05, 2008 at 09:12 UTC
    Your regex matches because [^X] also matches H. Use [^XH] instead:
    use strict;
    use warnings;
    use Test::More tests => 2;

    my $re = qr{X[^XH]*?H[^HD]*?D};

    unlike("RXVXCHHHZHDT", $re);   # must not match: H's sit between the first H and D
    like(  "RXVXCHBNDT",   $re);   # must match (XCHBND)