ady has asked for the wisdom of the Perl Monks concerning the following question:

How to match a word starting with 'N' and not ending with '00' ?

^N.*(?!00$) matches NZ31200
^N.*?(?!00$) matches N (of NZ31200)
so ??
-- allan

Replies are listed 'Best First'.
Re: ^N.*(?!00$)
by Anonymous Monk on Jun 06, 2005 at 12:00 UTC

    Use a look-behind, not a look-ahead:

    print "matches\n" if (/^N.*(?<!00)$/)
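
    A minimal sketch of why the look-behind works where the original look-ahead does not, run against the two sample words from this thread. The helper names match_lookahead and match_lookbehind are made up for illustration. In the original pattern, .* runs to the end of the string; nothing is left there for 00$ to match, so (?!00$) succeeds trivially.

```perl
use strict;
use warnings;

# Original attempt: the look-ahead is tested after .* has consumed everything,
# so "00" cannot follow and the negative look-ahead always succeeds.
sub match_lookahead  { return $_[0] =~ /^N.*(?!00$)/  ? 1 : 0 }

# Suggested fix: at the end of the string, look back at the last two characters.
sub match_lookbehind { return $_[0] =~ /^N.*(?<!00)$/ ? 1 : 0 }

for my $word (qw(NZ31200 NZ3120A)) {
    printf "%-8s look-ahead: %d  look-behind: %d\n",
        $word, match_lookahead($word), match_lookbehind($word);
}
```

    Only the look-behind version should reject NZ31200 while still accepting NZ3120A.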
Re: ^N.*(?!00$)
by tlm (Prior) on Jun 06, 2005 at 12:16 UTC

    You were very close. This will do what you want:

    /^N(?!.*00$).*$/
    If you want to capture the match, use $1 when the following evaluates to true:
    /^(N(?!.*00$).*)$/
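
    A small sketch of the capturing form in use. The helper name capture_word is hypothetical; the point is only that $1 is read after the match has succeeded.

```perl
use strict;
use warnings;

# Returns the captured word when it starts with N and does not end in 00,
# or undef otherwise. $1 is only read after a successful match.
sub capture_word {
    my ($word) = @_;
    return $word =~ /^(N(?!.*00$).*)$/ ? $1 : undef;
}

for my $word (qw(NZ31200 NZ3120A)) {
    my $hit = capture_word($word);
    print "$word: ", ( defined $hit ? "captured '$hit'" : "no match" ), "\n";
}
```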

    Update: If we had ham, we could have ham and eggs, if we had eggs.

    the lowliest monk

Re: ^N.*(?!00$)
by mikeraz (Friar) on Jun 06, 2005 at 12:33 UTC

    The easiest on my eyes regex is  ( /N.*/ and !/00$/ ) but that will match things you don't want.
     
    Your code example tests for lines, not words, that match your description. For the more general case, i.e. a word embedded in a line, this should do nicely:
        ( /(^|\W*)(N.*?)(\W|$)/ and ! ($2 =~ /00$/) )
    which will match your test cases on lines by themselves or embedded in longer lines. But not in longer strings! :)

    Update

    Based on what I learned from tlm's post I'll amend what I suggested to:
        /(\W|^)N(?!.*00(\W|$)).*(\W|$)/

    Be Appropriate && Follow Your Curiosity
Re: ^N.*(?!00$)
by monarch (Priest) on Jun 06, 2005 at 11:50 UTC
    If the word is guaranteed to be 3 characters or longer you can try:
    /^N.*?[^0][^0]$/

    Embarrassingly enough I provided an incorrect answer.

Re: ^N.*(?!00$)
by ady (Deacon) on Jun 06, 2005 at 13:46 UTC
    Thanks fellow monks!

    Solutions 3 and 4 solved my problem. Which one is better would be a matter of taste, I guess.
    I expect 3 would be slightly faster.
    ___________________NZ31200_________NZ3120A___________
    1 ^N.*?[^0][^0]$     NO match        NO match
    2 ^N.*?00$           match           NO match
    3 ^N.*(?<!00)$       NO match        match
    4 ^N(?!.*00$).*$     NO match        match
    5 N.* and !/00$      won't do, as it must be ONE RegEx
    ____________________________________________________
    -- allan
    ===========================================================
    As the eternal tranquility of Truth reveals itself to us, this very place is the Land of Lotuses
    -- Hakuin Ekaku Zenji
      By the way there's also
      /^N.*(?!00)..$/
      and
      /^N(?!.*00$)/
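
      A sketch comparing these two alternatives on a few inputs, including very short ones. The helper names two_char_tail and anchored_only are invented for this example. One difference worth noting: the first pattern needs at least two characters after the N, because of the trailing "..", while the second has no such minimum.

```perl
use strict;
use warnings;

# Needs N plus at least two more characters, the last two not being "00".
sub two_char_tail { return $_[0] =~ /^N.*(?!00)..$/ ? 1 : 0 }

# Only checks that the string starts with N and does not end in "00".
sub anchored_only { return $_[0] =~ /^N(?!.*00$)/ ? 1 : 0 }

for my $word (qw(N N5 N00 NZ31200 NZ3120A)) {
    printf "%-8s two_char_tail: %d  anchored_only: %d\n",
        $word, two_char_tail($word), anchored_only($word);
}
```

      The two should agree on the thread's sample words and differ only on N and N5.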

      OK, thanks to ikegami's help, I've fixed the benchmarking code. Here are the results:

      # The suffix +/- correspond to "matching" and "non-matching" input,
      # respectively.

      # input length: 7
                Rate ?<!-  ?!+ ?<!+  ?!-
      ?<!-  583158/s   -- -38% -54% -55%
      ?!+   945231/s  62%   -- -25% -27%
      ?<!+ 1262619/s 117%  34%   --  -2%
      ?!-  1286220/s 121%  36%   2%   --

      # input length: 10
               Rate ?<!-  ?!+ ?<!+  ?!-
      ?<!- 220554/s   -- -49% -63% -64%
      ?!+  430349/s  95%   -- -28% -31%
      ?<!+ 601510/s 173%  40%   --  -3%
      ?!-  619376/s 181%  44%   3%   --

      # input length: 10000
              Rate  ?<!-   ?!+   ?!-  ?<!+
      ?<!-   342/s    --  -91%  -99%  -99%
      ?!+   3864/s 1029%    --  -86%  -86%
      ?!-  26795/s 7730%  593%    --   -6%
      ?<!+ 28417/s 8204%  635%    6%    --
      The differences between first and second place are not significant (different runs of the same benchmarking script would yield different relative ordering). The differences between 2nd and 3rd place, and between 3rd and 4th place, are significant, though. The results are not conclusive: negative look-behind clearly beats negative look-ahead when the input matches, but the opposite is true when the input doesn't match.

      the lowliest monk

      Here are some benchmarks: (fixed from TLM's)
      use strict;
      use warnings;
      use Benchmark 'cmpthese';

      my $sz = ( shift || 10 ) - 4;
      my $X = 'N' . 'x' x $sz . '000';
      my $Y = 'N' . 'x' x $sz . '001';

      print("X = $X\n");
      print("Y = $Y\n");
      print("\n");

      cmpthese( -1, {
          '/^N(?!.*00$).*$/ failure' => sub { scalar $X =~ /^N(?!.*00$).*$/ },
          '/^N(?!.*00$).*$/ success' => sub { scalar $Y =~ /^N(?!.*00$).*$/ },
          '/^N(?!.*00$)/ failure'    => sub { scalar $X =~ /^N(?!.*00$)/ },
          '/^N(?!.*00$)/ success'    => sub { scalar $Y =~ /^N(?!.*00$)/ },
          '/^N.*(?<!00$)$/ failure'  => sub { scalar $X =~ /^N.*(?<!00$)$/ },
          '/^N.*(?<!00$)$/ success'  => sub { scalar $Y =~ /^N.*(?<!00$)$/ },
          '/^N.*(?!00)..$/ failure'  => sub { scalar $X =~ /^N.*(?!00)..$/ },
          '/^N.*(?!00)..$/ success'  => sub { scalar $Y =~ /^N.*(?!00)..$/ },
      });

      Results:

      X = Nxxxxxx000
      Y = Nxxxxxx001

      /^N.*(?!00)..$/ failure   278229/s   --
      /^N.*(?<!00$)$/ failure   330512/s  19%
      /^N(?!.*00$).*$/ success  536004/s  93%
      /^N.*(?!00)..$/ success   540980/s  94%
      /^N(?!.*00$)/ success     633900/s 128%
      /^N(?!.*00$)/ failure     743994/s 167%
      /^N(?!.*00$).*$/ failure  750772/s 170%
      /^N.*(?<!00$)$/ success   789866/s 184%

      Combined success and failure results:
      (If the numbers are evenly distributed, the chance of failure is 0.01.)

      /^N(?!.*00$).*$/  750772/s * 0.01 + 536004/s * 0.99 = 538151.68/s   --
      /^N.*(?!00)..$/   278229/s * 0.01 + 540980/s * 0.99 = 538352.49/s   0%
      /^N(?!.*00$)/     743994/s * 0.01 + 633900/s * 0.99 = 635000.94/s  18%
      /^N.*(?<!00$)$/   330512/s * 0.01 + 789866/s * 0.99 = 785272.46/s  46%

      /^N.*(?<!00$)$/ is the winner as expected, and it won by no small margin (24% ahead of the runner-up).
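
      The combined figures can be reproduced with a few lines of Perl. This is only a sketch of the arithmetic used in this post: the function name combined_rate is hypothetical, and the 0.01 failure share rests on the stated assumption of evenly distributed numbers, where one two-digit ending in a hundred is 00.

```perl
use strict;
use warnings;

# Weighted rate in the style of the combined table:
# rate_on_failure * P(failure) + rate_on_success * (1 - P(failure)).
sub combined_rate {
    my ($fail_rate, $success_rate, $p_fail) = @_;
    return $fail_rate * $p_fail + $success_rate * (1 - $p_fail);
}

# [ failure rate, success rate ] per second, copied from the results in this post.
my %rates = (
    '/^N(?!.*00$).*$/' => [ 750772, 536004 ],
    '/^N.*(?!00)..$/'  => [ 278229, 540980 ],
    '/^N(?!.*00$)/'    => [ 743994, 633900 ],
    '/^N.*(?<!00$)$/'  => [ 330512, 789866 ],
);

my $p_fail = 0.01;    # assumed share of inputs ending in 00

my %combined;
$combined{$_} = combined_rate( @{ $rates{$_} }, $p_fail ) for keys %rates;

# Print slowest to fastest, like the table.
for my $re ( sort { $combined{$a} <=> $combined{$b} } keys %combined ) {
    printf "%-18s %.2f/s\n", $re, $combined{$re};
}
```

      Changing $p_fail shows how sensitive the ranking is to the share of inputs that end in 00.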

Re: ^N.*(?!00$)
by thundergnat (Deacon) on Jun 06, 2005 at 14:06 UTC

    Hmm. The OP's spec was to match a WORD that starts with N and doesn't end with 00. All of the solutions I've seen posted so far work on a PHRASE (to be fair, including a phrase that consists of one word...). That's fine if there is only going to be a single word to match against, but to be more general, use something like:

    /\bN(?!\w*00\b)\w*\b/

    as in:

    my $string = q/N553342 N455673 N55788 N44200 NZ31200 NZ3120A/;
    my @matches = $string =~ /\bN(?!\w*00\b)\w*\b/g;
    print join "\n", @matches;
      Right, if the word were embedded in a phrase, your regex would do the job. As it is (and you couldn't know), I extract the words from a CSV file and have them in a $var, which I match against a regex entered from the UI, so I don't really need "phrase generality" in this specific case.
      Good pt. of view, anyway :)
      -- allan
Re: ^N.*(?!00$)
by Fang (Pilgrim) on Jun 06, 2005 at 11:55 UTC

    I would personally use the !~ operator

    % perl -le'print "Good" if "NZ312000" !~ /^N.*?00$/'
    % perl -le'print "Good" if "NZ312005" !~ /^N.*?00$/'
    Good
    % perl -le'print "Good" if "NZ312050" !~ /^N.*?00$/'
    Good

    Update: all of the above is indeed flawed, and more elegant solutions than splitting the regex in two (/^N/ and /00$/) have already been pointed out.

      $ perl -le 'print "Oops" if "POODLESNOODLESSTRUDELS" !~ /^N.*?00$/'
      Oops

      --
      We're looking for people in ATL
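
      The counter-example works because !~ negates the whole pattern, so anything that is not of the form "N...00" passes, including words that never started with N. A rough sketch contrasting that with the two-test form mentioned in the update (the helper names negated_whole and two_tests are invented here):

```perl
use strict;
use warnings;

# Negating one combined pattern: true for anything that is NOT "N...00",
# which includes words that do not start with N at all.
sub negated_whole { return $_[0] !~ /^N.*?00$/ ? 1 : 0 }

# Two separate tests: must start with N, and must not end in 00.
sub two_tests { return ( $_[0] =~ /^N/ && $_[0] !~ /00$/ ) ? 1 : 0 }

for my $word (qw(NZ312000 NZ312005 POODLESNOODLESSTRUDELS)) {
    printf "%-22s negated_whole: %d  two_tests: %d\n",
        $word, negated_whole($word), two_tests($word);
}
```

      The two-test form uses two regexes, so it does not fit a setup that accepts only a single pattern; there one of the look-around solutions is still needed.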