Washie101 has asked for the wisdom of the Perl Monks concerning the following question:

hi all

I have a regex problem that's probably simple to ye all

here goes

there are two strings

1st string = _test(followed by 1 or more spaces)
2nd string = _test <"value">

I need a regular expression, something in the form of

if ($line =~ m/(_test)(\s+)/) { print "found.\n" }
This regular expression matches both cases above. But I want to tweak the regex so that it ignores string 2, i.e. if it finds the char < after (_test)(\s+), it should ignore that string and move on.

hope ye can help.

update (broquaint): tidied up formatting

Replies are listed 'Best First'.
Re: parsing question
by Abigail-II (Bishop) on Sep 12, 2003 at 11:15 UTC
    /_test(?>\s+)(?!<)/

    Abigail

      Abigail's extended regular expression is also an opportunity to show you a nifty module called YAPE::Regex::Explain.

      ladoix% cat 290992
      #!/usr/bin/perl
      use YAPE::Regex::Explain;
      print YAPE::Regex::Explain->new(qr/_test(?>\s+)(?!<)/)->explain;

      ladoix% perl 290992
      The regular expression:

      (?-imsx:_test(?>\s+)(?!<))

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        _test                    '_test'
      ----------------------------------------------------------------------
        (?>                      match (and do not backtrack afterwards):
      ----------------------------------------------------------------------
          \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                                   or more times (matching the most amount
                                   possible))
      ----------------------------------------------------------------------
        )                        end of look-ahead
      ----------------------------------------------------------------------
        (?!                      look ahead to see if there is not:
      ----------------------------------------------------------------------
          <                        '<'
      ----------------------------------------------------------------------
        )                        end of look-ahead
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

      --
      Allolex

        Unfortunately, it only explains what it does, but it doesn't explain why it does so. Perhaps the most subtle part of the regex is (?>\s+).

        Can you explain why it uses "no backtracking"? ;-)

        Abigail
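
A minimal sketch of what the "no backtracking" group buys here. The test lines are made up for illustration, and the last one assumes two spaces before the <. With a plain (?:\s+), the engine can give one space back, at which point the next character is a space rather than <, so the negative lookahead succeeds and the line is matched by mistake. With (?>\s+) the whitespace run is kept whole, so the lookahead always inspects the first non-space character.

```perl
use strict;
use warnings;

# Hypothetical test lines; the last one has two spaces before '<'.
my @lines = (
    "_test   ",             # string 1: should be found
    "_test <\"value\">",    # string 2, one space: should be skipped
    "_test  <\"value\">",   # string 2, two spaces: should be skipped
);

my $atomic = qr/_test(?>\s+)(?!<)/;   # \s+ keeps every space it matched
my $plain  = qr/_test(?:\s+)(?!<)/;   # \s+ may give a space back

for my $line (@lines) {
    my $a = $line =~ $atomic ? "match" : "no match";
    my $p = $line =~ $plain  ? "match" : "no match";
    printf "%-20s atomic: %-9s plain: %s\n", "'$line'", $a, $p;
}
```

Only the two-space line tells the patterns apart: the atomic version skips it, while the plain version reports a match.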

      Out of interest with the experimental ?>, I did a benchmark with the following little test:
      use Benchmark;

      $str1 = "_test (folloed by 1 or more spaces)";
      $str2 = "_test < xxx >";

      timethese ( 1000000, {
          'p1' => '&p1;',
          'p2' => '&p2;',
          'p3' => '&p3;',
          'p4' => '&p4;',
      } );

      sub p1 () { $str1 =~ /_test(?>\s+)(?!<)/; }
      sub p2 () { $str1 =~ /_test(?:\s+)(?!<)/; }
      sub p3 () { $str2 =~ /_test(?>\s+)(?!<)/; }
      sub p4 () { $str2 =~ /_test(?:\s+)(?!<)/; }
      I got the following results:
      Benchmark: timing 1000000 iterations of p1, p2, p3, p4...
        p1: 3 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 333333.33/s (n=1000000)
        p2: 3 wallclock secs ( 2.79 usr + 0.00 sys = 2.79 CPU) @ 358422.94/s (n=1000000)
        p3: 3 wallclock secs ( 3.09 usr + 0.00 sys = 3.09 CPU) @ 323624.60/s (n=1000000)
        p4: 3 wallclock secs ( 2.82 usr + 0.00 sys = 2.82 CPU) @ 354609.93/s (n=1000000)
      It seems that ?> runs slower than ?: by as much as 10 percent. So am I correct to say that, optimization-wise, ?> might not be the first choice?
        Considering that /_test(?:\s+)(?!<)/ is wrong, as demonstrated elsewhere in this thread, I fail to see your point.

        Abigail

        Switching timethese with cmpthese, here's the math
        Win32  ActivePerl 5.6.1 (Build 633)
               Rate  p4  p3  p1  p2
        p4 865052/s  -- -1% -1% -4%
        p3 876424/s  1%  -- -0% -3%
        p1 877193/s  1%  0%  -- -3%
        p2 901713/s  4%  3%  3%  --
        
        Win32  ActivePerl 5.8.0 (build 804)
        
               Rate  p3  p1  p2  p4
        p3 831255/s  -- -3% -5% -8%
        p1 853971/s  3%  -- -3% -5%
        p2 876424/s  5%  3%  -- -3%
        p4 900901/s  8%  5%  3%  --
        
        p1 and p3 use the "cut" operator. The optimization depends on your perl version.
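
For anyone wanting to reproduce a comparison table like these, here is a rough sketch of the cmpthese variant. It assumes the same four cases as the benchmark in this thread, passed as code refs rather than strings; the hash name %cases and the one-second run time are choices made for this sketch, not taken from the post. The numbers will differ by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str1 = "_test (followed by 1 or more spaces)";
my $str2 = "_test < xxx >";

# The four cases from the thread's benchmark, as code refs.
# p1 and p3 use the atomic group; p2 and p4 use a plain group.
my %cases = (
    p1 => sub { $str1 =~ /_test(?>\s+)(?!<)/ },
    p2 => sub { $str1 =~ /_test(?:\s+)(?!<)/ },
    p3 => sub { $str2 =~ /_test(?>\s+)(?!<)/ },
    p4 => sub { $str2 =~ /_test(?:\s+)(?!<)/ },
);

# A negative count asks Benchmark to run each case for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese(-1, \%cases);
```

Note that both test strings have a single space after _test, so the plain and atomic patterns agree on them; the timing says nothing about inputs where they give different answers.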
Re: parsing question
by flounder99 (Friar) on Sep 12, 2003 at 13:20 UTC
    /_test\s+(?!\s|<)/;
    also works, but without using any so-called "experimental" extended regex patterns.

    --

    flounder
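
A small sketch of how this lookahead-only pattern behaves, using made-up test lines. is_plain_test is a hypothetical helper name introduced only for this illustration. Because the lookahead also rejects whitespace, \s+ gains nothing by giving back a space: it would then sit in front of a space, which the lookahead refuses.

```perl
use strict;
use warnings;

# is_plain_test() is a hypothetical helper name used only in this sketch.
sub is_plain_test {
    my ($line) = @_;
    # After the whitespace run, the next character may be neither
    # whitespace nor '<'. End of string also passes the lookahead.
    return $line =~ /_test\s+(?!\s|<)/ ? 1 : 0;
}

my @lines = (
    "_test   ",             # trailing spaces only
    "_test  other",         # spaces followed by something else
    "_test <\"value\">",    # one space, then '<'
    "_test  <\"value\">",   # two spaces, then '<'
);

for my $line (@lines) {
    my $verdict = is_plain_test($line) ? "found" : "skipped";
    print "$verdict: '$line'\n";
}
```

The first two lines are reported as found and the two containing < are skipped, including the two-space case.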

Re: parsing question
by hmerrill (Friar) on Sep 12, 2003 at 12:32 UTC
    Seems to me you need another regex to test if you've found a line containing the angle brackets:
    if ($line =~ /_test(\s+)</) {
        ### skip this one ###
    }
    elsif ($line =~ /_test(\s+)/) {
        print "found\n";
    }
    HTH.
      I was trying to avoid an if/else statement. Abigail's solution worked a treat... thanks a million