in reply to parsing question

/_test(?>\s+)(?!<)/

Abigail

Re: Re: parsing question
by allolex (Curate) on Sep 12, 2003 at 12:27 UTC

    Abigail's extended regular expression is also an opportunity to show you a nifty module called YAPE::Regex::Explain.

    ladoix% cat 290992
    #!/usr/bin/perl
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new(qr/_test(?>\s+)(?!<)/)->explain;

    ladoix% perl 290992
    The regular expression:

    (?-imsx:_test(?>\s+)(?!<))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      _test                    '_test'
    ----------------------------------------------------------------------
      (?>                      match (and do not backtrack afterwards):
    ----------------------------------------------------------------------
        \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                                 or more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
        <                        '<'
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    --
    Allolex

      Unfortunately, it only explains what the regex does, not why it does so. Perhaps the most subtle part of the regex is (?>\s+).

      Can you explain why it uses "no backtracking"? ;-)

      Abigail

        I would love to know why you did what you did with that regex. There are probably a lot of monks in the Monastery who could learn something from you and just need a little push in the right direction. :)

        --
        Allolex

        I'm sure you're familiar with the "cut" operator, Abigail, but most other people here will find it largely underdocumented. So I'll take the liberty of pointing to the draft of a book on regular expressions by the same monk who wrote YAPE::Regex::Explain (in fact, he wrote the whole YAPE suite).

        I like the draft very much, so I plug it whenever I can. Here's the URL: http://japhy.perlmonk.org/book/. Check out chapter 8 for the "cut" operator. Every chapter is a separate download of roughly 12 pages, in either MS Word or PDF format. Recommended.

Re: Re: parsing question
by Roger (Parson) on Sep 15, 2003 at 02:42 UTC
    Out of interest in the experimental (?>...) construct, I ran the following little benchmark:

    use Benchmark;

    $str1 = "_test (followed by 1 or more spaces)";
    $str2 = "_test < xxx >";

    timethese ( 1000000, {
        'p1' => '&p1;',
        'p2' => '&p2;',
        'p3' => '&p3;',
        'p4' => '&p4;',
    } );

    # p1 and p3 use the atomic group (?>...); p2 and p4 use plain (?:...)
    sub p1 () { $str1 =~ /_test(?>\s+)(?!<)/; }
    sub p2 () { $str1 =~ /_test(?:\s+)(?!<)/; }
    sub p3 () { $str2 =~ /_test(?>\s+)(?!<)/; }
    sub p4 () { $str2 =~ /_test(?:\s+)(?!<)/; }

    I got the following results:

    Benchmark: timing 1000000 iterations of p1, p2, p3, p4...
            p1:  3 wallclock secs ( 3.00 usr +  0.00 sys =  3.00 CPU) @ 333333.33/s (n=1000000)
            p2:  3 wallclock secs ( 2.79 usr +  0.00 sys =  2.79 CPU) @ 358422.94/s (n=1000000)
            p3:  3 wallclock secs ( 3.09 usr +  0.00 sys =  3.09 CPU) @ 323624.60/s (n=1000000)
            p4:  3 wallclock secs ( 2.82 usr +  0.00 sys =  2.82 CPU) @ 354609.93/s (n=1000000)

    It seems that ?> runs slower than ?: by as much as 10 percent. So am I correct to say that, optimization-wise, ?> might not be the first choice?
      Considering that /_test(?:\s+)(?!<)/ is wrong, as demonstrated elsewhere in this thread, I fail to see your point.

      Abigail

        Hi Ab, but I have tested the following code, and both patterns gave me the same results on many test cases:

        if ($str =~ /_test(?>\s+)(?!<)/) { print "test ok\n" }
        if ($str =~ /_test(?:\s+)(?!<)/) { print "test ok\n" }

        Could you please tell me why /_test(?:\s+)(?!<)/ is wrong? I want to learn. Thanks!
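
        A minimal sketch of where the two patterns can disagree, assuming the goal is to reject "_test" whenever its trailing whitespace is followed by "<". The input strings below are hypothetical and chosen for illustration; they do not come from the thread. With a single space before "<" (as in $str2 from the benchmark) both patterns fail, which may be why the tests above agreed. With two or more spaces, the plain group can give one space back so that the look-ahead sees a space instead of "<", and the match succeeds.

```perl
use strict;
use warnings;

# Hypothetical inputs, chosen for illustration (not taken from the thread).
my @inputs = (
    "_test  <tag>",   # two spaces, then '<'
    "_test <tag>",    # one space, then '<'
    "_test  word",    # two spaces, then something other than '<'
);

my $atomic   = qr/_test(?>\s+)(?!<)/;   # \s+ keeps everything it matched
my $ordinary = qr/_test(?:\s+)(?!<)/;   # \s+ may give whitespace back

for my $s (@inputs) {
    printf "%-14s atomic: %-3s  ordinary: %s\n",
        "'$s'",
        ($s =~ $atomic   ? 'yes' : 'no'),
        ($s =~ $ordinary ? 'yes' : 'no');
}
```

        Only the first input separates the two: the atomic version reports no match, while the plain version matches "_test" plus a single space.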
      Switching timethese with cmpthese, here are the numbers:
      Win32  ActivePerl 5.6.1 (Build 633)
             Rate  p4  p3  p1  p2
      p4 865052/s  -- -1% -1% -4%
      p3 876424/s  1%  -- -0% -3%
      p1 877193/s  1%  0%  -- -3%
      p2 901713/s  4%  3%  3%  --
      
      Win32  ActivePerl 5.8.0 (build 804)
      
             Rate  p3  p1  p2  p4
      p3 831255/s  -- -3% -5% -8%
      p1 853971/s  3%  -- -3% -5%
      p2 876424/s  5%  3%  -- -3%
      p4 900901/s  8%  5%  3%  --
      
      p1 and p3 use the "cut" operator. The optimization depends on your perl version.
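
      For anyone who wants to rerun the comparison, here is a rough sketch of the cmpthese variant. It assumes the same two test strings and four patterns as the benchmark above, but passes code references instead of the '&p1;' strings, and the iteration count is an arbitrary choice. Rates will differ by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str1 = "_test (followed by 1 or more spaces)";
my $str2 = "_test < xxx >";

# cmpthese times each code ref and prints a rate chart like the ones above.
# It also returns the chart as a reference to an array of rows.
my $rows = cmpthese(500_000, {
    p1 => sub { $str1 =~ /_test(?>\s+)(?!<)/ },   # atomic group, $str1
    p2 => sub { $str1 =~ /_test(?:\s+)(?!<)/ },   # plain group,  $str1
    p3 => sub { $str2 =~ /_test(?>\s+)(?!<)/ },   # atomic group, $str2
    p4 => sub { $str2 =~ /_test(?:\s+)(?!<)/ },   # plain group,  $str2
});
```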