Re: parsing question

Re: Re: parsing question by allolex (Curate) on Sep 12, 2003 at 12:27 UTC
Abigail's extended regular expression is also an opportunity to show you a nifty module called YAPE::Regex::Explain.

    ladoix% cat 290992
    #!/usr/bin/perl
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new(qr/_test(?>\s+)(?!<)/)->explain;
    ladoix% perl 290992
    The regular expression:

    (?-imsx:_test(?>\s+)(?!<))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      _test                    '_test'
    ----------------------------------------------------------------------
      (?>                      match (and do not backtrack afterwards):
    ----------------------------------------------------------------------
        \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                                 or more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
    ----------------------------------------------------------------------
        <                        '<'
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

-- Allolex
Re: parsing question by Abigail-II (Bishop) on Sep 12, 2003 at 12:50 UTC
Unfortunately, it only explains what the regex does, not why it does so. Perhaps the most subtle part of the regex is `(?>\s+)`. Can you explain why it uses "no backtracking"? ;-) Abigail
Re: Re: parsing question by allolex (Curate) on Sep 12, 2003 at 13:06 UTC
I would love to know why you did what you did with that regex. There are probably a lot of monks in the Monastery who could learn something from you and just need a little push in the right direction. :) -- Allolex
Re: parsing question by Abigail-II (Bishop) on Sep 12, 2003 at 13:13 UTC
2Re: parsing question by bart (Canon) on Sep 13, 2003 at 10:31 UTC
I'm sure you're familiar with the "cut" operator, Abigail, but most other people here will find it largely underdocumented. So I'll take the liberty of pointing to the draft of a book on regular expressions being written by the same monk who wrote YAPE::Regex::Explain (actually, he wrote the whole YAPE suite). I personally like the draft very much, so I plug it whenever I can. Here's the URL: http://japhy.perlmonk.org/book/. Check out chapter 8 for the "cut" operator. Every chapter is a separate download, roughly 12 pages each, in either MS Word or PDF format. Recommended.
Re: Re: parsing question by Roger (Parson) on Sep 15, 2003 at 02:42 UTC
Out of interest in the experimental `(?>...)` construct, I ran a benchmark with the following little test:

    use Benchmark;

    $str1 = "_test (followed by 1 or more spaces)";
    $str2 = "_test < xxx >";

    timethese ( 1000000, {
        'p1' => '&p1;',
        'p2' => '&p2;',
        'p3' => '&p3;',
        'p4' => '&p4;',
    } );

    sub p1 () { $str1 =~ /_test(?>\s+)(?!<)/; }
    sub p2 () { $str1 =~ /_test(?:\s+)(?!<)/; }
    sub p3 () { $str2 =~ /_test(?>\s+)(?!<)/; }
    sub p4 () { $str2 =~ /_test(?:\s+)(?!<)/; }

I got the following results:

    Benchmark: timing 1000000 iterations of p1, p2, p3, p4...
    p1: 3 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 333333.33/s (n=1000000)
    p2: 3 wallclock secs ( 2.79 usr + 0.00 sys = 2.79 CPU) @ 358422.94/s (n=1000000)
    p3: 3 wallclock secs ( 3.09 usr + 0.00 sys = 3.09 CPU) @ 323624.60/s (n=1000000)
    p4: 3 wallclock secs ( 2.82 usr + 0.00 sys = 2.82 CPU) @ 354609.93/s (n=1000000)

It seems that `(?>...)` runs slower than `(?:...)` by as much as 10 percent. So am I correct to say that, optimization-wise, `(?>...)` might not be the first choice?
Re: parsing question by Abigail-II (Bishop) on Sep 15, 2003 at 07:01 UTC
Considering that `/_test(?:\s+)(?!<)/` is wrong, as demonstrated elsewhere in this thread, I fail to see your point. Abigail
Re: Re: parsing question by Roger (Parson) on Sep 15, 2003 at 09:07 UTC
Hi Ab, I tested the following code, and both patterns gave me the same results on many test cases:

    if ($str =~ /_test(?>\s+)(?!<)/) { print "test ok\n" }
    if ($str =~ /_test(?:\s+)(?!<)/) { print "test ok\n" }

Could you please tell me why `/_test(?:\s+)(?!<)/` is wrong? I want to learn. Thanks!
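For readers following along, here is a minimal sketch of where the two patterns can diverge. The input strings and the `matches` helper are assumptions made up for illustration; they are not taken from any post in this thread. The idea is that with `(?:\s+)` the engine is allowed to give back whitespace it has already matched, while `(?>\s+)` holds on to all of it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two patterns compared in this thread.
my $atomic  = qr/_test(?>\s+)(?!<)/;   # \s+ keeps everything it matched
my $regular = qr/_test(?:\s+)(?!<)/;   # \s+ may give whitespace back

# Hypothetical helper: returns 1 if $str matches $re, else 0.
sub matches {
    my ($re, $str) = @_;
    return $str =~ $re ? 1 : 0;
}

# Hypothetical inputs: one space, two spaces, and no "<" at all.
for my $str ("_test <", "_test  <", "_test foo") {
    printf "%-12s atomic=%d regular=%d\n",
        "'$str'", matches($atomic, $str), matches($regular, $str);
}
```

With a single space before the `<`, both patterns fail, because `\s+` cannot shrink below one character. With two spaces, only the `(?:\s+)` version matches: `\s+` backs off to one space, the next character is then a space rather than `<`, and the negative look-ahead succeeds. Test data with only single spaces would therefore show no difference between the two.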
Re: parsing question by Abigail-II (Bishop) on Sep 15, 2003 at 11:36 UTC
Re: Re: parsing question by Roger (Parson) on Sep 16, 2003 at 01:14 UTC
Re: Re: Re: parsing question by Anonymous Monk on Sep 15, 2003 at 03:06 UTC
Switching timethese for cmpthese, here's the math:

    Win32 ActivePerl 5.6.1 (Build 633)
           Rate   p4   p3   p1   p2
    p4 865052/s   --  -1%  -1%  -4%
    p3 876424/s   1%   --  -0%  -3%
    p1 877193/s   1%   0%   --  -3%
    p2 901713/s   4%   3%   3%   --

    Win32 ActivePerl 5.8.0 (build 804)
           Rate   p3   p1   p2   p4
    p3 831255/s   --  -3%  -5%  -8%
    p1 853971/s   3%   --  -3%  -5%
    p2 876424/s   5%   3%   --  -3%
    p4 900901/s   8%   5%   3%   --

p1 and p3 use the "cut" operator. The optimization depends on your perl version.
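For anyone who wants to repeat the comparison, a sketch of the cmpthese variant might look like the following. It reuses the test strings from the benchmark post; the use of code references instead of strings, the `-1` count (run each sub for at least one CPU second), and the `$rows` variable are choices made here for illustration, not necessarily what was run above. Timings will differ by machine and perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Test strings from the benchmark post earlier in the thread.
my $str1 = "_test (followed by 1 or more spaces)";
my $str2 = "_test < xxx >";

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds instead of a fixed number of iterations.
# cmpthese prints the comparison chart and also returns its rows.
my $rows = cmpthese(-1, {
    p1 => sub { $str1 =~ /_test(?>\s+)(?!<)/ },   # atomic, no "<" follows
    p2 => sub { $str1 =~ /_test(?:\s+)(?!<)/ },   # plain group, no "<" follows
    p3 => sub { $str2 =~ /_test(?>\s+)(?!<)/ },   # atomic, "<" follows
    p4 => sub { $str2 =~ /_test(?:\s+)(?!<)/ },   # plain group, "<" follows
});
```

Note that both test strings have exactly one space after `_test`, so this measures speed only; it does not exercise the inputs on which the two patterns give different answers.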