Re: parsing question
by Abigail-II (Bishop) on Sep 12, 2003 at 11:15 UTC
/_test(?>\s+)(?!<)/
Abigail
ladoix% cat 290992
#!/usr/bin/perl
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/_test(?>\s+)(?!<)/)->explain;
ladoix% perl 290992
The regular expression:
(?-imsx:_test(?>\s+)(?!<))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  _test                    '_test'
----------------------------------------------------------------------
  (?>                      match (and do not backtrack afterwards):
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    <                        '<'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
--
Allolex
Out of interest in the experimental (?>...) construct, I ran a benchmark with the following little test:
use Benchmark;
$str1 = "_test (followed by 1 or more spaces)";
$str2 = "_test < xxx >";
timethese ( 1000000,
{ 'p1' => '&p1;',
'p2' => '&p2;',
'p3' => '&p3;',
'p4' => '&p4;',
} );
sub p1 ()
{
$str1 =~ /_test(?>\s+)(?!<)/;
}
sub p2 ()
{
$str1 =~ /_test(?:\s+)(?!<)/;
}
sub p3 ()
{
$str2 =~ /_test(?>\s+)(?!<)/;
}
sub p4 ()
{
$str2 =~ /_test(?:\s+)(?!<)/;
}
I got the following results:
Benchmark: timing 1000000 iterations of p1, p2, p3, p4...
p1: 3 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU)
@ 333333.33/s (n=1000000)
p2: 3 wallclock secs ( 2.79 usr + 0.00 sys = 2.79 CPU)
@ 358422.94/s (n=1000000)
p3: 3 wallclock secs ( 3.09 usr + 0.00 sys = 3.09 CPU)
@ 323624.60/s (n=1000000)
p4: 3 wallclock secs ( 2.82 usr + 0.00 sys = 2.82 CPU)
@ 354609.93/s (n=1000000)
It seems that matching with ?> runs slower than with ?: by as much as 10 percent. So am I correct to say that, optimization-wise, ?> might not be the first choice?
Considering that /_test(?:\s+)(?!<)/ is wrong,
as demonstrated elsewhere in this thread, I fail to see your
point.
Abigail
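To make the "wrong" concrete, here is a minimal sketch (the sample lines are made up for illustration, they are not from the thread). With a single space before the <, as in $str2 of the benchmark, both patterns fail to match, so the benchmark never exercises the difference. With two or more whitespace characters, the \s+ inside (?:...) can give back one character, the look-ahead then sees a space instead of <, and the pattern matches a line it was meant to reject. The atomic group (?>...) does not give back what it has matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines (assumptions for illustration only).
my @lines = (
    "_test   foo",      # whitespace, then something other than '<'
    "_test   <foo>",    # several whitespace characters, then '<'
);

my %pattern = (
    atomic     => qr/_test(?>\s+)(?!<)/,    # no backtracking into \s+
    noncapture => qr/_test(?:\s+)(?!<)/,    # \s+ may backtrack
);

# Return 1 if $line matches the named pattern, 0 otherwise.
sub matches {
    my ($name, $line) = @_;
    return $line =~ $pattern{$name} ? 1 : 0;
}

for my $line (@lines) {
    for my $name (sort keys %pattern) {
        printf "%-11s %-16s %s\n", $name, "'$line'",
            matches($name, $line) ? "match" : "no match";
    }
}
```

Only the last combination differs: noncapture matches '_test   <foo>', atomic does not.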
Switching timethese to cmpthese, here are the numbers:
Win32 ActivePerl 5.6.1 (Build 633)
       Rate   p4   p3   p1   p2
p4 865052/s   --  -1%  -1%  -4%
p3 876424/s   1%   --  -0%  -3%
p1 877193/s   1%   0%   --  -3%
p2 901713/s   4%   3%   3%   --
Win32 ActivePerl 5.8.0 (build 804)
       Rate   p3   p1   p2   p4
p3 831255/s   --  -3%  -5%  -8%
p1 853971/s   3%   --  -3%  -5%
p2 876424/s   5%   3%   --  -3%
p4 900901/s   8%   5%   3%   --
p1 and p3 use the "cut" operator, (?>...).
Whether it pays off as an optimization depends on your perl version.
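For anyone who wants to repeat the comparison, a rough sketch of a cmpthese version follows. It is not the script that produced the tables above, which was not posted. It assumes code references instead of the '&p1;' strings, lexical variables, and a run time of at least one CPU second per case (the -1) in place of a fixed iteration count. The numbers will differ from machine to machine and between perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same test strings as the benchmark in this thread.
my $str1 = "_test (followed by 1 or more spaces)";
my $str2 = "_test < xxx >";

# A negative count means "run each case for at least that many CPU seconds".
# cmpthese prints the comparison chart and returns its rows as an array ref.
my $rows = cmpthese(-1, {
    p1 => sub { $str1 =~ /_test(?>\s+)(?!<)/ },   # atomic group, matching line
    p2 => sub { $str1 =~ /_test(?:\s+)(?!<)/ },   # plain group, matching line
    p3 => sub { $str2 =~ /_test(?>\s+)(?!<)/ },   # atomic group, '<' line
    p4 => sub { $str2 =~ /_test(?:\s+)(?!<)/ },   # plain group, '<' line
});
```

Differences of a few percent, as in the tables above, are small enough that run-to-run noise can reorder the rows.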
Re: parsing question
by flounder99 (Friar) on Sep 12, 2003 at 13:20 UTC
/_test\s+(?!\s|<)/;
also works, but without using any so-called "experimental" extended regex patterns.
-- flounder
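A quick way to convince yourself that the two patterns agree is to run both over a few lines. This is only a sketch: the sample lines are invented for illustration, and agreement on these samples is not a proof. The reasoning is that (?!\s|<) forces \s+ to consume all of the whitespace, because the look-ahead fails whenever another whitespace character follows. That is the same effect the atomic group achieves.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $atomic    = qr/_test(?>\s+)(?!<)/;    # Abigail's pattern
my $lookahead = qr/_test\s+(?!\s|<)/;     # flounder's pattern

# Hypothetical sample lines (assumptions for illustration only).
my @samples = (
    "_test foo",
    "_test   foo",
    "_test <foo>",
    "_test   <foo>",
    "_test\t<foo>",
    "_testfoo",
    "_test ",
);

# True when both patterns give the same yes/no answer for $line.
sub agree {
    my ($line) = @_;
    my $a = ($line =~ $atomic)    ? 1 : 0;
    my $l = ($line =~ $lookahead) ? 1 : 0;
    return $a == $l;
}

for my $line (@samples) {
    (my $shown = $line) =~ s/\t/\\t/g;    # make the tab visible in the output
    printf "%-18s atomic=%d lookahead=%d\n", "'$shown'",
        ($line =~ $atomic) ? 1 : 0, ($line =~ $lookahead) ? 1 : 0;
}
```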
Re: parsing question
by hmerrill (Friar) on Sep 12, 2003 at 12:32 UTC
Seems to me you need another regex to test if you've found a line containing the angle brackets:
if ($line =~ /_test(\s+)</) {
### skip this one ###
}
elsif ($line =~ /_test(\s+)/) {
print "found\n";
}
HTH.
I was trying to avoid an if/else statement. Abigail's solution worked a treat... thanks a million!