Re: Peeling the Peelings
by l2kashe (Deacon) on Jul 01, 2003 at 19:32 UTC
|
sub get_inner {
my $str = shift;
my $inner = ( split(/\(/, $str) )[-1];
$inner =~ s/\).+$//;
$inner;
}
Im sure it could be golfed, but *shrug* HTH
Update: or the reverse would be
sub get_inner {
my $str = shift;
my $inner = ( split(/\)/, $str) )[0];
$inner =~ m/([^\(]+)$/;
$1;
}
MMMMM... Chocolaty Perl Goodness..... | [reply] [d/l] [select] |
Re: Peeling the Peelings
by Zaxo (Archbishop) on Jul 01, 2003 at 19:12 UTC
|
| [reply] |
|
|
| [reply] |
Re: Peeling the Peelings
by gjb (Vicar) on Jul 01, 2003 at 19:26 UTC
|
The following should do what you want if the input is in the format you specify, i.e. nested function with single arguments.
my $example1 = 'this(is(a(test)))';
foreach (0..4) {
print "level $_: '", peel($example1, $_), "'\n";
}
print "level -1: '", peel($example1, -1), "'\n";
sub peel {
my ($str, $level) = @_;
if ($level < 0) {
$str =~ /([^\(\)]+)\)+\Z/;
return $1;
}
my $opening = '(?:[^\(]+\(){' . $level . '}';
my $closing = '\){' . $level . '}';
$str =~ /\A$opening(.+?)$closing\Z/;
return $1;
}
Hope this helps, -gjb- | [reply] [d/l] |
|
|
Rate Peel Get_Proparg
Peel 1359/s -- -76%
Get_Proparg 5780/s 325% --
Looks like thats a wee bit slower.
-Waswas | [reply] [d/l] |
|
|
my %cache = ('-1' => qr/([^\(\)]+)\)+\Z/);
sub peel {
my $str = shift;
my $level = shift || 0;
if (!exists $cache{$level}) {
my $opening = '[^\(]+\(' x $level;
my $closing = '\)' x $level;
$cache{$level} = qr/\A$opening(.+?)$closing\Z/;
}
$str =~ $cache{$level};
return $1;
}
The benchmark results are:
Rate Fatvamp gjb bobn Aristotle
Fatvamp 21155/s -- -14% -17% -28%
gjb 24631/s 16% -- -3% -16%
bobn 25523/s 21% 4% -- -13%
Aristotle 29497/s 39% 20% 16% --
Best regards, -gjb-
| [reply] [d/l] [select] |
Re: Peeling the Peelings
by PetaMem (Priest) on Jul 02, 2003 at 05:11 UTC
|
Thanks for all your comments.
After reading the posts - especially the benchmarks,
it seems that the code is not as bad as I thought. However
just by looking at it once again and looking at the
"80% speedup" for the level 0 case, I came with 2 improvements:
sub get_proparg {
my $propstr = shift; # get the property string
my $level = shift || return $propstr; # get the level we want to
+extract
my $cnt; # initialize counter
if($level == -1) { # special case, get the innermost argum
+ent
$propstr =~ /\(([^\(\)]+)\)+/;
return $1;
} else { # get whatever argument $level indicate
+s
for($cnt = 0;$cnt<$level; $cnt++) {
$propstr =~ /\((.+)\)/;
$propstr = $1;
}
return $propstr;
}
}
- No need for $back - it wasn't used anyway ;-)
- Immediate return if level = 0; That's straight :-)
- No unnecessary temporary $var = $var assignments
Should perform slightly better now, esp. for the level 0
case (which unfortunatedly is rare).
Bye
PetaMem All Perl: MT, NLP, NLU
| [reply] [d/l] |
|
|
Sorry, dude, I still 0wn y0u! ;-)
Results:
Benchmark: timing 25000 iterations of mine-elegant, mine-less-elegant,
+ original, original++...
mine-elegant: 8 wallclock secs ( 8.20 usr + 0.00 sys = 8.20 CPU) @
+3048.78/s (n=25000)
mine-less-elegant: 6 wallclock secs ( 5.64 usr + 0.01 sys = 5.65 CP
+U) @ 4424.78/s (n=25000)
original: 9 wallclock secs ( 9.29 usr + 0.00 sys = 9.29 CPU) @ 26
+91.07/s (n=25000)
original++: 9 wallclock secs ( 9.33 usr + 0.00 sys = 9.33 CPU) @ 26
+79.53/s (n=25000)
Benchmark: timing 25000 iterations of mine-elegant, mine-less-elegant,
+ original, original++...
mine-elegant: 9 wallclock secs ( 8.67 usr + 0.00 sys = 8.67 CPU) @
+2883.51/s (n=25000)
mine-less-elegant: 8 wallclock secs ( 6.11 usr + 0.05 sys = 6.16 CP
+U) @ 4058.44/s (n=25000)
original: 9 wallclock secs ( 8.67 usr + 0.01 sys = 8.68 CPU) @ 28
+80.18/s (n=25000)
original++: 9 wallclock secs ( 8.72 usr + 0.01 sys = 8.73 CPU) @ 28
+63.69/s (n=25000)
Benchmark: timing 25000 iterations of mine-elegant, mine-less-elegant,
+ original, original++...
mine-elegant: 9 wallclock secs ( 8.40 usr + 0.00 sys = 8.40 CPU) @
+2976.19/s (n=25000)
mine-less-elegant: 6 wallclock secs ( 5.22 usr + 0.02 sys = 5.24 CP
+U) @ 4770.99/s (n=25000)
original: 9 wallclock secs ( 8.33 usr + 0.00 sys = 8.33 CPU) @ 30
+01.20/s (n=25000)
original++: 8 wallclock secs ( 7.88 usr + 0.02 sys = 7.90 CPU) @ 31
+64.56/s (n=25000)
--Bob Niederman, http://bob-n.com | [reply] [d/l] [select] |
|
|
sub get_proparg_new {
my $propstr = shift; # get the property string
my $level = shift || return $propstr; # get the level we want to e
+xtract
if($level == -1) { # special case, get the innermost argume
+nt
$propstr =~ /\(([^()]+)\)+/;
return $1;
} else { # get whatever argument $level indicates
while ($level--) {
$propstr =~ /\((.+)\)/;
$propstr = $1;
}
}
return $propstr;
}
Benchmark: timing 25000 iterations of mine-elegant, mine-less-elegant, original, original++...
mine-elegant: 5 wallclock secs ( 4.61 usr + 0.00 sys = 4.61 CPU) @ 5422.99/s (n=25000)
mine-less-elegant: 2 wallclock secs ( 2.16 usr + 0.00 sys = 2.16 CPU) @ 11574.07/s (n=25000)
original: 2 wallclock secs ( 2.55 usr + 0.00 sys = 2.55 CPU) @ 9803.92/s (n=25000)
original++: 3 wallclock secs ( 2.32 usr + 0.00 sys = 2.32 CPU) @ 10775.86/s (n=25000)
I guess I'll trade that performance loss against readability. But thanks for the Ideas I could borrow and reuse.
Bye
PetaMem All Perl: MT, NLP, NLU
| [reply] [d/l] |
|
|
|
|
|
|
|
|
|
|
|
|
Aristotle => sub {
my $str = shift;
my $level = shift || return $str;
$str =~ /.*\((.*?)\)/ and return $1 if $level == -1;
local $_ = substr $str, 0, -$level;
/\G.*?\(/gc while $level--;
return substr $_, pos;
},
ok 1 - by Aristotle for case -1
ok 2 - by Aristotle for case 0
ok 3 - by Aristotle for case 1
ok 4 - by Aristotle for case 2
ok 5 - by Aristotle for case 3
ok 6 - by Aristotle for case 4
ok 7 - by Aristotle for case 5
ok 8 - by bobn for case -1
ok 9 - by bobn for case 0
ok 10 - by bobn for case 1
ok 11 - by bobn for case 2
ok 12 - by bobn for case 3
ok 13 - by bobn for case 4
ok 14 - by bobn for case 5
ok 15 - by Fatvamp for case -1
ok 16 - by Fatvamp for case 0
ok 17 - by Fatvamp for case 1
ok 18 - by Fatvamp for case 2
ok 19 - by Fatvamp for case 3
ok 20 - by Fatvamp for case 4
ok 21 - by Fatvamp for case 5
Benchmark: running Aristotle, Fatvamp, bobn for at least 2 CPU seconds
+...
Aristotle: 2 wallclock secs ( 2.12 usr + 0.01 sys = 2.13 CPU) @ 25
+239.44/s (n=53760)
Fatvamp: 2 wallclock secs ( 2.10 usr + 0.00 sys = 2.10 CPU) @ 15
+751.90/s (n=33079)
bobn: 2 wallclock secs ( 2.05 usr + 0.00 sys = 2.05 CPU) @ 22
+247.80/s (n=45608)
Rate Fatvamp bobn Aristotle
Fatvamp 15752/s -- -29% -38%
bobn 22248/s 41% -- -12%
Aristotle 25239/s 60% 13% --
1..21
Update
Here's another take with cached pregenerated regexes to remove the loop. It performs exactly the same as my simpler version, to the point that they sometimes achieve the exact same iterations/sec.
Aristotle2 => sub {
my $str = shift;
my $level = shift || return $str;
$str =~ /.*\((.*?)\)/ and return $1 if $level == -1;
local $_ = substr $str, 0, -$level;
my $rx = $rx_hash{$level} ||= do {
my $rx = join '', '\A', '(?>.*?\()' x $level;
qr/$rx/;
};
m/$rx/g && return substr $_, pos;
},
It needs my %rx_hash; at the top of the script somewhere of course.
Makeshifts last the longest. | [reply] [d/l] [select] |
|
|
Re: Peeling the Peelings
by BrowserUk (Patriarch) on Jul 01, 2003 at 21:04 UTC
|
This avoids loops (external to the regex at least). It might compare favorably if you remove the balance check code.
sub peeln {
local $_ = shift;
my $n = shift;
my($o, $c) = (tr[(][], tr[)][]);
warn 'Unbalanced parens' unless $o == $c;
$n = $c if $n == -1 or $n > $c;
m[^ (?: .*? \( .*?){$n} (.+) (?: .*? \) .*?){$n} $]x;
$1;
}
my @tests = (
'hello(what(is(this(all(about)))))',
'hello(what(is(this(all(about))))))',
'hello((what(is(this(all(about)))))',
'A(B(c) D(e) f(g(h(i))))',
);
for my $test ( @tests ) {
print "$test : $_ : ", peeln( $test, $_ ) for -1 .. 5;
}
__END__
P:\>270595
hello(what(is(this(all(about))))) : -1 : about
hello(what(is(this(all(about))))) : 0 : hello(what(is(this(all(about))
+)))
hello(what(is(this(all(about))))) : 1 : what(is(this(all(about))))
hello(what(is(this(all(about))))) : 2 : is(this(all(about)))
hello(what(is(this(all(about))))) : 3 : this(all(about))
hello(what(is(this(all(about))))) : 4 : all(about)
hello(what(is(this(all(about))))) : 5 : about
Unbalanced parens at P:\270595.pl8 line 28.
Use of uninitialized value in print at P:\270595.pl8 line 42.
hello(what(is(this(all(about)))))) : -1 :
Unbalanced parens at P:\270595.pl8 line 28.
hello(what(is(this(all(about)))))) : 0 : hello(what(is(this(all(about)
+)))))
Unbalanced parens at P:\270595.pl8 line 28.
hello(what(is(this(all(about)))))) : 1 : what(is(this(all(about)))))
Unbalanced parens at P:\270595.pl8 line 28.
hello(what(is(this(all(about)))))) : 2 : is(this(all(about))))
Unbalanced parens at P:\270595.pl8 line 28.
hello(what(is(this(all(about)))))) : 3 : this(all(about)))
Unbalanced parens at P:\270595.pl8 line 28.
hello(what(is(this(all(about)))))) : 4 : all(about))
Unbalanced parens at P:\270595.pl8 line 28.
hello(what(is(this(all(about)))))) : 5 : about)
Unbalanced parens at P:\270595.pl8 line 28.
hello((what(is(this(all(about))))) : -1 : all(about
Unbalanced parens at P:\270595.pl8 line 28.
hello((what(is(this(all(about))))) : 0 : hello((what(is(this(all(about
+)))))
Unbalanced parens at P:\270595.pl8 line 28.
hello((what(is(this(all(about))))) : 1 : (what(is(this(all(about))))
Unbalanced parens at P:\270595.pl8 line 28.
hello((what(is(this(all(about))))) : 2 : what(is(this(all(about)))
Unbalanced parens at P:\270595.pl8 line 28.
hello((what(is(this(all(about))))) : 3 : is(this(all(about))
Unbalanced parens at P:\270595.pl8 line 28.
hello((what(is(this(all(about))))) : 4 : this(all(about)
Unbalanced parens at P:\270595.pl8 line 28.
hello((what(is(this(all(about))))) : 5 : all(about
Use of uninitialized value in print at P:\270595.pl8 line 42.
A(B(c) D(e) f(g(h(i)))) : -1 :
A(B(c) D(e) f(g(h(i)))) : 0 : A(B(c) D(e) f(g(h(i))))
A(B(c) D(e) f(g(h(i)))) : 1 : B(c) D(e) f(g(h(i)))
A(B(c) D(e) f(g(h(i)))) : 2 : c) D(e) f(g(h(i))
A(B(c) D(e) f(g(h(i)))) : 3 : e) f(g(h(i)
A(B(c) D(e) f(g(h(i)))) : 4 : g(h(i
Use of uninitialized value in print at P:\270595.pl8 line 42.
A(B(c) D(e) f(g(h(i)))) : 5 :
It doesn't handle (?term?) nesting like this a(b(c) d(e)), but it was unclear to me what the return would be in these circumstances?
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
| [reply] [d/l] [select] |
|
|
Actually, your cosde is a lot slower, once again, go figure:
Results:
Benchmark: timing 15000 iterations of his, mine, mine-less-lgnt, other
+...
his: 5 wallclock secs ( 4.94 usr + 0.00 sys = 4.94 CPU) @ 30
+36.44/s (n=15000)
mine: 5 wallclock secs ( 4.94 usr + 0.01 sys = 4.95 CPU) @ 30
+30.30/s (n=15000)
mine-less-lgnt: 3 wallclock secs ( 3.43 usr + 0.00 sys = 3.43 CPU)
+@ 4373.18/s (n=15000)
other: 22 wallclock secs (20.89 usr + 0.03 sys = 20.92 CPU) @ 71
+7.02/s (n=15000)
--Bob Niederman, http://bob-n.com | [reply] [d/l] [select] |
|
|
UpdateThis version is competely wrong!! It is quick because it does nothing at all. I benchmarked this, but verified the output of a completely different, correct, but much slower piece of code.
A subtle variation on my last version acheives a worthwhile speedup. It's about 50% quicker than my previous best and over twice as fast as the original.
I wish there was a way to put a big red cross through the code as well.
## !!! DO NOT USE !!! TOTALLY BOGUS CODE. !!!
sub peel2 {
my( $s, $n ) = @_;
my($start, $stop, $p, $q) = (length $s, 0, 0);
($start, $stop) = ($p, $q)
while $p = 1+index( $s, ')', $stop ) < 0
and $q = rindex( $s, '(', $start ) < 0
and $n--;
substr $s, $start, $stop - $start;
}
Rate his mine-less-lgnt buk2
+ buk3
his 569/s -- -35% -45%
+ -62%
mine-less-lgnt 875/s 54% -- -15%
+ -42%
buk2 1026/s 80% 17% --
+ -32%
buk3 1502/s 164% 72% 46%
+ --
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
| [reply] [d/l] |
|
|
As my original attempt was so crap, I thought I'd take another crack. Abandoning any attempt at validation and avoiding the regex engine completely, I came up with this which seems to be about 15% better than the best so far.
sub peel {
my( $s, $n ) = @_;
my( $start, $stop ) = ( 0, length $s );
($start,$stop) = (
1 + index( $s, '(', $start ),
rindex( $s, ')', $stop -1 )
# reformatting_for_posting error corrected
# Was ) while $n-- > 0 and index( $s, '(', $start +1 ) > 0;
) while $n-- and index( $s, '(', $start +1 ) > 0;
substr $s, $start, $stop - $start;
}
Rate mine-less-lgnt buk2
mine-less-lgnt 918/s -- -15%
buk2 1075/s 17% --
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
| [reply] [d/l] |
|
|
|
|
|
|
|
| [reply] |
Re: Peeling the Peelings
by waswas-fng (Curate) on Jul 01, 2003 at 20:33 UTC
|
How many loops is a normal run doing on your data? is it tightly lopped? you could see what qring the regex does to the time -- on some tight regex loops I have noticed 40 - 50% speed ups using a precompiled regex.
-Waswas
Ignore this i timed it and qr the regex is slower in this case (must be the () in there...). | [reply] [d/l] |
Re: Peeling the Peelings
by bobn (Chaplain) on Jul 01, 2003 at 21:43 UTC
|
sub prop3
{
my $str = shift;
my $level = shift || 0;
return $str unless $level; # 3rd update
my @str = split(/[()]/, $str);
splice @str, 0, $level;
my $out = join('(', @str);
$out .= ')' x ( @str - 1 );
return $out;
}
But it's actually about 10% slower than the original. Go figure.
update: more testing - it really is slower. Very aggravating to me, since it looks like good clean code to me.
update again:
Also came up with this, which is somewhat quicker, but uglier:
sub getpropstr
{
local $_ = shift;
my $level = shift || 0;
if ( $level == -1 ) { /([^()]+)\)+$/; return $1 }
while ( $level-- > 0)
{
chop;
s/^[^(]+\(//;
}
return $_;
}
3rd update: line commented w/ '3rd update' in my first routine makes it closer to the same speed as the OP's original.
--Bob Niederman, http://bob-n.com | [reply] [d/l] [select] |
Re: Peeling the Peelings
by yosefm (Friar) on Jul 01, 2003 at 19:11 UTC
|
/([^\(]*\(){$level}(.*)\)?/ should get you the level you want ($level is used as a quantifier, inside curlies). | [reply] [d/l] |
|
|
There's a little quirk with this:
"hello(what(is(this(all(about)))))" 4
get_proparg returned: "all(about)"
Fastpath: "all(about)))))"
"Fastpath" is your solution. It also doesn't behave very well if I $level is a higher number than there already are nested levels - unfortunatedly you never know before how many there are.
Bye
PetaMem All Perl: MT, NLP, NLU
| [reply] |
Re: Peeling the Peelings
by traveler (Parson) on Jul 01, 2003 at 23:10 UTC
|
This one appears faster for the n > -1 case. Compared to get_proparg it is only about 7% worse for the -1 case and is about 80% faster for the 0 case:
sub arg2 {
my($str,$lvl)= @_;
my $str1 = $str;
return $str if !$lvl;
# skip up to and including nth ( paren, then strip n ) parens from
+ end
$str =~ /([^(]*\(){$lvl}(.+)(\)){$lvl}/;
return $2 unless $lvl == -1;
$str =~ /\(([^\(\)]+)\)+/;
return $1;
}
The timing results where lvl=2 are:
Rate get_proparg arg2
get_proparg 137195/s -- -31%
arg2 199115/s 45% --
HTH, --traveler | [reply] [d/l] [select] |
|
|
But for the original sample of data, it is hideous.
Results:
Benchmark: timing 15000 iterations of mine-elegant, mine-less-elegant,
+ original, yours-arg2...
mine-elegant: 7 wallclock secs ( 5.19 usr + 0.00 sys = 5.19 CPU) @
+2890.17/s (n=15000)
mine-less-elegant: 4 wallclock secs ( 3.19 usr + 0.01 sys = 3.20 CP
+U) @ 4687.50/s (n=15000)
original: 6 wallclock secs ( 5.06 usr + 0.02 sys = 5.08 CPU) @ 29
+52.76/s (n=15000)
yours-arg2: 32 wallclock secs (25.06 usr + 0.08 sys = 25.14 CPU) @ 59
+6.66/s (n=15000)
The lesson I'm taking away from this: simple regexes can be very fast, but time increasses rapidly with regex complexity.
--Bob Niederman, http://bob-n.com | [reply] [d/l] [select] |
|
|
They don't need to. You just have to tell the regex engine exactly what you want. The regex he used has no anchors - why? There's also a bunch of useless capturing parens, and in fact one paren pair that's completely superfluous. All of that is not what we wanted. I've got no Perl here, so I'll have to test this later, but I'm pretty positive that the following works as specified, and very certain that it'll perform tons better.
/\A
(?> (?> [^(]* \( ) {$lvl} )
( .+ )
\) {$lvl}
\z/x;
Makeshifts last the longest. | [reply] [d/l] |
|
|
|
|
You know, someone ought to write a book about that (time increasing with regex complexity). :-)
| [reply] |
|
|
Phooey. The lessons here are: if it looks good and seems to test well, it still might not be good -- test more; somewhat counterintuitively, {} match counts are slower than loops; some optimizations (e.g. those from Aristotle's post) are not as much more effecient as they seem; and as tilly and bobn point out a correlary of the second lesson is that processing a string through multiple single REs is often better than one complex one.
--traveler
| [reply] |
Re: Peeling the Peelings
by bobn (Chaplain) on Jul 02, 2003 at 16:14 UTC
|
update2: evdb came up w./ a test harness that works around the problem noted in update1
update1: the 'evdb' destroys the data, so the benchmarks as run are not valid.
<strikethrough>All my previous posts about speed are suspect. My current test numbers have suddenly become inconsistent with my previous numbers, even for relative assessments. I don't get it, but right now it looks like the OP's best is better than my best by a little (and of course the 'evdb' solution leaves us all in the dust).</strikethrough>
--Bob Niederman, http://bob-n.com | [reply] |