in reply to Re: Peeling the Peelings
in thread Peeling the Peelings

But for the original sample of data, it is hideous.
#!/usr/bin/perl -w $|++; use strict; use Benchmark; sub arg2 { my($str,$lvl)= @_; my $str1 = $str; return $str if !$lvl; # skip up to and including nth ( paren, then strip n ) parens from +end $str =~ /([^(]*\(){$lvl}(.+)(\)){$lvl}/; return $2 unless $lvl == -1; $str =~ /\(([^\(\)]+)\)+/; return $1; } sub getpropstr { local $_ = shift; my $level = shift || 0; if ( $level == -1 ) { /([^()]+)\)+$/; return $1 } while ( $level-- > 0) { chop; s/^[^(]+\(//; } return $_; } my @data = <DATA>; chomp @data; if(0) { for (@data) { print "$_\n"; print getpropstr( split ), "\n" ; print get_proparg( split ), "\n" ; print prop3( split ), "\n" ; # print peeln( split ), "\n" ; print arg2( split ), "\n" ; } } my $count = 15000; timethese ( $count, { 'mine-elegant' => sub { for (@data) { prop3( split ); } }, 'mine-less-elegant' => sub { for (@data) { getpropstr( split +); } }, 'original' => sub { for (@data) { get_proparg( split ); } }, # 'other' => sub { for (@data) { peeln( split ); } }, 'yours-arg2' => sub { for (@data) { peeln( split ); } }, } ); sub peeln { local $_ = shift; my $n = shift || 0; my($c) = (tr[(][]); # warn 'Unbalanced parens' unless $o == $c; $n = $c if $n == -1; m[^ (?: .*? \( .*?){$n} (.+) (?: .*? \) .*?){$n} $]x; $1; } sub get_proparg { my $propstr = shift; # get the property string my $level = shift || 0; # get the level we want to extract my $back = $propstr; # Initialize Return value (for case leve +l=0) my $cnt; # initialize counter if($level == -1) { # special case, get the innermost argume +nt $propstr =~ /\(([^\(\)]+)\)+/; $propstr = $1; } else { # get whatever argument $level indicates for($cnt = 0;$cnt<$level; $cnt++) { $propstr =~ /\((.+)\)/; $propstr = $1; } } return $propstr; } sub prop3 { my $str = shift; my $level = shift || 0; return $str unless $level; my @str = split(/[()]/, $str); splice @str, 0, $level; my $out = join('(', @str); $out .= ')' x ( @str - 1 ); return $out; } __END__ hello(what(is(this(all(about))))) -1 hello(what(is(this(all(about))))) 0 hello(what(is(this(all(about))))) 1 hello(what(is(this(all(about))))) 2 hello(what(is(this(all(about))))) 3 hello(what(is(this(all(about))))) 4 hello(what(is(this(all(about))))) 5
Results:
Benchmark: timing 15000 iterations of mine-elegant, mine-less-elegant, + original, yours-arg2... mine-elegant: 7 wallclock secs ( 5.19 usr + 0.00 sys = 5.19 CPU) @ +2890.17/s (n=15000) mine-less-elegant: 4 wallclock secs ( 3.19 usr + 0.01 sys = 3.20 CP +U) @ 4687.50/s (n=15000) original: 6 wallclock secs ( 5.06 usr + 0.02 sys = 5.08 CPU) @ 29 +52.76/s (n=15000) yours-arg2: 32 wallclock secs (25.06 usr + 0.08 sys = 25.14 CPU) @ 59 +6.66/s (n=15000)
The lesson I'm taking away from this: simple regexes can be very fast, but time increasses rapidly with regex complexity.

--Bob Niederman, http://bob-n.com

Replies are listed 'Best First'.
Re^3: Peeling the Peelings
by Aristotle (Chancellor) on Jul 02, 2003 at 08:05 UTC
    They don't need to. You just have to tell the regex engine exactly what you want. The regex he used has no anchors - why? There's also a bunch of useless capturing parens, and in fact one paren pair that's completely superfluous. All of that is not what we wanted. I've got no Perl here, so I'll have to test this later, but I'm pretty positive that the following works as specified, and very certain that it'll perform tons better.
    /\A (?> (?> [^(]* \( ) {$lvl} ) ( .+ ) \) {$lvl} \z/x;

    Makeshifts last the longest.

      Strangley enough, the benchmarks indictate this is no better. comparison w/ minor rewrites: Results:
      Benchmark: timing 7000 iterations of OP best: get_proparg_new, my best +: getpropstr, orig-arg2, yours... OP best: get_proparg_new: 3 wallclock secs ( 2.39 usr + 0.01 sys = +2.40 CPU) @ 2916.67/s (n=7000) my best: getpropstr: 4 wallclock secs ( 1.72 usr + 0.01 sys = 1.73 +CPU) @ 4046.24/s (n=7000) orig-arg2: 12 wallclock secs ( 9.71 usr + 0.04 sys = 9.75 CPU) @ 71 +7.95/s (n=7000) yours: 12 wallclock secs ( 9.69 usr + 0.03 sys = 9.72 CPU) @ 72 +0.16/s (n=7000)


      --Bob Niederman, http://bob-n.com
Re: Re: Re: Peeling the Peelings
by tilly (Archbishop) on Jul 02, 2003 at 08:02 UTC
    You know, someone ought to write a book about that (time increasing with regex complexity). :-)
Re: Re: Re: Peeling the Peelings
by traveler (Parson) on Jul 02, 2003 at 15:56 UTC
    Phooey. The lessons here are: if it looks good and seems to test well, it still might not be good -- test more; somewhat counterintuitively, {} match counts are slower than loops; some optimizations (e.g. those from Aristotle's post) are not as much more effecient as they seem; and as tilly and bobn point out a correlary of the second lesson is that processing a string through multiple single REs is often better than one complex one.

    --traveler