Re: Re: Peeling the Peelings

But for the original sample of data, it is hideous.

#!/usr/bin/perl -w


$|++;
use strict;
use Benchmark;


sub arg2 {
   my($str,$lvl)= @_;
   my $str1 = $str;
   return $str if !$lvl;
   # skip up to and including nth ( paren, then strip n ) parens from 
+end
   $str =~ /([^(]*\(){$lvl}(.+)(\)){$lvl}/;

   return $2 unless $lvl == -1;
   $str =~ /\(([^\(\)]+)\)+/;
   return $1;

}


sub getpropstr
{
    local $_ = shift;
    my $level = shift || 0;
    if ( $level == -1 ) { /([^()]+)\)+$/; return $1 }
    while ( $level-- > 0)
    {
        chop;
        s/^[^(]+\(//;
    }
    return $_;
}

my @data = <DATA>;
chomp @data;

if(0)
{
for (@data)
{
    print "$_\n";
    print getpropstr( split ), "\n" ;
    print get_proparg( split ), "\n" ;
    print prop3( split ), "\n" ;
#    print peeln( split ), "\n" ;
    print arg2( split ), "\n" ;
}
}

my $count = 15000;

timethese
(
    $count, 
    {
        'mine-elegant' => sub { for (@data) { prop3( split ); } },
        'mine-less-elegant' => sub { for (@data) { getpropstr( split  
+); } },
        'original' =>  sub { for (@data) { get_proparg( split ); } },
        # 'other' =>  sub { for (@data) { peeln( split ); } },
        'yours-arg2' =>  sub { for (@data) { peeln( split ); } },
    }
);

sub peeln {
   local $_ = shift;
   my $n = shift || 0;
   my($c) = (tr[(][]);
   # warn 'Unbalanced parens' unless $o == $c;
   $n = $c if $n == -1;
   m[^ (?: .*? \( .*?){$n} (.+) (?: .*? \) .*?){$n} $]x;
   $1;
}




sub get_proparg {
 my $propstr = shift;         # get the property string
 my $level   = shift || 0;    # get the level we want to extract
 my $back    = $propstr;      # Initialize Return value (for case leve
+l=0)
 my $cnt;                     # initialize counter

 if($level == -1) {           # special case, get the innermost argume
+nt
   $propstr =~ /\(([^\(\)]+)\)+/;
   $propstr = $1;
 } else {                     # get whatever argument $level indicates
   for($cnt = 0;$cnt<$level; $cnt++) {
     $propstr =~ /\((.+)\)/;
     $propstr = $1;
   }
 }

 return $propstr;
}

sub prop3
{
    my $str = shift;
    my $level = shift || 0;
    return $str unless $level;
    my @str = split(/[()]/, $str);
    splice @str, 0, $level;
    my $out = join('(', @str);
    $out .= ')' x ( @str - 1 );
    return $out;
}


__END__
hello(what(is(this(all(about))))) -1
hello(what(is(this(all(about))))) 0
hello(what(is(this(all(about))))) 1
hello(what(is(this(all(about))))) 2
hello(what(is(this(all(about))))) 3
hello(what(is(this(all(about))))) 4
hello(what(is(this(all(about))))) 5
[download]

Results:

Benchmark: timing 15000 iterations of mine-elegant, mine-less-elegant,
+ original, yours-arg2...
mine-elegant:  7 wallclock secs ( 5.19 usr +  0.00 sys =  5.19 CPU) @ 
+2890.17/s (n=15000)
mine-less-elegant:  4 wallclock secs ( 3.19 usr +  0.01 sys =  3.20 CP
+U) @ 4687.50/s (n=15000)
  original:  6 wallclock secs ( 5.06 usr +  0.02 sys =  5.08 CPU) @ 29
+52.76/s (n=15000)
yours-arg2: 32 wallclock secs (25.06 usr +  0.08 sys = 25.14 CPU) @ 59
+6.66/s (n=15000)
[download]

The lesson I'm taking away from this: simple regexes can be very fast, but time increasses rapidly with regex complexity.

--Bob Niederman, http://bob-n.com

Comment on Re: Re: Peeling the Peelings Select or Download Code

Replies are listed 'Best First'.
Re^3: Peeling the Peelings by Aristotle (Chancellor) on Jul 02, 2003 at 08:05 UTC
They don't need to. You just have to tell the regex engine exactly what you want. The regex he used has no anchors - why? There's also a bunch of useless capturing parens, and in fact one paren pair that's completely superfluous. All of that is not what we wanted. I've got no Perl here, so I'll have to test this later, but I'm pretty positive that the following works as specified, and very certain that it'll perform tons better. `/\A (?> (?> [^(]* $ ) {$lvl} ) ( .+ ) $ {$lvl} \z/x;` [download] Makeshifts last the longest.	[reply] [d/l]
Re: Re^3: Peeling the Peelings by bobn (Chaplain) on Jul 02, 2003 at 13:30 UTC
Strangley enough, the benchmarks indictate this is no better. comparison w/ minor rewrites: Read more... (2 kB) Results: `Benchmark: timing 7000 iterations of OP best: get_proparg_new, my best +: getpropstr, orig-arg2, yours... OP best: get_proparg_new: 3 wallclock secs ( 2.39 usr + 0.01 sys = +2.40 CPU) @ 2916.67/s (n=7000) my best: getpropstr: 4 wallclock secs ( 1.72 usr + 0.01 sys = 1.73 +CPU) @ 4046.24/s (n=7000) orig-arg2: 12 wallclock secs ( 9.71 usr + 0.04 sys = 9.75 CPU) @ 71 +7.95/s (n=7000) yours: 12 wallclock secs ( 9.69 usr + 0.03 sys = 9.72 CPU) @ 72 +0.16/s (n=7000)` [download] --Bob Niederman, http://bob-n.com	[reply] [d/l] [select]
Re: Re: Re: Peeling the Peelings by tilly (Archbishop) on Jul 02, 2003 at 08:02 UTC
You know, someone ought to write a book about that (time increasing with regex complexity). :-)	[reply]
Re: Re: Re: Peeling the Peelings by traveler (Parson) on Jul 02, 2003 at 15:56 UTC
Phooey. The lessons here are: if it looks good and seems to test well, it still might not be good -- test more; somewhat counterintuitively, {} match counts are slower than loops; some optimizations (e.g. those from Aristotle's post) are not as much more effecient as they seem; and as tilly and bobn point out a correlary of the second lesson is that processing a string through multiple single REs is often better than one complex one. --traveler	[reply]