RE: RE: From one beginner to others . . .

It looks as if those forced line breaks at 80 characters in the original post weren't such a good idea after all. Sorry. :(

The question about caching is a good one. I can't say for certain that caching didn't play a part in the performance boost (or in that case I guess I should say a perceived performance boost).

I later re-created the business end of the routine on another computer and timed both approaches. The results were similar. I haven't benchmarked it yet; I'm still, argh, a bit hazy on exactly how to use Benchmark. But never mind "hazy": I will make my way through the haze and try it. (To date the two approaches have been timed using only the 4nt command processor's own timer function, which is far from exact, to be sure.)
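
For anyone else who is hazy on Benchmark, here is a minimal sketch of one way to time two approaches. The sample line and the helper names `second_by_regex` and `second_by_split` are assumptions made up for illustration; they are not the routine discussed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical sample line; the real routine's data is not shown in this thread.
my $line = "one,two,three,four";

# Two ways to pull out the second comma-separated field.
sub second_by_regex {
    my ($text) = @_;
    my ($field) = $text =~ /^[^,]+,([^,]+)/;
    return $field;
}

sub second_by_split {
    my ($text) = @_;
    my $field = (split /,/, $text, 3)[1];
    return $field;
}

# A negative count asks Benchmark to run each candidate for at least
# that many CPU seconds instead of a fixed number of iterations.
my $results = timethese(-1, {
    Regex => sub { second_by_regex($line) },
    Split => sub { second_by_split($line) },
});

# Print a table comparing the rates of the candidates.
cmpthese($results);
```

Because both candidates go through a sub call here, the absolute numbers include that overhead; the relative ordering is the more useful part of the output.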

One noticeable difference between the regular expression I used and the one you used in your example here: I had only one set of parens in it. I don't know if this is likely to make a big difference in performance.

Thanks for the feedback, folks.

(Ovid) RE: RE: RE: From one beginner to others . . . by Ovid (Cardinal) on Jul 16, 2000 at 00:10 UTC
Only using one capturing paren will improve the performance of your regex as it will not be forced to do as much backreferencing. In playing around with this, I managed to optimize the `split` by breaking it into a minimal number of segments. In all cases, with my example, `split` significantly outperformed the regex.

    #!/usr/bin/perl -w
    use strict;
    use Benchmark;
    use vars qw($myvar $result $a $b $c $d);

    $myvar = "one,two,three,four";

    timethese(1000000, {
        Regex  => '$a=$1, $b=$2, $c=$3, $d=$4 if $myvar =~ /^[^,]+,([^,]+),[^,]+,[^,]+$/',
        Split1 => '$result = (split /,/, $myvar)[1]',
        Split2 => '$result = (split /,/, $myvar, 4)[1]',
        Split3 => '$result = (split /,/, $myvar, 3)[1]'
    });

    Benchmark: timing 1000000 iterations of Regex, Split1, Split2, Split3...
    Regex:  26 wallclock secs (25.75 usr + 0.00 sys = 25.75 CPU)
    Split1: 16 wallclock secs (16.31 usr + 0.00 sys = 16.31 CPU)
    Split2: 16 wallclock secs (16.15 usr + 0.00 sys = 16.15 CPU)
    Split3: 13 wallclock secs (12.74 usr + 0.00 sys = 12.74 CPU)

Note the whopping improvement in performance of Split3. In my benchmark, it's approximately twice as fast as the regex.

Cheers,
Ovid
RE: RE: RE: RE: From one beginner to others . . . by Abigail (Deacon) on Jul 16, 2000 at 01:08 UTC
But your comparison isn't fair. You let the regex do way more work than needed. There's no need to parse the entire line and assign only if there are exactly four fields; you aren't doing that for the `split` cases either. Also, you have only one set of parens, yet you do four assignments. Picking a simpler regex and doing just one assignment improves the speed by 50%!

    $a=$1 if $myvar =~ /^[^,]+,([^,]+)/

Still not as fast as the `split`, but it shows that proper benchmarking is an art.

-- Abigail
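
To make the "unequal work" point concrete, here is a small sketch that runs the anchored four-field pattern, the shorter pattern, and the `split` on two inputs. The helper names (`strict_regex`, `loose_regex`, `split_field`, `show`) and the five-field input are assumptions added for illustration. The anchored pattern only matches lines with exactly four fields, so it is doing validation that the other two candidates never attempt.

```perl
use strict;
use warnings;

# Anchored pattern from the first benchmark: matches only exactly four fields.
sub strict_regex {
    my ($text) = @_;
    return $text =~ /^[^,]+,([^,]+),[^,]+,[^,]+$/ ? $1 : undef;
}

# Shorter pattern: stops caring after the second field.
sub loose_regex {
    my ($text) = @_;
    return $text =~ /^[^,]+,([^,]+)/ ? $1 : undef;
}

# split with a limit of 3: at most three pieces, of which we keep the second.
sub split_field {
    my ($text) = @_;
    my $field = (split /,/, $text, 3)[1];
    return $field;
}

# Render undef visibly instead of triggering a warning.
sub show {
    my ($value) = @_;
    return defined $value ? $value : 'undef';
}

for my $input ("one,two,three,four", "one,two,three,four,five") {
    printf "%-26s strict=%s loose=%s split=%s\n",
        $input,
        show(strict_regex($input)),
        show(loose_regex($input)),
        show(split_field($input));
}
```

On the five-field input the anchored pattern fails to match while the other two still return the second field, which is why timing it against `split` compares two different jobs.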
(Ovid): The monk recants by Ovid (Cardinal) on Jul 16, 2000 at 04:08 UTC
D'oh! I optimized the split but not the regex :( That'll teach me to be careless. For honesty's sake:

    timethese(1000000, {
        Regex  => '$a=$1 if $myvar =~ /^[^,]+,([^,]+)/',
        Split1 => '$result = (split /,/, $myvar)[1]',
        Split2 => '$result = (split /,/, $myvar, 4)[1]',
        Split3 => '$result = (split /,/, $myvar, 3)[1]'
    });

    Benchmark: timing 1000000 iterations of Regex, Split1, Split2, Split3...
    Regex:  14 wallclock secs (14.12 usr + 0.00 sys = 14.12 CPU)
    Split1: 17 wallclock secs (16.54 usr + 0.00 sys = 16.54 CPU)
    Split2: 16 wallclock secs (16.75 usr + 0.00 sys = 16.75 CPU)
    Split3: 14 wallclock secs (13.02 usr + 0.00 sys = 13.02 CPU)

I'm going to cry myself to sleep tonight. Curiously, though, it was the null assignments that appeared to be killing the efficiency (`$a=$1, $b=$2, $c=$3, $d=$4`) much more than the unoptimized regex. Hmmm....

Cheers,
Ovid
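
One way to check the hunch about the assignments is to vary one thing at a time. The sketch below is an assumption-laden illustration, not the benchmark that was actually run: the labels and the variables `$f1` to `$f4` are made up, and code refs are used instead of strings. It keeps the anchored regex fixed while changing the number of assignments, then changes only the regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

our $myvar = "one,two,three,four";
our ($f1, $f2, $f3, $f4);   # hypothetical targets standing in for $a..$d

cmpthese(-1, {
    # Anchored regex, four assignments (three of them assign undef).
    Full4 => sub {
        $f1 = $1, $f2 = $2, $f3 = $3, $f4 = $4
            if $myvar =~ /^[^,]+,([^,]+),[^,]+,[^,]+$/;
    },
    # Same anchored regex, one assignment: isolates the assignment cost.
    Full1 => sub {
        $f1 = $1 if $myvar =~ /^[^,]+,([^,]+),[^,]+,[^,]+$/;
    },
    # Shorter regex, one assignment: isolates the regex cost.
    Short1 => sub {
        $f1 = $1 if $myvar =~ /^[^,]+,([^,]+)/;
    },
});
```

If the gap between Full4 and Full1 is larger than the gap between Full1 and Short1, that would support the observation about the assignments; results will vary by Perl version and machine.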