in reply to quickly counting substrings

Very nice comparison. You might as well take a swing at index(), which is probably fast.
    sub s_index {
        my $pos = 0;
        my $cnt = 0;
        while (index($str, $sep, $pos)) {
            $pos++;
            $cnt++;
        }
        $cnt{s_index} = $cnt;
    }

Jeroen
"We are not alone"(FZ)

Update: Thanks to danger for pointing out my typos. I took his code and benchmarked it:

    my $_s_index = sub {
        my $pos = 0;
        my $cnt = 0;
        local $[ = 1;    # ook!
        my $pos = -1;
        ++$cnt while $pos = index($dta,"\n\n",$pos+2);
        $cnt{_s_index}=$cnt;
    };
    my $_while_re = sub{
        my $cnt = 0;
        my $recsep = "\n\n";
        $cnt++ while $dta =~ /$recsep/g;
        $cnt{_while_re} = $cnt;
    };

=> resulted in:

    Benchmark: timing 300 iterations of m_array, m_array_M, m_for, m_for_M, m_while, m_while_M, s_index, tr, while_re...
      m_array: 11 wallclock secs ( 5.13 usr + 0.04 sys = 5.17 CPU) @ 58.03/s (n=300)
    m_array_M:  4 wallclock secs ( 1.57 usr + 0.02 sys = 1.59 CPU) @ 188.68/s (n=300)
        m_for:  9 wallclock secs ( 4.19 usr + 0.00 sys = 4.19 CPU) @ 71.60/s (n=300)
      m_for_M:  3 wallclock secs ( 1.41 usr + 0.01 sys = 1.42 CPU) @ 211.27/s (n=300)
      m_while:  5 wallclock secs ( 2.46 usr + 0.00 sys = 2.46 CPU) @ 121.95/s (n=300)
    m_while_M:  2 wallclock secs ( 0.94 usr + 0.00 sys = 0.94 CPU) @ 319.15/s (n=300)
      s_index:  2 wallclock secs ( 0.88 usr + 0.00 sys = 0.88 CPU) @ 340.91/s (n=300)
           tr:  1 wallclock secs ( 0.36 usr + 0.00 sys = 0.36 CPU) @ 833.33/s (n=300)
                (warning: too few iterations for a reliable count)
     while_re:  2 wallclock secs ( 0.90 usr + 0.00 sys = 0.90 CPU) @ 333.33/s (n=300)

    === check counts ===
    _while_re 2000
    _m_for_M 2000
    _m_array_M 2000
    _tr 6000
    _m_while_M 2000
    _m_while 6000
    _s_index 2000
    _m_for 6000
    _m_array 6000

tr is still the winner, with index and danger's while closely tied for 2nd and 3rd place, respectively.

Then it dawned on me that the separators should all be the same. That shuffled the results somewhat:

    Benchmark: timing 300 iterations of m_array, m_array_M, m_for, m_for_M, m_while, m_while_M, s_index, tr, while_re...
      m_array: 12 wallclock secs ( 5.04 usr + 0.04 sys = 5.08 CPU) @ 59.06/s (n=300)
    m_array_M: 11 wallclock secs ( 5.08 usr + 0.03 sys = 5.11 CPU) @ 58.71/s (n=300)
        m_for:  9 wallclock secs ( 4.19 usr + 0.00 sys = 4.19 CPU) @ 71.60/s (n=300)
      m_for_M:  8 wallclock secs ( 4.10 usr + 0.00 sys = 4.10 CPU) @ 73.17/s (n=300)
      m_while:  5 wallclock secs ( 2.26 usr + 0.02 sys = 2.28 CPU) @ 131.58/s (n=300)
    m_while_M:  5 wallclock secs ( 2.28 usr + 0.02 sys = 2.30 CPU) @ 130.43/s (n=300)
      s_index:  5 wallclock secs ( 2.13 usr + 0.00 sys = 2.13 CPU) @ 140.85/s (n=300)
           tr:  0 wallclock secs ( 0.27 usr + 0.00 sys = 0.27 CPU) @ 1111.11/s (n=300)
                (warning: too few iterations for a reliable count)
     while_re:  5 wallclock secs ( 2.11 usr + 0.00 sys = 2.11 CPU) @ 142.18/s (n=300)

    === check counts ===
    _while_re 6000
    _m_for_M 6000
    _m_array_M 6000
    _tr 6000
    _m_while_M 6000
    _m_while 6000
    _s_index 6000
    _m_for 6000
    _m_array 6000
Sadly enough the C-code doesn't compile on my machine. I don't have the time to get it working right now.

Re: Re: quickly counting substrings
by danger (Priest) on Mar 02, 2001 at 15:44 UTC

    That routine is a little broken -- the return value of index() can be 0 on success, and you'll want to set $pos to the current position plus whatever the length of the separator is at each iteration.

    One version of using index() that fares somewhat comparably to the m_while_M routine is:

        my $_index = sub {
            my $cnt = 0;
            local $[ = 1;    # ook!
            my $pos = -1;
            ++$cnt while $pos = index($dta,"\n\n",$pos+2);
            $cnt{_index} = $cnt;
        };
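
On newer Perls the `$[` trick is not available (as far as I know, assigning a non-zero value to `$[` is fatal from Perl 5.30 on), so a variant that tests the result of index() against -1 may be handy. This is only a minimal sketch: the name `count_index` and its two arguments are made up for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Count non-overlapping occurrences of $sep in $str using index().
# index() returns -1 when nothing is found and 0 is a valid hit,
# so the loop compares against -1 instead of testing for truth.
sub count_index {
    my ($str, $sep) = @_;
    return 0 unless length $sep;    # avoid an endless loop on an empty separator
    my ($cnt, $pos) = (0, 0);
    while (($pos = index($str, $sep, $pos)) != -1) {
        $cnt++;
        $pos += length $sep;        # continue just past this match
    }
    return $cnt;
}

print count_index("a\n\nb\n\nc\n\n", "\n\n"), "\n";   # prints 3
```

Advancing by the separator length, rather than by one, is what keeps a run such as "\n\n\n" from being counted twice.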

    But an alternate version of the m_while_M routine (using while as a statement modifier rather than the block form) seems to be the best I can come up with at this hour:

        my $_while_re = sub{
            my $cnt = 0;
            my $recsep = "\n\n";
            $cnt++ while $dta =~ /$recsep/g;
            $cnt{_while_re} = $cnt;
        };
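
To try the comparison on another machine, a small harness built on the core Benchmark module might look like the following. The test data (2000 short records, each followed by a blank line), the iteration count, and the hash names are assumptions for illustration, not the setup used in the thread.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 2000 records, each terminated by "\n\n".
my $dta = join '', map { "record $_\n\n" } 1 .. 2000;
my $sep = "\n\n";
my %cnt;

my %subs = (
    # index() loop, compared against -1 so no $[ tweak is needed
    s_index => sub {
        my ($cnt, $pos) = (0, 0);
        while (($pos = index($dta, $sep, $pos)) != -1) {
            $cnt++;
            $pos += length $sep;
        }
        $cnt{s_index} = $cnt;
    },
    # scalar-context match with /g, using while as a statement modifier
    while_re => sub {
        my $cnt = 0;
        $cnt++ while $dta =~ /$sep/g;
        $cnt{while_re} = $cnt;
    },
);

# Run each sub 300 times and print a comparison table.
cmpthese(300, \%subs);

# The counters should agree before the timings mean anything.
print "$_ $cnt{$_}\n" for sort keys %cnt;
```

Printing the counts next to the timings mirrors the "check counts" section above: a routine that is fast but counts something different is not a fair competitor.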