Re: quickly counting substrings

Very nice comparison. You might as well take a swing at index() too, which is probably fast.

sub s_index{
 my $pos = 0;
 my $cnt = 0;
 while (index($str, $sep, $pos)){
  $pos++;  
  $cnt++;
 }
 $cnt{s_index}=$cnt;
}

Jeroen
"We are not alone"(FZ)

Update: Thanks to danger for pointing out my typos. I took his code and benchmarked it:

my $_s_index = sub {
  my $cnt = 0;
  local $[ = 1;  # ook!
  my $pos = -1;
  ++$cnt while $pos = index($dta,"\n\n",$pos+2);
  $cnt{_s_index}=$cnt;
};
 
my $_while_re = sub{
  my $cnt = 0;
  my $recsep = "\n\n";
  $cnt++ while $dta =~ /$recsep/g;
  $cnt{_while_re} = $cnt;
};
=>resulted in:
Benchmark: timing 300 iterations of m_array, m_array_M, m_for, m_for_M, m_while, m_while_M, s_index, tr, while_re...
   m_array: 11 wallclock secs ( 5.13 usr +  0.04 sys =  5.17 CPU) @ 58.03/s (n=300)
 m_array_M:  4 wallclock secs ( 1.57 usr +  0.02 sys =  1.59 CPU) @ 188.68/s (n=300)
     m_for:  9 wallclock secs ( 4.19 usr +  0.00 sys =  4.19 CPU) @ 71.60/s (n=300)
   m_for_M:  3 wallclock secs ( 1.41 usr +  0.01 sys =  1.42 CPU) @ 211.27/s (n=300)
   m_while:  5 wallclock secs ( 2.46 usr +  0.00 sys =  2.46 CPU) @ 121.95/s (n=300)
 m_while_M:  2 wallclock secs ( 0.94 usr +  0.00 sys =  0.94 CPU) @ 319.15/s (n=300)
   s_index:  2 wallclock secs ( 0.88 usr +  0.00 sys =  0.88 CPU) @ 340.91/s (n=300)
        tr:  1 wallclock secs ( 0.36 usr +  0.00 sys =  0.36 CPU) @ 833.33/s (n=300)
            (warning: too few iterations for a reliable count)
  while_re:  2 wallclock secs ( 0.90 usr +  0.00 sys =  0.90 CPU) @ 333.33/s (n=300)
 
=== check counts ===
         _while_re  2000
          _m_for_M  2000
        _m_array_M  2000
               _tr  6000
        _m_while_M  2000
          _m_while  6000
          _s_index  2000
            _m_for  6000
          _m_array  6000

tr is still the winner, with index and danger's while_re closely tied for 2nd and 3rd place, respectively.

Then I saw the light: the separators should all be the same. That shuffled the results somewhat:

Benchmark: timing 300 iterations of m_array, m_array_M, m_for, m_for_M, m_while, m_while_M, s_index, tr, while_re...
   m_array: 12 wallclock secs ( 5.04 usr +  0.04 sys =  5.08 CPU) @ 59.06/s (n=300)
 m_array_M: 11 wallclock secs ( 5.08 usr +  0.03 sys =  5.11 CPU) @ 58.71/s (n=300)
     m_for:  9 wallclock secs ( 4.19 usr +  0.00 sys =  4.19 CPU) @ 71.60/s (n=300)
   m_for_M:  8 wallclock secs ( 4.10 usr +  0.00 sys =  4.10 CPU) @ 73.17/s (n=300)
   m_while:  5 wallclock secs ( 2.26 usr +  0.02 sys =  2.28 CPU) @ 131.58/s (n=300)
 m_while_M:  5 wallclock secs ( 2.28 usr +  0.02 sys =  2.30 CPU) @ 130.43/s (n=300)
   s_index:  5 wallclock secs ( 2.13 usr +  0.00 sys =  2.13 CPU) @ 140.85/s (n=300)
        tr:  0 wallclock secs ( 0.27 usr +  0.00 sys =  0.27 CPU) @ 1111.11/s (n=300)
            (warning: too few iterations for a reliable count)
  while_re:  5 wallclock secs ( 2.11 usr +  0.00 sys =  2.11 CPU) @ 142.18/s (n=300)
 
=== check counts ===
         _while_re  6000
          _m_for_M  6000
        _m_array_M  6000
               _tr  6000
        _m_while_M  6000
          _m_while  6000
          _s_index  6000
            _m_for  6000
          _m_array  6000
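
One plausible reading of the earlier mismatch in the check counts (6000 versus 2000) is that tr counts single characters, while index and the regex count occurrences of a two-character separator. A minimal sketch of that difference, using made-up sample data rather than the benchmark input:

```perl
use strict;
use warnings;

# Made-up sample data (an assumption, not the benchmark input):
# three records, each terminated by a blank line.
my $dta = "one\ntwo\n\nthree\n\nfour\nfive\n\n";

# tr/// counts single characters, so every "\n" is counted.
# With an empty replacement list and no /d, the string is left unchanged.
my $newlines = ($dta =~ tr/\n//);

# A global match in scalar context steps through the occurrences
# of the two-character separator one at a time.
my $seps = 0;
$seps++ while $dta =~ /\n\n/g;

print "tr: $newlines, regex: $seps\n";   # prints "tr: 8, regex: 3"
```

The two numbers only agree when every routine is asked to count the same separator, which is what the second run does.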

Sadly, the C code doesn't compile on my machine, and I don't have time to get it working right now.

Re: Re: quickly counting substrings
by danger (Priest) on Mar 02, 2001 at 15:44 UTC

That routine is a little broken -- the return value of index() can be 0 on success, and you'll want to set $pos to the current position plus whatever the length of the separator is at each iteration.

One version of using index() that fares somewhat comparably to the m_while_M routine is:

my $_index = sub {
    my $cnt = 0;
    local $[ = 1;  # ook!
    my $pos = -1;
    ++$cnt while $pos = index($dta,"\n\n",$pos+2);
    $cnt{_index} = $cnt;
};
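
For comparison, the two fixes described above (testing against -1 and advancing by the separator length) can also be sketched without touching $[. This is an illustrative variant, not the routine that was benchmarked; count_substr is a hypothetical helper name.

```perl
use strict;
use warnings;

# Count non-overlapping occurrences of $sep in $str using index().
# count_substr is a hypothetical name, not taken from the thread.
sub count_substr {
    my ($str, $sep) = @_;
    my $len = length $sep;
    return 0 unless $len;    # an empty separator would never advance

    my ($cnt, $pos) = (0, 0);
    # index() returns -1 on failure and may return 0 on success,
    # so compare against -1 instead of testing for truth.
    while (($pos = index($str, $sep, $pos)) != -1) {
        $cnt++;
        $pos += $len;        # step past the match just found
    }
    return $cnt;
}

print count_substr("a\n\nb\n\nc\n\n", "\n\n"), "\n";   # prints 3
```

The explicit comparison costs one extra operation per iteration, so it may time slightly differently from the $[ version.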

But an alternate version of the m_while_M routine (using while as a statement modifier rather than the block form) seems to be the best I can come up with at this hour:

my $_while_re = sub{
    my $cnt = 0;
    my $recsep = "\n\n";
    $cnt++ while $dta =~ /$recsep/g;
    $cnt{_while_re} = $cnt;
};