in reply to Re: Producing a list of offsets efficiently
in thread Producing a list of offsets efficiently

not sure how efficient this is...

Not very efficient at all :). It corresponds to wmap in the table below:

         Rate   wmap  count  wgrep wregex windex
wmap   5.42/s     --   -12%   -60%   -97%   -98%
count  6.13/s    13%     --   -54%   -97%   -98%
wgrep  13.5/s   148%   120%     --   -93%   -96%
wregex  198/s  3558%  3132%  1372%     --   -37%
windex  317/s  5741%  5062%  2251%    60%     --


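use strict;
use warnings;

use Benchmark 'cmpthese';

# Fixed seed, so every run benchmarks the same 100,000-letter string.
srand( 0 );
my $s = join '', map chr(97+int(rand(26))), 1..100000;

# A negative count means: run each sub for at least 2 CPU seconds.
cmpthese( -2, {
    windex => \&windex,
    wregex => \&wregex,
    wgrep  => \&wgrep,
    count  => \&count,
    wmap   => \&wmap,
} );

# Step through the string with index(), resuming just past each hit.
sub windex {
    my @o;
    my $o = -1;
    while ( ( $o = index( $s, 'a', $o+1 )) > -1 ) {
        push @o, $o
    }
    return;
}

# Record the offset of each 'a' in a code block; (?!) always fails,
# so the regex engine moves on and tries every later position too.
sub wregex {
    my @o;
    $s =~ m/a(?{ push @o, pos() - 1 })(?!)/;
    return;
}

# Test every offset with a one-character substr().
sub wgrep {
    my @o = grep substr( $s, $_, 1 ) eq 'a', 0..length($s)-1;
    return;
}

# Split into characters, then count along while comparing.
sub count {
    my @o;
    my $count = 0;
    for ( split //, $s ) {
        push @o, $count if $_ eq 'a';
        ++$count;
    }
    return;
}

# Same idea as count(), but with map and a regex test per character.
sub wmap {
    my $count = 0;
    my @o = map { $count++; /a/ ? $count - 1 : () } split //, $s;
    return;
}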

the lowliest monk

Re^3: Producing a list of offsets efficiently
by japhy (Canon) on May 29, 2005 at 17:11 UTC
    Of course wmap() and count() are going to be the slowest, because they go through the string twice: once to split it into individual characters, and again to compare each character to the desired one. The index() approach is almost certainly the fastest. A long time ago I wrote a function called aindex() that did just that: it used index() to step through a string and return all the indices of a substring in that string.
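For illustration, here is a minimal sketch of what such a function might look like. Only the name aindex() comes from the post; the argument order, the guard against an empty substring, and the choice to report overlapping matches are all assumptions, not the original code.

```perl
use strict;
use warnings;

# Hypothetical reconstruction: signature and body are assumptions,
# not the original aindex().
sub aindex {
    my ( $str, $substr ) = @_;

    # An empty needle matches at every position and would loop forever.
    return if !length $substr;

    my @offsets;
    my $pos = -1;

    # index() returns -1 once there are no further matches.
    # Resuming at $pos + 1 means overlapping matches are reported too.
    while ( ( $pos = index( $str, $substr, $pos + 1 ) ) > -1 ) {
        push @offsets, $pos;
    }
    return @offsets;
}

my @hits = aindex( 'banana', 'a' );
print "@hits\n";    # prints "1 3 5"
```

To skip overlapping matches instead, one could resume the search at $pos + length($substr) rather than $pos + 1, provided the initial call is adjusted to start at offset 0.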

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

      Yes, no surprises there on the ordering, though the hard numbers are eloquent; I would not have been able to predict a factor of ∼25 difference between wgrep and windex, say.

      the lowliest monk