in reply to Re: The nth occurrence of a character
in thread The nth occurrence of a character

Ah well, in a fit of boredom I decided to benchmark the submissions myself. The results are unsurprising (at least to me).

The first thing I did was put everyone's code in subroutines that all conformed to the same interface. I quickly discovered that no one tested their code :-). salva has a precedence problem, mickeyn has an off-by-one problem, and Roy Johnson has an off-by-N problem where N happens to be 1 in the case that you're looking for a single character.

Granted the obiwan errors could just stem from counting the positions starting from 1 rather than 0. But when I'm asked to find the position of some character in a string, I try to make the value something that I could give to substr() and get the right answer. substr() uses 0-based positions and so do I.

Anyway, once I got them all conforming to the same interface, I wrote a little routine that would output a "ruler" for a given string and then tested all of the routines against visual inspection of the ruler and each other.

Once I was happy with that, I then wrote the benchmarks to compare looking for several occurences of a single character in a single string and then another set of benchmarks looking for a specific occurence of a single character in multiple strings.

Here's the program:

#!/usr/bin/perl use warnings; use strict; use Benchmark qw/:all/; sub salva { my($string,$char,$n) = @_; my $pat = "[^$char]*$char" x $n; my $where = -1; $string =~ /^($pat)/ and $where = length($1) - 1; return $where; } { my %hash; sub mickeyn { my ($string,$c,$n) = @_; return -1 if $n < 1; return $hash{$string}{$c}[$n-1] || -1 if exists $hash{$string}; my @split = split //, $string; my @positions = map { [$split[$_], $_] } 0 .. $#split; undef @split; foreach (@positions){ push @{$hash{$string}{$_->[0]}}, $_->[1]; } undef @positions ; return $hash{$string}{$c}[$n-1] || -1; } } sub RoyJohnson { my ($str,$char,$n) = @_; $str =~ /(?:.*?$char){$n}/g; return pos($str)-1 if defined pos($str); return -1; } sub find_nth { my ($s,$c,$n) = @_; my $pos = -1; while ($n--) { $pos = index($s,$c,$pos+1); return -1 if $pos == -1; } return $pos; } sub rule { my $s = shift; my $l = int(length($s)/10)+1; my $r1 = '0123456789' x $l; my $r2 = ' ' x $l; $r1 = substr($r1,0,length($s)); substr($r2,$_*10,1) = $_ for 1..length($s)/10; return "$r2\n$r1\n"; } sub test { my ($str,$c) = @_; print rule($str), "$str\n"; print "($c,$_): ", join(" ", find_nth($str,$c,$_), salva($str,$c,$_), mickeyn($str,$c,$_), RoyJohnson($str,$c,$_) ),"\n" for -1..10; } my $str="Now is the time for all good men to come to the aid of their +country"; my $n = 5; my $c = "o"; # test($str,$c); exit; my $time = shift || -5; cmpthese($time, { duff => sub { find_nth($str,$c,$_) for -1..10; }, salva => sub { salva($str,$c,$_) for -1..10; }, mickeyn => sub { mickeyn($str,$c,$_) for -1..10; }, RoyJohnson => sub { RoyJohnson($str,$c,$_) for -1..10; }, }); my @strings = ( "Now is the time for all good men to come to the aid of their country +", "Four score and seven years ago our forefathers brought forth unto th +is country", "How to write an efficient Perl function to return the position of th +e nth occurance of a specified character within a string?", "The rain in spain lies mainly upon the plain", "a" x 200, "a" x 200 . "ooo", "o" x 200, ); cmpthese($time, { duff => sub { find_nth($_,$c,3) for @strings; }, salva => sub { salva($_,$c,3) for @strings; }, mickeyn => sub { mickeyn($_,$c,3) for @strings; }, RoyJohnson => sub { RoyJohnson($_,$c,3) for @strings; }, });

notes: mickeyn's needed to be modified the most and I took a few liberties there so it's not exactly faithful to his original implementation in some respects.

Here are some results on my system; the first set of benchmarks are for multple occurences in one string, the second set is for a single occurence in multiple strings:

              Rate RoyJohnson      salva       duff    mickeyn
RoyJohnson   373/s         --       -86%       -97%       -98%
salva       2604/s       598%         --       -78%       -88%
duff       11887/s      3089%       357%         --       -44%
mickeyn    21159/s      5576%       713%        78%         --
              Rate RoyJohnson      salva    mickeyn       duff
RoyJohnson 15272/s         --        -9%       -42%       -48%
salva      16847/s        10%         --       -36%       -42%
mickeyn    26298/s        72%        56%         --       -10%
duff       29202/s        91%        73%        11%         --