comment on

Ah well, in a fit of boredom I decided to benchmark the submissions myself. The results are unsurprising (at least to me).

The first thing I did was put everyone's code in subroutines that all conformed to the same interface. I quickly discovered that no one tested their code :-). salva has a precedence problem, mickeyn has an off-by-one problem, and Roy Johnson has an off-by-N problem where N happens to be 1 in the case that you're looking for a single character.

Granted the obiwan errors could just stem from counting the positions starting from 1 rather than 0. But when I'm asked to find the position of some character in a string, I try to make the value something that I could give to substr() and get the right answer. substr() uses 0-based positions and so do I.

Anyway, once I got them all conforming to the same interface, I wrote a little routine that would output a "ruler" for a given string and then tested all of the routines against visual inspection of the ruler and each other.

Once I was happy with that, I then wrote the benchmarks to compare looking for several occurences of a single character in a single string and then another set of benchmarks looking for a specific occurence of a single character in multiple strings.

Here's the program:

#!/usr/bin/perl

use warnings; use strict;
use Benchmark qw/:all/;

sub salva {
   my($string,$char,$n) = @_;
   my $pat = "[^$char]*$char" x $n;
   my $where = -1;
   $string =~ /^($pat)/ and $where = length($1) - 1;
   return $where;
}

{
   my %hash;

sub mickeyn {
   my ($string,$c,$n) = @_;
   return -1 if $n < 1;
   return $hash{$string}{$c}[$n-1] || -1 if exists $hash{$string};
   my @split = split //, $string;
   my @positions = map { [$split[$_], $_] } 0 .. $#split;
   undef @split;

   foreach (@positions){
     push @{$hash{$string}{$_->[0]}}, $_->[1];
   }
   undef @positions ;
   return $hash{$string}{$c}[$n-1] || -1;
}

}

sub RoyJohnson {
   my ($str,$char,$n) = @_;
   $str =~ /(?:.*?$char){$n}/g;
   return pos($str)-1 if defined pos($str);
   return -1;
}

sub find_nth {
   my ($s,$c,$n) = @_;
   my $pos = -1;
   while ($n--) {
      $pos = index($s,$c,$pos+1);
      return -1 if $pos == -1;
   }
   return $pos;
}

sub rule {
   my $s = shift;
   my $l = int(length($s)/10)+1;
   my $r1 = '0123456789' x $l;
   my $r2 = '          ' x $l;
   $r1 = substr($r1,0,length($s));
   substr($r2,$_*10,1) = $_ for 1..length($s)/10;
   return "$r2\n$r1\n";
}

sub test {
   my ($str,$c) = @_;
   print rule($str), "$str\n";
   print "($c,$_): ", join(" ",
      find_nth($str,$c,$_),
      salva($str,$c,$_),
      mickeyn($str,$c,$_),
      RoyJohnson($str,$c,$_)
   ),"\n" for -1..10;
}

my $str="Now is the time for all good men to come to the aid of their 
+country";
my $n = 5;
my $c = "o";

# test($str,$c); exit;

my $time = shift || -5;
cmpthese($time, {
   duff => sub { find_nth($str,$c,$_) for -1..10; },
   salva => sub { salva($str,$c,$_) for -1..10; },
   mickeyn => sub { mickeyn($str,$c,$_) for -1..10; },
   RoyJohnson => sub { RoyJohnson($str,$c,$_) for -1..10; },
});

my @strings = (
 "Now is the time for all good men to come to the aid of their country
+",
 "Four score and seven years ago our forefathers brought forth unto th
+is country",
 "How to write an efficient Perl function to return the position of th
+e nth occurance of a specified character within a string?",
 "The rain in spain lies mainly upon the plain",
 "a" x 200,
 "a" x 200 . "ooo",
 "o" x 200,
);

cmpthese($time, {
   duff => sub { find_nth($_,$c,3) for @strings; },
   salva => sub { salva($_,$c,3) for @strings; },
   mickeyn => sub { mickeyn($_,$c,3) for @strings; },
   RoyJohnson => sub { RoyJohnson($_,$c,3) for @strings; },
});
[download]

notes: mickeyn's needed to be modified the most and I took a few liberties there so it's not exactly faithful to his original implementation in some respects.

Here are some results on my system; the first set of benchmarks are for multple occurences in one string, the second set is for a single occurence in multiple strings:

              Rate RoyJohnson      salva       duff    mickeyn
RoyJohnson   373/s         --       -86%       -97%       -98%
salva       2604/s       598%         --       -78%       -88%
duff       11887/s      3089%       357%         --       -44%
mickeyn    21159/s      5576%       713%        78%         --
              Rate RoyJohnson      salva    mickeyn       duff
RoyJohnson 15272/s         --        -9%       -42%       -48%
salva      16847/s        10%         --       -36%       -42%
mickeyn    26298/s        72%        56%         --       -10%
duff       29202/s        91%        73%        11%         --

duff

In reply to Re^2: The nth occurrence of a character by duff
in thread The nth occurrence of a character by Anonymous Monk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.