Re: opposite of index+rindex? How-to? Needed?

To answer the question of why I was avoiding the regex engine, and to try some alternatives for space skipping (ones that would work in most perls, not just the latest, though it's great that 5.30 got that boost), I wrote a benchmark program. The first part is the positive case: using index and substr to do the parsing instead of a regex. Note that for many things 'tr' seems to work a lot like the regex engine: the more general the case you use 'tr' for (like mapping every character except the ones you want), the more it acts like a regex instead of a map, unless you enumerate all the characters so that perl can build a translation table. If you use a 'class', for example, it might use the regular regex engine and pull in its relative slowness. I say relative, as these examples show:

#!/usr/bin/perl
use strict; use warnings;
# vim=:SetNumberAndWidth

################################################################################

use Benchmark qw(:all);

our @words;

my $count=@ARGV?$ARGV[0]:50000;

my $str=q(     <location href="debug/noarch/post-build-checks-debugsource-84.87+git20170929.5b244d1-1.1.noarch.rpm"/>);

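sub getss1() {
  local $_ = $str;
  my $start = 1+index $_, q(");        # first char after the opening quote
  my $end   = index $_, q("), $start;  # position of the closing quote
  $_ = substr $_, $start, $end-$start; # 3rd arg is a length, not an end position
}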

use String::Index qw(cindex ncindex);

sub getss2() {
  local $_ = $str;
  my $start = 1+cindex $_, q(");
  my $end   = cindex $_, q("), $start;
  $_ = substr $_, $start, $end-$start; # 3rd arg is a length, not an end position
}

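sub getss3() {
  @words = split q( ),$str;            # single-space split skips the leading blanks
  local $_ = $words[1];                # the href="..." field
  my $start = 1+index $_, q(");
  my $end   = index $_, q("), $start;
  $_ = substr $_, $start, $end-$start; # 3rd arg is a length, not an end position
}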


sub getsub1() {
  local $_ = $str;
  s/^[^"]*"([^"]+)".*$/$1/;
  $_;
}

sub getsub2() {
  local $_ = $str;
  m{^[^"]*"([^"]+)".*$};
  $1;
}


cmpthese($count, {
  'ss1' => 'getss1',
  'ss2' => 'getss2',
  'ss3' => 'getss3',
  'sub1' => 'getsub1',
  'sub2' => 'getsub2',
  });




# vim: ts=2 sw=2 ai number

And a run:

> /tmp/bench
          Rate sub1 sub2  ss2  ss3  ss1
sub1  384615/s   -- -38% -38% -54% -69%
sub2  625000/s  62%   --  -0% -25% -50%
ss2   625000/s  62%   0%   -- -25% -50%
ss3   833333/s 117%  33%  33%   -- -33%
ss1  1250000/s 225% 100% 100%  50%   --

ss1 uses plain index+substr to isolate the string and is by far the fastest. Substitution (sub1) is the slowest; the plain regex match (sub2) is about 62% faster than that, but still runs at only half the speed of index+substr.
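As a standalone sketch of the index+substr approach (the sub name between_quotes and the shortened sample line are made up for illustration, they are not part of the benchmark): substr takes a length as its third argument, so the closing-quote position has to be turned into a length by subtracting the start.

```perl
use strict; use warnings;

# Hypothetical helper: return the text between the first pair of double
# quotes in $s, or undef if there is no complete pair.
sub between_quotes {
  my ($s) = @_;
  my $open = index($s, q("));
  return undef if $open < 0;          # no opening quote
  my $start = $open + 1;
  my $end = index($s, q("), $start);
  return undef if $end < 0;           # no closing quote
  return substr($s, $start, $end - $start);
}

my $line = q(     <location href="debug/noarch/example.rpm"/>);
print between_quotes($line), "\n";
```

The two early returns are only there so a malformed line does not produce a garbage substring; the benchmark subs skip them because the input is known to be well formed.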

ss2 uses the cindex routine from String::Index; in this run it came out level with the plain regex match (sub2). I'd suspect ncindex would run at about the same speed, which would make it a good choice for a general 'nindex'.
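To pin down what I mean by 'nindex', here is a pure-Perl sketch of the semantics: the position of the first character that is NOT in a given set, or -1 if there is none. The name nindex_chars is made up, and this is a plain character-by-character loop, so it only illustrates the behaviour and says nothing about speed.

```perl
use strict; use warnings;

# Hypothetical 'opposite of index': position of the first character of $s,
# at or after $pos, that does not appear in $chars; -1 if every remaining
# character is in $chars.
sub nindex_chars {
  my ($s, $chars, $pos) = @_;
  for (my $i = $pos // 0; $i < length($s); $i++) {
    return $i if index($chars, substr($s, $i, 1)) < 0;
  }
  return -1;
}

# Skip the leading blanks of a line and print where the tag starts.
print nindex_chars(q(     <location href="x"/>), q( )), "\n";
```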

But for the cases I mentioned with spaces or whitespace, using 'split' with its single-space literal argument incurs the least overhead (apart from not using it at all, as in ss1). For the general case of looking at different fields in my input that are separated by blanks, and for removing leading space, split seems to be the optimal choice for narrowing down the words (I get rid of the '<' after the spaces, then use a hash keyed on the first 4 chars of the tag). If I needed it faster (though this is not really worth the effort at this point), I could set up constants for the first 4 characters that equate to numbers, then call tag-specific routines based on an array rather than a hash.
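Roughly, the split-then-dispatch idea looks like the sketch below. The handler names, the second tag and what the handlers return are all invented for the example; only the shape (single-space split, strip the '<', look up the first 4 chars in a hash) is what I described.

```perl
use strict; use warnings;

# Hypothetical handlers keyed on the first 4 characters of the tag name.
my %handler = (
  loca => sub { "location: $_[0]" },
  chec => sub { "checksum: $_[0]" },
);

sub dispatch_line {
  my ($line) = @_;
  # split on a literal single-space string drops leading whitespace and
  # splits on runs of whitespace; the limit of 2 keeps the rest intact.
  my ($tag, $rest) = split ' ', $line, 2;
  return undef unless defined $tag;                      # blank line
  $tag = substr($tag, 1) if substr($tag, 0, 1) eq '<';   # drop the '<'
  my $h = $handler{ substr($tag, 0, 4) } or return undef;
  return $h->($rest // '');
}

print dispatch_line(q(     <location href="debug/noarch/example.rpm"/>)), "\n";
```

Swapping the hash for an array indexed by small integer constants would only change the lookup line; the split part stays the same.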

So in exploring some of the suggestions here and writing a reply, I think I stumbled onto what I'll use for now, which is 'split'. I suspect its speed, and possibly a related algorithm, has been incorporated into 5.30's regex engine for the leading-whitespace case.

Thanks for the hints...
