ikegami,

Very nice!

When I saw your answer, I wondered whether a bunch of index calls would be faster. I didn't think so, but it was worth a shot. (But see at the end -- the index approach has BETTER worst case behaviour...)

The approach I was proposing was:

sub findInterleavedString { my $string=$_[0]; my $word=$_[1]; my @letters=split //,$word; my $index=0; for(my $i=0;$i<=$#letters && $index!=-1;++$i) { $index=index $string, $letters[$i], $index+1; } return $index!=-1; }

I tested 2 versions of each approach: ikegami and rmgir "precompute" the regex (or the array) for sigma, while the _noprep variants don't.

#!/usr/bin/perl -w use strict; use Benchmark qw(cmpthese); { my $word="sigma"; my $re = join '.*?', map quotemeta, split (//,$word); sub ikegami { my $result=1; foreach my $text ( 'sigma', 'stigma', 'magma', ) { $result &&= ($text=~$re); } } } sub ikegami_noprep { my $word="sigma"; my $re = join '.*?', map quotemeta, split (//,$word); my $result=1; foreach my $text ( 'sigma', 'stigma', 'magma', ) { $result &&= ($text=~$re); } } { my $word='sigma'; my @letters=split //,$word; sub findSigma { my $string=$_[0]; my $index=0; for(my $i=0;$i<=$#letters && $index!=-1;++$i) { $index=index $string, $letters[$i], $index+1; } return $index!=-1; } sub rmgir { my $result=1; foreach my $text ( 'sigma', 'stigma', 'magma', ) { $result &&= findSigma($text); } } } sub findInterleavedString { my $string=$_[0]; my $word=$_[1]; my @letters=split //,$word; my $index=0; for(my $i=0;$i<=$#letters && $index!=-1;++$i) { $index=index $string, $letters[$i], $index+1; } return $index!=-1; } sub rmgir_noprep { my $result=1; foreach my $text ( 'sigma', 'stigma', 'magma', ) { $result &&= findInterleavedString($text, 'sigma'); } } cmpthese -3, { ikegami => \&ikegami, ikegami_noprep => \&ikegami_noprep, rmgir => \&rmgir, rmgir_noprep => \&rmgir_noprep, };
The results confirm that your approach kicks index all around the room :)

                   Rate  rmgir_noprep ikegami_noprep         rmgir       ikegami
rmgir_noprep    24065/s            --           -65%          -71%          -92%
ikegami_noprep  69743/s          190%             --          -16%          -76%
rmgir           82694/s          244%            19%            --          -72%
ikegami        295753/s         1129%           324%          258%            --
This makes sense, of course, since the iteration from one "index" call to the next happens in perl, and the regex is all one opcode.

But it was a fun Saturday afternoon experiment...

Edit: Note that for REALLY pathological strings, the index approach still performs reasonably well, while the regex can become very slow or hang... For example, if matching against

	('a' x 110) .
	'stigm' .
	('stigm' x 100);
(which doesn't match), the regex approaches take about 2 seconds to fail 1 call, while the index approach can still make over 10000 calls per second.

So if you have to worry about a hostile user feeding your program bad strings, for example, and your maximum performance requirement isn't too high, then the index approach might be better...

Added missing +1 to index last argument, and note about worst case behaviour...


Mike

In reply to Re^2: matching with inclusion by RMGir
in thread matching with inclusion by vitaly

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.