Just for fun... I created an nth_index in C++ that sorta mimics the behavior of index. ..."sorta", in that it will return -1 if the nth string isn't found, and will assume an offset greater than haystack length is equal to the haystack size-1, or less than zero is zero. But where it will most obviously fall short is that it's only capable of dealing in standard "byte" sized chars; no Unicode.

use Benchmark qw( cmpthese ); use Inline CPP => 'DATA'; $main::string = '1234!' x 55e3; sub cpp { my $copy = $main::string; my $ix = nth_index( $copy, '!', 50000 ); substr( $copy, 0, $ix ) =~ tr/!/\n/; return \$copy; } sub regex { my $copy = $main::string; $myregexp::count = 0; $copy =~ s/!(??{ ( $myregexp::count++ < 50000 ) ? '' : '(?!)' })/\n/ +g; return \$copy; } sub buk { my( $p, $c ) = ( 0, 50e3 ); my $copy = $main::string; 1 while --$c and $p = index $copy, '!', $p; substr( $copy, 0, $p ) =~ tr[!][\n]; return \$copy; } cmpthese ( -5, { cpp => \&cpp, regex => \&regex, buk => \&buk, } ); __DATA__ __CPP__ #include <string> long nth_index ( const char* c_str_haystack, const char* c_str_needle, long n, long offset = 0 ) { std::string haystack( c_str_haystack ); std::string needle( c_str_needle ); if( ! needle.size() || ! haystack.size() ) return -1; if( offset < 0 ) offset = 0; if( offset >= haystack.size() ) offset = haystack.size() - 1; size_t pos = haystack.find( needle, offset ); while( --n ) { pos = haystack.find( needle, pos + 1 ); if( pos == std::string::npos ) return -1; } return pos; }

The results:

$ ./mytest10.pl Rate regex buk cpp regex 10.0/s -- -97% -99% buk 291/s 2803% -- -63% cpp 777/s 7657% 167% --

This goes back to the original benchmark where I just made copies. I understood the xor-flag method, but I already had this written and didn't bother to rewrite.

There's a gross inefficiency in the C++ version, that if fixed, would greatly improve the speed of that solution. The biggest inefficiency is that I'm copying the c-strings that Perl passes into the subroutine into C++ "string" objects. This is a linear operation, and really totally unnecessary if I were motivated enough to forgo the "string::find" method and roll my own instead. If I were to do that I could just deal directly with C strings and avoid making another copy. But I wanted to play with C++ strings just to see how it would look.

Nevertheless, the results are predictably good. Don't try this on Unicode strings though. And if the needle string has a length greater than one, we would have to be aware that my method will detect needles that overlap each other. If we wanted to avoid that the C++ function should add the length of the needle to "pos".

The exercise was beneficial, as I discovered a bug in Inline::CPP's documentation on how to declare user typemaps, that I'll be able to fix in an upcoming release.


Dave


In reply to Re^8: Fast Replacement (0.01 seconds) by davido
in thread Fast Replacement by sathishselvam

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.