Re^7: Fast Replacement (0.01 seconds)

Using the eval subroutines was a step too far. Whilst much better than eval for every line, the additional subroutine call still has a substantial impact.

Going back to hardcoded trs, and you get the picture we were both expecting:

C:\test\ACA>..\junk71 -N=4
            (warning: too few iterations for a reliable count)
    Rate     b     a
b 1.45/s    --  -98%
a 85.1/s 5751%    --

C:\test\ACA>..\junk71 -N=10
            (warning: too few iterations for a reliable count)
    Rate     b     a
b 2.94/s    --  -97%
a 91.7/s 3025%    --

C:\test\ACA>..\junk71 -N=20
            (warning: too few iterations for a reliable count)
    Rate     b     a
b 4.55/s    --  -96%
a  116/s 2453%    --

C:\test\ACA>..\junk71 -N=50
            (warning: too few iterations for a reliable count)
    Rate     b     a
b 6.67/s    --  -95%
a  133/s 1900%    --

C:\test\ACA>..\junk71 -N=100
    Rate     b     a
b 7.93/s    --  -95%
a  149/s 1776%    --
[download]

Updated benchmark code:

#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

sub makeTR{
    eval "sub{ \$_[ 0 ] =~ tr[$_[0]][$_[1]] }";
}

our $N //= 10; die "$N must be even and positive" if $N &1 or $N < 2;
our $tr1 = makeTR( '!', "\n" );
our $tr2 = makeTR( "\n", '!' );
our $flag = 0;
our $s = '1234!' x 55e3;

cmpthese $N, {
    a => q[
        if( $flag ) {
            my( $p, $c ) = ( 0, 50e3 );
            1 while --$c and $p = index $s, "\n", $p;
            $s =~ tr[\n][!]; #$tr2->( substr $s, 0, $p );
            $flag ^= 1;
        }
        else {
            my( $p, $c ) = ( 0, 50e3 );
            1 while --$c and $p = index $s, "!", $p;
            $s =~ tr[!][\n]; #$tr1->( substr $s, 0, $p );
            $flag ^= 1;
        }
    ],
    b => q[
        if( $flag ) {
            $s =~ s/\n(??{ ( $myregexp::count++ < 50000 ) ? '' : '(?!)
+' })/!/g;
            $flag ^= 1;
        }
        else {
            $s =~ s/!(??{ ( $myregexp::count++ < 50000 ) ? '' : '(?!)'
+ })/\n/g;
            $flag ^= 1;
        }
    ],
};
[download]

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Comment on Re^7: Fast Replacement (0.01 seconds) Select or Download Code

Replies are listed 'Best First'.
Re^8: Fast Replacement (0.01 seconds) by davido (Cardinal) on Jun 14, 2013 at 22:21 UTC
When I have a moment I want to check to see if this is a faster way of finding the 50k-th "!": `if( $string =~ m/(?:[^!]!){49999}[^!]!/; ) { print $+[0] + 1, "\n" # Updated. }` [download] It could be, because it keeps all the work inside of Perl's internals. If Perl had a native function that said, "`find_nth($string, '!')`", it would beat any regexp solution, but it doesn't (at least not as a built-in). Update: "`Quantifier in {,} bigger than 32766 in regex; marked by <-- HERE in m/(?:[^!]!){ <-- HERE 49999}[^!]!/ at mytest10.pl line 27.`" (I forgot about that.) Update 2: Assuming we're only dealing with ASCII, this is trivial using Inline::C. Walking the string and making the change up to 50k times will be trivial, and fast. If I get around to it I'll post an example. Dave	[reply] [d/l] [select]
Re^8: Fast Replacement (0.01 seconds) by davido (Cardinal) on Jun 16, 2013 at 16:52 UTC
Just for fun... I created an `nth_index` in C++ that sorta mimics the behavior of index. ..."sorta", in that it will return -1 if the nth string isn't found, and will assume an offset greater than haystack length is equal to the haystack size-1, or less than zero is zero. But where it will most obviously fall short is that it's only capable of dealing in standard "byte" sized chars; no Unicode. use Benchmark qw( cmpthese ); use Inline CPP => 'DATA'; $main::string = '1234!' x 55e3; sub cpp { my $copy = $main::string; my $ix = nth_index( $copy, '!', 50000 ); substr( $copy, 0, $ix ) =~ tr/!/\n/; return \$copy; } sub regex { my $copy = $main::string; $myregexp::count = 0; $copy =~ s/!(??{ ( $myregexp::count++ < 50000 ) ? '' : '(?!)' })/\n/ +g; return \$copy; } sub buk { my( $p, $c ) = ( 0, 50e3 ); my $copy = $main::string; 1 while --$c and $p = index $copy, '!', $p; substr( $copy, 0, $p ) =~ tr[!][\n]; return \$copy; } cmpthese ( -5, { cpp => \&cpp, regex => \&regex, buk => \&buk, } ); __DATA__ __CPP__ #include <string> long nth_index ( const char* c_str_haystack, const char* c_str_needle, long n, long offset = 0 ) { std::string haystack( c_str_haystack ); std::string needle( c_str_needle ); if( ! needle.size() \|\| ! haystack.size() ) return -1; if( offset < 0 ) offset = 0; if( offset >= haystack.size() ) offset = haystack.size() - 1; size_t pos = haystack.find( needle, offset ); while( --n ) { pos = haystack.find( needle, pos + 1 ); if( pos == std::string::npos ) return -1; } return pos; } [download] The results: `$ ./mytest10.pl Rate regex buk cpp regex 10.0/s -- -97% -99% buk 291/s 2803% -- -63% cpp 777/s 7657% 167% --` [download] This goes back to the original benchmark where I just made copies. I understood the xor-flag method, but I already had this written and didn't bother to rewrite. There's a gross inefficiency in the C++ version, that if fixed, would greatly improve the speed of that solution. The biggest inefficiency is that I'm copying the c-strings that Perl passes into the subroutine into C++ "string" objects. This is a linear operation, and really totally unnecessary if I were motivated enough to forgo the "string::find" method and roll my own instead. If I were to do that I could just deal directly with C strings and avoid making another copy. But I wanted to play with C++ strings just to see how it would look. Nevertheless, the results are predictably good. Don't try this on Unicode strings though. And if the needle string has a length greater than one, we would have to be aware that my method will detect needles that overlap each other. If we wanted to avoid that the C++ function should add the length of the needle to "pos". The exercise was beneficial, as I discovered a bug in Inline::CPP's documentation on how to declare user typemaps, that I'll be able to fix in an upcoming release. Dave	[reply] [d/l] [select]
Re^9: Fast Replacement (0.01 seconds) by BrowserUk (Patriarch) on Jun 16, 2013 at 18:39 UTC
I came up with a similar I::C implementation. I'd run it against yours but yours doesn't compile on my machine at the moment: `int indexNth( SV haystack, SVneedle, int nth ) { STRLEN lh, ln, i, j; char h = SvPV( haystack, lh ); char n = SvPV( needle, ln ); for( i = 0; i < lh; ++i ) { for( j = 0; j < ln; ++j ) { if( h[ i + j ] != n[ j ] ) goto nomatch; } if( --nth == 0 ) return i; nomatch:; } return -1; }` [download] Got the C++ to compile -- missing newline at the end of the file, These are the results: `C:\test>1039228.pl Rate regex buk cpp c regex 4.06/s -- -97% -99% -100% buk 137/s 3277% -- -75% -85% cpp 546/s 13345% 298% -- -40% c 915/s 22446% 568% 68% --` [download] And the additional test: `sub c { my $copy = $main::string; my $ix = indexNth( $copy, '!', 50000 ); substr( $copy, 0, $ix ) =~ tr/!/\n/; return \$copy; }` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]