
Are you sure that chunking the strings is where your performance bottleneck is? Have you profiled? Could I/O be a more significant constraint on execution time? I ask only because the performance of substr over 10,000,000 strings doesn't seem all that bad for the problem domain.

If this section of code really is significant, here's a comparison of the valid solutions posted so far in this thread. Unpack wins, as expected: over 10,000,000 iterations it takes only about 3.4 CPU seconds more than the "control" case (which isn't a solution, just a measure of the overhead that every solution shares).

use strict;
use warnings;
use Benchmark qw/timethese/;

# Test/Benchmark parameters.
$main::string = q/CTTCGAATT/;
our $time = 10000000;

my $subs_to_test = {
    substr  => \&by_substr,
    match   => \&by_match,
    unpack  => \&by_unpack,
    control => \&control,
};


# Benchmark.
timethese( $time, $subs_to_test );

# Subs being benchmarked.
sub control {
    my @substrings;
    @substrings = qw/CTT CGA ATT/;
    return \@substrings;
}

sub by_substr {
    my $position = 0;
    my @substrings;
    while( $position < length $main::string ) {
        push @substrings, substr( $main::string, $position, 3 );
        $position += 3;
    }
    return \@substrings;
}

sub by_match {
    my @substrings;
    while( $main::string =~ m/(...)/sg ) {
        push @substrings, $1;
    }
    return \@substrings;
}

sub by_unpack {
    my @substrings;
    @substrings = unpack( '(a3)*', $main::string );
    return \@substrings;
}

Here's the output.

Benchmark: timing 10000000 iterations of control, match, substr, unpack...
   control:  5 wallclock secs ( 5.65 usr +  0.00 sys =  5.65 CPU) @ 1769911.50/s (n=10000000)
     match: 22 wallclock secs (21.12 usr +  0.00 sys = 21.12 CPU) @ 473484.85/s (n=10000000)
    substr: 12 wallclock secs (11.88 usr +  0.00 sys = 11.88 CPU) @ 841750.84/s (n=10000000)
    unpack:  8 wallclock secs ( 9.04 usr +  0.00 sys =  9.04 CPU) @ 1106194.69/s (n=10000000)
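
One thing the timings don't tell you is whether the three approaches return the same chunks. They do for the 9-character test string, but as far as I can tell they differ when the length isn't a multiple of 3. Here's a minimal sketch to check that. The chunks_* subs are my own argument-taking variants of the benchmarked subs (the benchmark reads $main::string instead), and the 8-character string is just an invented test input.

```perl
use strict;
use warnings;

# Hypothetical variants of the benchmarked subs that take the string
# as an argument rather than reading $main::string.
sub chunks_substr {
    my ($string) = @_;
    my @chunks;
    for ( my $pos = 0; $pos < length $string; $pos += 3 ) {
        push @chunks, substr( $string, $pos, 3 );
    }
    return \@chunks;
}

sub chunks_match {
    my ($string) = @_;
    # In list context, a /g match returns every capture at once.
    my @chunks = $string =~ m/(...)/sg;
    return \@chunks;
}

sub chunks_unpack {
    my ($string) = @_;
    return [ unpack '(a3)*', $string ];
}

my %splitters = (
    substr => \&chunks_substr,
    match  => \&chunks_match,
    unpack => \&chunks_unpack,
);

# One string that divides evenly by 3, and one that doesn't.
for my $string ( 'CTTCGAATT', 'CTTCGAAT' ) {
    for my $name ( sort keys %splitters ) {
        my $chunks = $splitters{$name}->($string);
        printf "%-9s %-6s %s\n", $string, $name, join ' ', @$chunks;
    }
}
```

With the 8-character input, the regex version should drop the trailing "AT" because (...) needs three characters to match, while substr and unpack keep the short final chunk. Whether that matters depends on whether your real data can have a partial last chunk.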

By the way: this question was also cross-posted to Stack Overflow.

Dave

In reply to Re: Fast Way to Split String in to Chunk of Equal Length by davido
in thread Fast Way to Split String in to Chunk of Equal Length by neversaint
