Are you sure that chunking the strings is where your performance bottleneck is? Have you profiled? Could I/O be a more significant constraint on execution time? I only ask because the performance of substr over 10,000,000 strings doesn't seem all that bad for the problem domain.
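If you want a rough answer to the I/O question before reaching for a full profiler, one approach is to time the reading and the chunking as separate phases. The sketch below assumes a hypothetical input format of one sequence per line and writes a throwaway file with File::Temp so it runs on its own. Keep in mind that a freshly written file will usually be served from the OS cache, so real read times may be worse. For a per-line breakdown of an actual script, a profiler such as Devel::NYTProf (`perl -d:NYTProf yourscript.pl`, then `nytprofhtml`) is the more thorough option.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw/tempfile/;
use Time::HiRes qw/time/;

# Hypothetical input: 100_000 copies of the sample sequence, one per line,
# written to a temporary file (deleted at exit) so the sketch is self-contained.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} "CTTCGAATT\n" for 1 .. 100_000;
close $fh or die "close: $!";

# Phase 1: I/O only (read and chomp every line).
my $t0 = time;
open my $in, '<', $filename or die "open $filename: $!";
my @lines;
while ( my $line = <$in> ) {
    chomp $line;
    push @lines, $line;
}
close $in;
my $io_secs = time - $t0;

# Phase 2: chunking only (split each line into 3-character pieces).
$t0 = time;
my $chunks = 0;
for my $line (@lines) {
    my @codons = unpack '(a3)*', $line;
    $chunks += @codons;
}
my $chunk_secs = time - $t0;

printf "I/O: %.3fs  chunking: %.3fs  (%d chunks)\n",
    $io_secs, $chunk_secs, $chunks;
```

If the first number dwarfs the second on your real data, faster chunking won't buy you much.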
If this section of code really is significant, here's a comparison of the valid solutions posted in the thread so far. Naturally, unpack wins: over 10,000,000 iterations it is only about 3.4 CPU seconds slower (9.04 vs. 5.65) than the "control" case, which isn't a solution but a baseline that measures what the benchmarking framework costs each solution.
use strict;
use warnings;
use Benchmark qw/timethese/;

# Test/Benchmark parameters.
$main::string = q/CTTCGAATT/;
our $time = 10000000;

my $subs_to_test = {
    substr  => \&by_substr,
    match   => \&by_match,
    unpack  => \&by_unpack,
    control => \&control,
};

# Benchmark.
timethese( $time, $subs_to_test );

# Subs being benchmarked.

# Baseline, not a solution: returns the answer without computing it,
# to measure the overhead every sub pays (call, array, returned ref).
sub control {
    my @substrings;
    @substrings = qw/CTT CGA ATT/;
    return \@substrings;
}

# Walk the string three characters at a time with substr.
sub by_substr {
    my $position = 0;
    my @substrings;
    while ( $position < length $main::string ) {
        push @substrings, substr( $main::string, $position, 3 );
        $position += 3;
    }
    return \@substrings;
}

# Capture successive three-character runs with a global match.
sub by_match {
    my @substrings;
    while ( $main::string =~ m/(...)/sg ) {
        push @substrings, $1;
    }
    return \@substrings;
}

# The '(a3)*' template repeats the a3 group until the string is used up.
sub by_unpack {
    my @substrings;
    @substrings = unpack( '(a3)*', $main::string );
    return \@substrings;
}
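Before trusting the timings, it's worth confirming that the contenders really return the same chunks. Here is a minimal sketch using Test::More; the chunk_* subs are hypothetical, parameterized copies of the benchmarked subs that take the string as an argument instead of reading $main::string. The last three checks show one behavioral difference that matters if your sequence lengths aren't always a multiple of 3: m/(...)/g silently skips a short trailing chunk, whereas substr and unpack return it.

```perl
use strict;
use warnings;
use Test::More;

# Parameterized copies of by_substr, by_match and by_unpack.
sub chunk_substr {
    my ($str) = @_;
    my @out;
    for ( my $pos = 0; $pos < length $str; $pos += 3 ) {
        push @out, substr( $str, $pos, 3 );
    }
    return \@out;
}

sub chunk_match {
    my ($str) = @_;
    my @out;
    while ( $str =~ m/(...)/sg ) {
        push @out, $1;
    }
    return \@out;
}

sub chunk_unpack {
    my ($str) = @_;
    return [ unpack '(a3)*', $str ];
}

# All three agree on the benchmark string.
my $expected = [qw/CTT CGA ATT/];
is_deeply( chunk_substr('CTTCGAATT'), $expected, 'substr' );
is_deeply( chunk_match('CTTCGAATT'),  $expected, 'match' );
is_deeply( chunk_unpack('CTTCGAATT'), $expected, 'unpack' );

# Length not a multiple of 3: substr and unpack keep the short tail,
# while the regex needs three characters and drops it.
is_deeply( chunk_substr('CTTCGAATTC'), [qw/CTT CGA ATT C/], 'substr keeps tail' );
is_deeply( chunk_unpack('CTTCGAATTC'), [qw/CTT CGA ATT C/], 'unpack keeps tail' );
is_deeply( chunk_match('CTTCGAATTC'),  [qw/CTT CGA ATT/],   'match drops tail' );

done_testing();
```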
Here's the output.
Benchmark: timing 10000000 iterations of control, match, substr, unpack...
   control:  5 wallclock secs ( 5.65 usr +  0.00 sys =  5.65 CPU) @ 1769911.50/s (n=10000000)
     match: 22 wallclock secs (21.12 usr +  0.00 sys = 21.12 CPU) @ 473484.85/s (n=10000000)
    substr: 12 wallclock secs (11.88 usr +  0.00 sys = 11.88 CPU) @ 841750.84/s (n=10000000)
    unpack:  8 wallclock secs ( 9.04 usr +  0.00 sys =  9.04 CPU) @ 1106194.69/s (n=10000000)
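Because the control case measures the harness overhead, subtracting its CPU time from each contender gives a rough net cost of the chunking itself. A small sketch of that arithmetic using the CPU figures from the run above; it assumes the overhead is identical for every sub, which is only approximately true.

```perl
use strict;
use warnings;

# CPU seconds reported by timethese above, for n = 10_000_000 iterations.
my $n       = 10_000_000;
my $control = 5.65;
my %cpu     = ( match => 21.12, substr => 11.88, unpack => 9.04 );

# Cheapest first.
for my $name ( sort { $cpu{$a} <=> $cpu{$b} } keys %cpu ) {
    my $net = $cpu{$name} - $control;    # seconds spent on chunking alone
    my $us  = $net / $n * 1e6;           # microseconds per call
    printf "%-7s net %5.2f s  %.3f us/call\n", $name, $net, $us;
}
```

With these numbers, unpack's net cost (3.39 s, roughly 0.34 µs per call) is a bit over half of substr's (6.23 s) and under a quarter of match's (15.47 s).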
By the way, this question was cross-posted to StackOverflow.