PerlMonks  


The following is the post I prepared last night (after 2 a.m. this morning, in fact), but I guess I was too tired: I previewed it and then stupidly forgot to hit the create button.

OK, I have now run a benchmark of the various possibilities. The results might be useful to others.

I tried eight different solutions: a "C_style" solution (converting the string into an array of individual characters), two regexes (one with /\w{5}/ and one with /.{5}/), one that opens a filehandle on a reference to the string, two variations on a loop with the substr function, the split solution offered by Kenosis (I probably could not use it with the old version of Perl that we have on our servers, but I could test it at home on my more recent version), and unpack.

The following is the code (borrowed in part from Kenosis):

use strict;
use warnings;
use Benchmark qw/cmpthese/;

# Test string: the alphabet repeated 10 times (260 characters).
my $string = (join '', 'a'..'z') x 10;

my $unpack = sub {
    my @sub_fields = unpack '(A5)*', $string;
};

my $regex1 = sub {
    my @sub_fields = $string =~ /\w{5}/g;
};

my $regex2 = sub {
    my @sub_fields = $string =~ /.{5}/g;
};

my $split = sub {    # suggested by Kenosis
    my @sub_fields = split /.{5}\K/, $string;
};

my $substr1 = sub {
    my @sub_fields;
    for ( my $i = 0 ; $i < length $string ; $i += 5 ) {
        push @sub_fields, substr $string, $i, 5;
    }
};

my $substr2 = sub {
    my @sub_fields;
    my $max = (length $string) / 5 - 1;
    push @sub_fields, substr $string, $_ * 5, 5 for (0 .. $max);
};

my $filehandle = sub {
    my (@sub_fields, $var);
    # Open an in-memory filehandle on the string and read 5 characters at a time.
    open my $FH, "<", \$string or die "cannot open $string $!";
    push @sub_fields, $var while read $FH, $var, 5;
};

my $c_style_string = sub {    # the idea suggested by boftx
    my @sub_fields;
    my @chars = split //, $string;
    while (@chars) {
        push @sub_fields, join '', splice(@chars, 0, 5);
    }
};

cmpthese(
    -1,
    {
        regex1  => sub { $regex1->() },
        regex2  => sub { $regex2->() },
        unpack  => sub { $unpack->() },
        split   => sub { $split->() },
        substr1 => sub { $substr1->() },
        substr2 => sub { $substr2->() },
        FH      => sub { $filehandle->() },
        C_Style => sub { $c_style_string->() },
    }
);
And these are the results:
           Rate C_Style regex1 regex2    FH  split substr1 substr2 unpack
C_Style  5598/s      --   -72%   -72%  -78%   -79%    -80%    -80%   -83%
regex1  19690/s    252%     --    -1%  -24%   -28%    -28%    -31%   -39%
regex2  19968/s    257%     1%     --  -23%   -26%    -27%    -30%   -38%
FH      25984/s    364%    32%    30%    --    -4%     -5%     -9%   -20%
split   27161/s    385%    38%    36%    5%     --     -1%     -5%   -16%
substr1 27355/s    389%    39%    37%    5%     1%      --     -4%   -16%
substr2 28523/s    409%    45%    43%   10%     5%      4%      --   -12%
unpack  32428/s    479%    65%    62%   25%    19%     19%     14%     --
So unpack clearly wins the race, but I was surprised to see that substr is not that far behind.
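One caveat when comparing these approaches: the benchmark string is 260 characters long, an exact multiple of 5, so every method returns the same chunks. That is not guaranteed for other lengths. The following is only a minimal sketch with a hypothetical 12-character input (not from the benchmark above) showing how three of the approaches treat a short trailing chunk:

```perl
use strict;
use warnings;

# Hypothetical input: 12 characters, so the last chunk is shorter than 5.
my $string = 'abcdefghijkl';

# unpack keeps the short trailing chunk.
my @by_unpack = unpack '(A5)*', $string;

# /.{5}/g only matches full 5-character runs, so the tail is dropped.
my @by_regex = $string =~ /.{5}/g;

# The substr loop also keeps the short trailing chunk.
my @by_substr;
for ( my $i = 0 ; $i < length $string ; $i += 5 ) {
    push @by_substr, substr $string, $i, 5;
}

print "unpack: @by_unpack\n";    # unpack: abcde fghij kl
print "regex:  @by_regex\n";     # regex:  abcde fghij
print "substr: @by_substr\n";    # substr: abcde fghij kl
```

If the real strings are always a multiple of the field width, the difference does not matter; otherwise it is worth checking before swapping one approach for another.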

Update this evening (Jan 31, 2014 at 18:45): I incorporated the unpack solution into my program at work today, and the speed gain I obtained on my real data is significantly better than the benchmark figures above would suggest. The profiling shows that, surprisingly, the modified line of code runs almost twice as fast as the original one.
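A point to keep in mind when moving the unpack solution from the test string to real data: the 'A' template strips trailing whitespace and null bytes from each field, whereas 'a' returns the field unchanged. The sketch below assumes a hypothetical record with space-padded fields; the post does not say whether the real data looks like this.

```perl
use strict;
use warnings;

# Hypothetical record: two 5-character fields padded with spaces.
my $record = 'ab   cd   ';

# 'A' strips trailing whitespace (and NULs) from each field.
my @stripped = unpack '(A5)*', $record;

# 'a' returns each 5-character field exactly as stored.
my @preserved = unpack '(a5)*', $record;

print join('|', @stripped),  "\n";    # ab|cd
print join('|', @preserved), "\n";    # ab   |cd   
```

With the all-letters benchmark string the two templates give identical results, so the choice only matters when fields can end in padding.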


In reply to Re: Performance problems on splitting long strings by Laurent_R
in thread Performance problems on splitting long strings by Laurent_R
