
Thank you. I know how to use the Benchmark module; that is not the problem. What I am looking for is other ideas on how to split my data more efficiently, so that I can benchmark those ideas.

Re^3: Performance problems on splitting long strings
by hdb (Monsignor) on Jan 30, 2014 at 20:12 UTC

    Why then did you not bother to write a few lines like:

    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    my $string = map { ('a'..'z')[rand 26] } 1..30;
    my @sub_fields;

    cmpthese( -1, {
        regex1 => sub { @sub_fields = $string =~ /\w{5}/g },
        regex2 => sub { @sub_fields = $string =~ /.{5}/g },
        unpack => sub { @sub_fields = unpack '(A4)*', $string },
        substr => sub { @sub_fields = map { substr $string, 5*$_, 5 } 0..length( $string )/5-1 },
    });

    that already shows that the regex idea is vastly inferior:

                 Rate substr unpack regex1 regex2
    substr   696486/s     --   -57%   -94%   -94%
    unpack  1603093/s   130%     --   -85%   -86%
    regex1 10731041/s  1441%   569%     --    -4%
    regex2 11165392/s  1503%   596%     4%     --
      The $string variable contains '30'. I think you meant

      my $string = join '',map { ('a'..'z')[rand 26] }1..30;

      With this correction, unpack is faster. :-)

                 Rate regex1 regex2 substr unpack
      regex1 225055/s     --    -1%    -4%   -53%
      regex2 228189/s     1%     --    -3%   -53%
      substr 235177/s     4%     3%     --   -51%
      unpack 481548/s   114%   111%   105%     --
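To see why the original assignment produced '30', here is a minimal sketch of the context difference. The variable name $count is only for illustration. In scalar context map returns the number of elements it generated, so the first benchmark was presumably timing the split of a two-character string, which would explain the very high regex rates in the first table.

```perl
use strict;
use warnings;

# map in scalar context returns how many elements it produced,
# not the elements themselves.
my $count = map { ('a'..'z')[rand 26] } 1..30;

# join puts map in list context and concatenates the 30 letters.
my $string = join '', map { ('a'..'z')[rand 26] } 1..30;

print "scalar context: $count\n";             # the number 30
print "with join:      $string\n";            # 30 random lowercase letters
print "length:         ", length($string), "\n";
```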

        Thanks a lot! Teaches me a well-deserved lesson...

      Probably of minor importance to your benchmark, but your unpack template should be:

      unpack '(A5)*', $string    # Not '(A4)*'
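With both corrections from this thread applied (join around the map, and A5 instead of A4), the benchmark might look like the sketch below. The %candidates and %result hashes and the sanity check are additions for illustration, not part of the original script; the check simply records what each candidate produces so they can be compared before timing.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# 30 random lowercase letters; join puts map in list context.
my $string = join '', map { ('a'..'z')[rand 26] } 1..30;
my @sub_fields;

my %candidates = (
    regex1 => sub { @sub_fields = $string =~ /\w{5}/g },
    regex2 => sub { @sub_fields = $string =~ /.{5}/g },
    unpack => sub { @sub_fields = unpack '(A5)*', $string },
    substr => sub { @sub_fields = map { substr $string, 5*$_, 5 } 0..length( $string )/5-1 },
);

# Sanity check before timing: record the fields each candidate produces.
my %result;
for my $name ( sort keys %candidates ) {
    $candidates{$name}->();
    $result{$name} = join ',', @sub_fields;
    print "$name: $result{$name}\n";
}

# Run each candidate for at least one CPU second and print the comparison.
cmpthese( -1, \%candidates );
```

Rates will differ from machine to machine, so the numbers posted above should be read as relative, not absolute.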

      Why then did you not bother to write a few lines like:...

      Thank you for your answer, hdb. I think I said quite clearly in the original post that I intended to do a benchmark and that I was really looking for ideas on possibly more efficient ways of doing the splitting, in order to benchmark them along with the ideas I explained: possibly a Perl function unknown to me, a use I had not thought of for a function I do know, or a module that I don't know about. As for the unpack function, I have used it about 5 times in the last 10 years; I had forgotten about the '*' option and missed it when I looked at the documentation (which, in my humble opinion, could be clearer). Without that option, working my way around it was possible, but the workaround would have added a penalty and made the benchmark less significant.

      I will benchmark all the options that have been proposed here and publish the results later in this thread.

        ... unpack ... documentation ... could be clearer ...

        You've probably seen this already, but take a look at perlpacktut, esp. Template Grouping.
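As a rough illustration of template grouping, the sketch below compares the '(A5)*' form with one possible workaround that does not use '*'. The sample string and the 'A5' x $n workaround are assumptions for the example, not necessarily what Laurent_R had in mind.

```perl
use strict;
use warnings;

my $string = 'abcdefghijklmnopqrstuvwxyz0123';   # 30 characters

# Grouping with a '*' repeat count: apply A5 until the data runs out.
my @grouped = unpack '(A5)*', $string;

# Without '*', one workaround is to build the template by repetition.
# This assumes the length is an exact multiple of 5.
my $n        = length($string) / 5;
my @repeated = unpack 'A5' x $n, $string;

print join( '|', @grouped ),  "\n";   # abcde|fghij|klmno|pqrst|uvwxy|z0123
print join( '|', @repeated ), "\n";   # same fields
```

The repeated template has to be rebuilt whenever the string length changes, which is the kind of added cost a benchmark would then also be measuring.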

        Laurent_R,

        please don't take my teasing too seriously. Last night when I was looking for a little challenge on PM I was annoyed that I could not just paste ideas into a given benchmarking script but had to write it myself and code your detailed verbal descriptions of alternatives. (And then found that I made a good number of mistakes when doing so in anger...)

        So I thought it was funny to reply to the post of a senior monk with one of those "please read this before posting" comments.

        Looking forward to your conclusions from the Benchmarking.

        hdb