in reply to Performance problems on splitting long strings

What have you tried? A good place to start is the documentation for the Benchmark module.

Re^2: Performance problems on splitting long strings
by Laurent_R (Canon) on Jan 30, 2014 at 19:40 UTC
    Thank you. I know how to use the Benchmark module; that is not the problem. What I am looking for are other ideas on how to split my data more efficiently, so that I can benchmark them.

      Why then did you not bother to write a few lines like:

      use strict;
      use warnings;
      use Benchmark 'cmpthese';

      my $string = map { ('a'..'z')[rand 26] } 1..30;
      my @sub_fields;

      cmpthese( -1, {
          regex1 => sub { @sub_fields = $string =~ /\w{5}/g },
          regex2 => sub { @sub_fields = $string =~ /.{5}/g },
          unpack => sub { @sub_fields = unpack '(A4)*', $string },
          substr => sub { @sub_fields = map { substr $string, 5*$_, 5 } 0..length( $string )/5-1 },
      });

      that already shows that the regex idea is vastly inferior:

                   Rate substr unpack regex1 regex2
      substr   696486/s     --   -57%   -94%   -94%
      unpack  1603093/s   130%     --   -85%   -86%
      regex1 10731041/s  1441%   569%     --    -4%
      regex2 11165392/s  1503%   596%     4%     --

        The $string variable contains '30'. I think you meant

        my $string = join '',map { ('a'..'z')[rand 26] }1..30;

        With this correction, unpack is faster. :-)

                   Rate regex1 regex2 substr unpack
        regex1 225055/s     --    -1%    -4%   -53%
        regex2 228189/s     1%     --    -3%   -53%
        substr 235177/s     4%     3%     --   -51%
        unpack 481548/s   114%   111%   105%     --

        Probably of minor importance to your benchmark, but your unpack template should be:

        unpack '(A5)*', $string    # Not '(A4)*'
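
Putting the two corrections from this thread together (building the string with join '' and using the '(A5)*' template), the benchmark could be sketched roughly as follows. This is only an illustration, not the code anyone in the thread actually ran: the %candidates hash, the @expected array and the agreement check before timing are additions made here, and the sample string is random, so the rates will differ from the figures quoted above.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# 30 random lowercase letters; join '' is needed so that $string holds
# the characters themselves rather than the element count.
my $string = join '', map { ('a'..'z')[rand 26] } 1..30;
my @sub_fields;

# Reference result (hypothetical helper variable, not in the thread).
my @expected = unpack '(A5)*', $string;

# Hypothetical hash holding the candidates so they can be checked and timed.
my %candidates = (
    regex  => sub { @sub_fields = $string =~ /.{5}/g },
    unpack => sub { @sub_fields = unpack '(A5)*', $string },
    substr => sub {
        @sub_fields = map { substr $string, 5 * $_, 5 }
                      0 .. length($string) / 5 - 1;
    },
);

# Make sure every candidate produces the same fields before timing it;
# a benchmark of code that returns the wrong answer is meaningless.
for my $name (sort keys %candidates) {
    $candidates{$name}->();
    die "$name disagrees with the reference result\n"
        unless "@sub_fields" eq "@expected";
}

# Run each candidate for at least one CPU second and compare the rates.
cmpthese( -1, \%candidates );
```

Checking the results first would have caught both problems discussed above: with a $string of '30' the regex candidates return nothing, and '(A4)*' returns fields of the wrong width. Note also that the A format strips trailing spaces from each field, which does not matter for this all-letters test string but could matter for real data.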

        Why then did you not bother to write a few lines like:...

        Thank you for your answer, hdb. I think I said quite clearly in the original post that I intended to run a benchmark and that I was really looking for ideas on possibly more efficient ways of doing the splitting, in order to benchmark them along with the ideas I had already explained: possibly a Perl function unknown to me, a use I had not thought of for a function I do know, or a module I don't know about. As for the unpack function, I have used it about 5 times in the last 10 years; I had forgotten about the '*' option and I missed it when I looked at the documentation (which, in my humble opinion, could be clearer). Without that option, working around it was possible but would have made the benchmark less meaningful because of the added penalty of the workaround.

        I will benchmark all the options that have been proposed here and publish the results later in this thread.
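
For readers who, like the poster, do not use unpack often, here is a small sketch of what the '*' repeat count on a group does, next to one possible way of doing without it. The workaround shown (building the template with the x operator) is only a guess at the kind of thing meant above, not necessarily what the poster had in mind, and the sample string and variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $string = 'abcdefghijklmno';   # 15 characters of made-up sample data

# Group with '*': repeat the A5 field until the string is exhausted.
my @with_star = unpack '(A5)*', $string;

# Without '*': compute how many fields are needed and build the
# template explicitly, here 'A5A5A5'. The extra arithmetic and the
# string repetition are the "added penalty" of such a workaround.
my $count      = int( ( length($string) + 4 ) / 5 );
my @with_count = unpack 'A5' x $count, $string;

print "@with_star\n";    # abcde fghij klmno
print "@with_count\n";   # abcde fghij klmno
```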