Re^3: Performance problems on splitting long strings

by hdb (Monsignor)
on Jan 30, 2014 at 20:12 UTC


in reply to Re^2: Performance problems on splitting long strings
in thread Performance problems on splitting long strings

Why then did you not bother to write a few lines like:

use strict;
use warnings;
use Benchmark 'cmpthese';

my $string = map { ('a'..'z')[rand 26] } 1..30;
my @sub_fields;

cmpthese( -1, {
    regex1 => sub { @sub_fields = $string =~ /\w{5}/g },
    regex2 => sub { @sub_fields = $string =~ /.{5}/g },
    unpack => sub { @sub_fields = unpack '(A4)*', $string },
    substr => sub { @sub_fields = map { substr $string, 5*$_, 5 } 0..length( $string )/5-1 },
});

that already shows that the regex idea is vastly inferior:

            Rate substr unpack regex1 regex2
substr   696486/s     --   -57%   -94%   -94%
unpack  1603093/s   130%     --   -85%   -86%
regex1 10731041/s  1441%   569%     --    -4%
regex2 11165392/s  1503%   596%     4%     --

Re^4: Performance problems on splitting long strings
by Cristoforo (Curate) on Jan 30, 2014 at 20:42 UTC
    The $string variable contains '30'. I think you meant

    my $string = join '',map { ('a'..'z')[rand 26] }1..30;

    With this correction, unpack is faster. :-)

               Rate regex1 regex2 substr unpack
    regex1 225055/s     --    -1%    -4%   -53%
    regex2 228189/s     1%     --    -3%   -53%
    substr 235177/s     4%     3%     --   -51%
    unpack 481548/s   114%   111%   105%     --
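
    The difference between the two assignments comes down to context. A minimal sketch (the variable name $count is only illustrative): in scalar context, map returns the number of elements it generated rather than the elements themselves, so without the join the benchmark splits the two-character string '30', which /\w{5}/ and /.{5}/ can never match.

```perl
use strict;
use warnings;

# Scalar assignment puts map in scalar context: the result is the
# number of elements generated, not the letters themselves.
my $count = map { ('a'..'z')[rand 26] } 1..30;

# join supplies list context and concatenates the random letters.
my $string = join '', map { ('a'..'z')[rand 26] } 1..30;

print "scalar context: $count\n";
print "joined length:  ", length($string), "\n";
```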

      Thanks a lot! Teaches me a well-deserved lesson...

Re^4: Performance problems on splitting long strings
by Not_a_Number (Prior) on Jan 30, 2014 at 20:57 UTC

    Probably of minor importance to your benchmark, but your unpack template should be:

    unpack '(A5)*', $string    # Not '(A4)*'
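
    One way to catch this kind of slip before timing anything is to check that every candidate returns the same fields. The following is only a sketch on a fixed 30-character string; the array names are illustrative and not taken from the benchmark above. Note that the 'A' format also strips trailing spaces, which does not matter for letters-only input.

```perl
use strict;
use warnings;

# Fixed 30-character input: 'a'..'z' followed by 'a'..'d'.
my $string = join '', 'a'..'z', 'a'..'d';

my @by_regex  = $string =~ /.{5}/g;
my @by_unpack = unpack '(A5)*', $string;
my @by_substr = map { substr $string, 5*$_, 5 } 0 .. length($string)/5 - 1;

# The mistyped template yields 4-character fields instead.
my @by_a4     = unpack '(A4)*', $string;

print "regex:  @by_regex\n";
print "unpack: @by_unpack\n";
print "substr: @by_substr\n";
print "A4:     @by_a4\n";
```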
Re^4: Performance problems on splitting long strings
by Laurent_R (Canon) on Jan 30, 2014 at 22:35 UTC

    Why then did you not bother to write a few lines like:...

    Thank you for your answer, hdb. I think I said quite clearly in the original post that I intended to do a benchmark and that I was really looking for ideas on possibly more efficient ways of doing the splitting, in order to benchmark them along with the ideas I explained: possibly a Perl function unknown to me, a use I had not thought of for a function known to me, or a module that I don't know about. As for the unpack function, I have used it about 5 times in the last 10 years; I had forgotten about the '*' option and missed it when I looked at the documentation (which, in my humble opinion, could be clearer). Lacking that option, working my way around it was possible but would have made the benchmark less significant because of the added penalty due to this workaround.

    I will benchmark all the options that have been proposed here and publish the results later in this thread.

      Laurent_R,

      please don't take my teasing too seriously. Last night when I was looking for a little challenge on PM I was annoyed that I could not just paste ideas into a given benchmarking script but had to write it myself and code your detailed verbal descriptions of alternatives. (And then found that I made a good number of mistakes when doing so in anger...)

      So I thought it was funny to reply to the post of a senior monk with one of those "please read this before posting" comments.

      Looking forward to your conclusions from the Benchmarking.

      hdb

        Hi hdb,

        I have no problem with your previous post (which I upvoted, BTW). I appreciate your point that it would have been better if I had posted the beginning of a benchmark, and I am sorry about that. I had not yet prepared the benchmark when I wrote the original post, simply because I was waiting for the ideas offered by various monks before preparing it.

        The results of the benchmark, along with the code, have now been posted.

      ... unpack ... documentation ... could be clearer ...

      You've probably seen this already, but take a look at perlpacktut, esp. Template Grouping.
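
      As a rough illustration of what template grouping allows (the input strings and variable names here are made up for the example): a parenthesised group can take a repeat count just like a single format code, either '*' or an explicit number.

```perl
use strict;
use warnings;

my $record = 'aabbbccddd';

# '*' repeats the whole group until the input is exhausted,
# alternating 2-character and 3-character fields.
my @fields = unpack '(A2 A3)*', $record;

# An explicit count repeats the group a fixed number of times,
# here taking only the first two 5-character fields.
my @first_two = unpack '(A5)2', 'abcdefghijklmno';

print "@fields\n";
print "@first_two\n";
```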

        Thank you very much, AnomalousMonk, I looked at it and I am pretty sure that I had seen this document before, but I had completely forgotten about it. I have bookmarked it now, hopefully I'll find it next time I need such documentation.
