Re^2: Performance problems on splitting long strings

Re^3: Performance problems on splitting long strings by hdb (Monsignor) on Jan 30, 2014 at 20:12 UTC
Why then did you not bother to write a few lines like:

    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    my $string = map { ('a'..'z')[rand 26] } 1..30;
    my @sub_fields;

    cmpthese( -1, {
        regex1 => sub { @sub_fields = $string =~ /\w{5}/g },
        regex2 => sub { @sub_fields = $string =~ /.{5}/g },
        unpack => sub { @sub_fields = unpack '(A4)', $string },
        substr => sub { @sub_fields = map { substr $string, 5*$_, 5 } 0..length( $string )/5-1 },
    });

that already shows that the regex idea is vastly inferior:

                 Rate substr unpack regex1 regex2
    substr   696486/s     --   -57%   -94%   -94%
    unpack  1603093/s   130%     --   -85%   -86%
    regex1 10731041/s  1441%   569%     --    -4%
    regex2 11165392/s  1503%   596%     4%     --
Re^4: Performance problems on splitting long strings by Cristoforo (Curate) on Jan 30, 2014 at 20:42 UTC
The `$string` variable contains '30'. I think you meant `my $string = join '', map { ('a'..'z')[rand 26] } 1..30;` With this correction, unpack is faster. :-)

               Rate regex1 regex2 substr unpack
    regex1 225055/s     --    -1%    -4%   -53%
    regex2 228189/s     1%     --    -3%   -53%
    substr 235177/s     4%     3%     --   -51%
    unpack 481548/s   114%   111%   105%     --
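For anyone wondering where the '30' comes from: `map` assigned to a scalar is evaluated in scalar context, where it returns the number of elements it generated rather than the elements themselves. A minimal sketch of the difference (the variable names `$count` and `$joined` are only for illustration):

```perl
use strict;
use warnings;

# map in scalar context yields the number of elements generated, not the list.
my $count = map { ('a'..'z')[rand 26] } 1..30;

# join puts map in list context and concatenates the letters into one string.
my $joined = join '', map { ('a'..'z')[rand 26] } 1..30;

print "scalar context: $count\n";
print "joined length:  ", length($joined), "\n";
```

Both lines report 30, but only the second variable holds an actual 30-letter string.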
Re^5: Performance problems on splitting long strings by hdb (Monsignor) on Jan 30, 2014 at 20:53 UTC
Thanks a lot! Teaches me a well-deserved lesson...
Re^4: Performance problems on splitting long strings by Not_a_Number (Prior) on Jan 30, 2014 at 20:57 UTC
Probably of minor importance to your benchmark, but your `unpack` template should be: `unpack '(A5)', $string # Not '(A4)'`
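Folding the corrections from this subthread (the `join` and the `A5` width) into the script gives something like the sketch below. Two things here are assumptions and not part of the posted script: the `*` repeat count on the `unpack` group, added so that `unpack` returns every field and not only the first, and the sanity loop that checks all four candidates agree before anything is timed. Timings will differ by machine and Perl version, so none are shown.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# 30 random lowercase letters; join turns the list from map into one string.
my $string = join '', map { ('a'..'z')[rand 26] } 1..30;
my @sub_fields;

my %candidates = (
    regex1 => sub { @sub_fields = $string =~ /\w{5}/g },
    regex2 => sub { @sub_fields = $string =~ /.{5}/g },
    # Assumption: '*' repeats the (A5) group to the end of the string.
    unpack => sub { @sub_fields = unpack '(A5)*', $string },
    substr => sub { @sub_fields = map { substr $string, 5 * $_, 5 } 0 .. length($string) / 5 - 1 },
);

# Sanity check (not in the posted script): all candidates must split identically.
for my $name (sort keys %candidates) {
    $candidates{$name}->();
    my $rebuilt = join '', @sub_fields;
    die "$name produced a different split\n"
        unless @sub_fields == 6 && $rebuilt eq $string;
}

# Run each candidate for at least one CPU second and print the comparison table.
cmpthese( -1, \%candidates );
```

Checking the outputs first is cheap insurance: a candidate that returns the wrong fields can look fast for the wrong reason.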
Re^4: Performance problems on splitting long strings by Laurent_R (Canon) on Jan 30, 2014 at 22:35 UTC
"Why then did you not bother to write a few lines like:..." Thank you for your answer, hdb. I think I said quite clearly in the original post that I intended to do a benchmark and that I was really looking for ideas on possibly more efficient ways of doing the splitting, in order to benchmark them along with the ideas I explained: possibly a Perl function unknown to me, or a use I had not thought of for a function I do know, or a module that I don't know about, whatever. As for the `unpack` function, I have used it about 5 times in the last 10 years; I had forgotten about the '*' option and I missed it when I looked at the documentation (which, in my humble opinion, could be clearer). Lacking that option, working my way around it was possible, but it would have made the benchmark less significant because of the added penalty of the workaround. I will benchmark all the options that have been proposed here and publish the results later in this thread.
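For reference, a minimal sketch of what the '*' option changes when it is used as the repeat count of a parenthesized group. The sample string and the variable names are made up for the illustration:

```perl
use strict;
use warnings;

my $string = 'abcdefghijklmnopqrstuvwxyz0123';   # 30 characters

# A bare group is applied once, so only the first 5-character field comes back.
my @once = unpack '(A5)', $string;

# '*' as the group's repeat count applies the group until the data runs out.
my @all = unpack '(A5)*', $string;

print scalar(@once), " field:  @once\n";
print scalar(@all),  " fields: @all\n";
```

The first call returns just `abcde`; the second returns all six fields, which is the behaviour a fixed-width splitting benchmark needs.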
Re^5: Performance problems on splitting long strings by AnomalousMonk (Archbishop) on Jan 31, 2014 at 22:57 UTC
"... `unpack` ... documentation ... could be clearer ..." You've probably seen this already, but take a look at perlpacktut, esp. Template Grouping.
Re^6: Performance problems on splitting long strings by Laurent_R (Canon) on Feb 01, 2014 at 00:55 UTC
Re^5: Performance problems on splitting long strings by hdb (Monsignor) on Jan 31, 2014 at 07:33 UTC
Laurent_R, please don't take my teasing too seriously. Last night when I was looking for a little challenge on PM I was annoyed that I could not just paste ideas into a given benchmarking script but had to write it myself and code your detailed verbal descriptions of alternatives. (And then found that I made a good number of mistakes when doing so in anger...) So I thought it was funny to reply to the post of a senior monk with one of those "please read this before posting" comments. Looking forward to your conclusions from the Benchmarking. hdb
Re^6: Performance problems on splitting long strings by Laurent_R (Canon) on Jan 31, 2014 at 18:41 UTC