Speed Improvement

Nar has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Speed Improvement (Updated!) by BrowserUk (Patriarch) on Dec 01, 2014 at 20:28 UTC
Try this (updated to reflect the error toolic caught below)(Further updated to correct flaws reported by dualafn): sub substitute{ my $s = shift; $s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge; return $s };; print substitute( $_ ) for 'A message {d3} of {d4}';; A message 425 of 3404 print substitute( $_ ) for 'A message {d3} of {d4}';; A message 860 of 3797 print substitute( $_ ) for 'A message {d3} of {d4}';; A message 453 of 2262 print substitute( $_ ) for 'A message {d3} of {d4}';; A message 761 of 5898 print substitute( $_ ) for 'A message {d3} of {d4}';; A message 321 of 4182 print substitute( $_ ) for 'A message {d3} of {d4}';; A message 924 of 5054 [download] `[14:37:57.99] C:\test>junk Rate nar_func mca_func toolic_func buk_func nar_func 2449/s -- -26% -66% -78% mca_func 3287/s 34% -- -55% -70% toolic_func 7231/s 195% 120% -- -35% buk_func 11125/s 354% 238% 54% --` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^2: Speed Improvement by toolic (Bishop) on Dec 01, 2014 at 20:44 UTC
I think it needs a small change to match the `\d`, but yours is still fastest: `# vv $message =~ s[\{\\d(\d+)\}][ substr int( rand 1e9 ), 0, $1 ]ge; Rate mca_func toolic_func buk_func mca_func 41051/s -- -36% -56% toolic_func 64558/s 57% -- -30% buk_func 92678/s 126% 44% --` [download]	[reply] [d/l] [select]
Re^3: Speed Improvement by BrowserUk (Patriarch) on Dec 01, 2014 at 21:01 UTC
it needs a small change to match the \d, Ah yes. That does lessen the impact somewhat doesn't it ;) With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^2: Speed Improvement (Updated!) by duelafn (Parson) on Dec 02, 2014 at 14:06 UTC
Though, OP should note that mca and toolic functions will always produce a string of exactly the requested digits (possibly zero padded). The buk function can (very rarely) produce replacements which are shorter than requested (does not zero-pad). Also, buk's function will favor replacements which start with a non-zero digit (isn't uniform). Good Day, Dean	[reply]
Re^3: Speed Improvement (Updated!) by BrowserUk (Patriarch) on Dec 02, 2014 at 14:34 UTC
The buk function can (very rarely) produce replacements which are shorter than requested (does not zero-pad). Also, buk's function will favor replacements which start with a non-zero digit (isn't uniform). Good catches both. I believe that this version corrects both problems without unduly impacting its performance: `sub substitute{ my $s = shift; $s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge; return $s }` [download] Examples: A message 031 of 9150 A message 963 of 0677 A message 477 of 1689 A message 565 of 7675 A message 074 of 6471 A message 235 of 6949 A message 724 of 6703 A message 445 of 6304 A message 991 of 3243 A message 100 of 8825 A message 395 of 2102 A message 406 of 2886 A message 534 of 7045 A message 225 of 8037 A message 443 of 5097 A message 539 of 7089 A message 090 of 7777 A message 208 of 9697 A message 432 of 5864 A message 668 of 7766 A message 913 of 0511 A message 718 of 7972 A message 834 of 8521 A message 064 of 1745 [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. Science is about questioning the status quo. Questioning authority	[reply] [d/l] [select]
Re: Speed Improvement by toolic (Bishop) on Dec 01, 2014 at 19:15 UTC
Possibly simpler code, and definitely fewer calls to rand. You can Benchmark: `use warnings; use strict; my $out = substitute('This is a message! {\d3} --- {\d2} --- {\d3}'); print "$out\n"; exit; sub substitute { my $message = shift; $message =~ s/\{\\d(\d+)\}/sprintf "%0${1}d", int(rand(10**$1))/ge +; return $message; } __END__ This is a message! 442 --- 86 --- 885` [download]	[reply] [d/l]
Re^2: Speed Improvement by McA (Priest) on Dec 01, 2014 at 19:40 UTC
The rule of thumb seems to fit here: The less code the faster is it: #!/usr/bin/perl use strict; use warnings; use 5.010; use Benchmark qw(cmpthese); my @messages = <DATA>; cmpthese(1000000, { 'mca_func' => sub { my $out = mca_substitute($_) foreach (@messages); }, 'toolic_func' => sub { my $out = toolic_substitute($_) foreach (@messages); }, }); sub toolic_substitute { my $message = shift; $message =~ s/\{\\d(\d+)\}/sprintf "%0${1}d", int(rand(10**$1))/ge +; return $message; } sub mca_substitute { my $message = shift; $message =~ s/\{\\d(\d+)\}/myreplace($1)/ge; return $message; } sub myreplace { return '' unless $_[0]; my $string = ''; $string .= int(rand 10) for (1..$_[0]); return $string; } __DATA__ This is a message! {\d3} --- {\d2} Another {\d3} Message with {\d5} A {\d1} little {\d6} longer string {\d3} [download] On my machine: `Rate mca_func toolic_func mca_func 42955/s -- -35% toolic_func 66534/s 55% --` [download] McA	[reply] [d/l] [select]
Re: Speed Improvement by McA (Priest) on Dec 01, 2014 at 19:11 UTC
As a starting point: `sub mca_substitute { my $message = shift; $message =~ s/\{\\d(\d+)\}/myreplace($1)/ge; return $message; } sub myreplace { return '' unless $_[0]; my $string = ''; $string .= int(rand 10) for (1..$_[0]); return $string; }` [download] You didn't present a runable example, so I can't do a benchmark. So, I just hope to have a faster version. McA	[reply] [d/l]
Re: Speed Improvement by AnomalousMonk (Archbishop) on Dec 01, 2014 at 22:13 UTC
A small variation on some other approaches, and has the drawback of needing state (introduced with 5.10), and I'm not going to benchmark it, but... `c:\@Work\Perl\monks>perl -wMstrict -le "use feature 'state'; ;; my $s = '1 {\d1} 2 {\d2} xx{\d3}yy'; print qq{'$s'}; ;; $s =~ s{ \{ \\d (\d+) \} }{ ran_digs($1) }xmsge; print qq{'$s'}; ;; sub ran_digs { state $powers = { map { $_ => 10 ** $_ } 1 .. 7 }; ;; return int rand $powers->{$_[0]}; } " '1 {\d1} 2 {\d2} xx{\d3}yy' '1 7 2 18 xx406yy'` [download]	[reply] [d/l]
Re: Speed Improvement by Perlbotics (Archbishop) on Dec 01, 2014 at 23:41 UTC
Naive and brutal, but works for me ;-) use strict; use warnings; use Inline 'C'; use constant ZERO_START => 0; # digit-sequence allows 0 start use constant NON_ZERO_START => 1; # digit-sequence must not start wi +th a leading 0 msg_subst_seed((time() + $$) % 32000); # just for tests, not a good id +ea where security matters do { my $x = msg_subst("3x: ({\\d4})x({\\d1})x({\\d10})", NON_ZERO_START) +; print "[$x]\n"; }; print msg_subst("empty: ({\\d0})\n", ZERO_START); print msg_subst("$_-dig: ({\\d$_})\n", ZERO_START) for (1..10); print msg_subst("2x : ({\\d3}) x ({\\d3})\n", NON_ZERO_START); print msg_subst("huge: ({\\d99})\n", ZERO_START); print msg_subst("err: ({\\d100})\n", ZERO_START); print msg_subst("err: ({\\d-10})\n", ZERO_START); print msg_subst('overflow: {\d99}{\d99}{\d99}{\d99}{\d99}{\d99}{\d99}{ +\d99}{\d99}{\d99}{\d99}'."\n", ZERO_START); __END__ __C__ #include <stdint.h> /* see: http://stackoverflow.com/questions/1167253/implementation-of +-rand / static uint32_t z1 = 12345, z2 = 12345, z3 = 12345, z4 = 12345; / s +eed / void msg_subst_seed(int seed) { / TODO: needs improvement / if (seed < 0) seed = -seed; if (!seed) seed = 1; z1 = seed; / z1: positive and >= 1 / } uint32_t lfsr113_Bits (void) { uint32_t b; b = ((z1 << 6) ^ z1) >> 13; z1 = ((z1 & 4294967294U) << 18) ^ b; b = ((z2 << 2) ^ z2) >> 27; z2 = ((z2 & 4294967288U) << 2) ^ b; b = ((z3 << 13) ^ z3) >> 21; z3 = ((z3 & 4294967280U) << 7) ^ b; b = ((z4 << 3) ^ z4) >> 12; z4 = ((z4 & 4294967168U) << 13) ^ b; return (z1 ^ z2 ^ z3 ^ z4); } / see.. until here / char get_digit() {/ iterator to get an ASCII digit in range '0'..' +9', the naive way... / static int count = 0; static uint32_t drawn_rand; char digit; if (!count) { drawn_rand = lfsr113_Bits(); count = 8; } digit = (char) ('0' + drawn_rand % 10); drawn_rand /= 10; count--; return digit; } SV msg_subst(char* msg, int mode) { static char buf[1024]; /* keep the result smaller than this, increase the static b +uffer, or use malloc() / char to = buf; char from = msg; while(from && (to-buf) <= 1000) { /* for simplicity, we allow {\d0} .. {\d99} only / if (from == '{' && from[1] == '\\' && from[2] == 'd' && isdigi +t( from[3] )) { int i; int count = from[3] - '0'; int consumed = 4; if ( isdigit(from[4]) ) { count = count * 10 + from[4] - '0'; consumed++; } if ( from[consumed] == '}' ) { /* found a valid {\d##} sequenc +e / if (count && mode == 1) { count--; to = get_digit(); while (to == '0') { to = get_digit(); } /* no leading ze +ros wanted / to++; } for (i=0; i<count && (to-buf) <= 1000 ; i++) to++ = get_d +igit(); from += consumed; } else { /* err, just copy what we saw / for (i=0; i<consumed; i++) to++ = from[i]; from += consumed -1; } } else { /* no {\d++} sequence in sight / to++ = from; } from++; } to = '\0'; if ( (to-buf) >= 1000 ) strcpy( to, "... OVERFLOW!\n"); return newSVpv(buf, strlen(buf)); } /* further improvements: - profile / this alternative "can" be faster - other/better random generator or lookup-table () - parse pattern once (if pattern stays constant for a sufficiently l +ong period of time) - more than 99 digits - Modes: HEX, ASCII, SET(...), RANGE(...) - tweak "use inline C;" - pre-compute messages if possible - return a list of results - find more buffer overflows ;-) - find mem-leaks? - set $! - ... / [download] Sample: [3x: (5800)x(9)x(4119272153)] empty: () 1-dig: (6) 2-dig: (00) 3-dig: (286) 4-dig: (3538) 5-dig: (76912) 6-dig: (152922) 7-dig: (1998601) 8-dig: (46526790) 9-dig: (272269142) 10-dig: (5036180257) 2x : (270) x (295) huge: (60698211971351646325662438145076882977674264083631891784014981 +6984333515486911733872823592611980610) err: ({\d100}) err: ({\d-10}) overflow: 388337913014518572079085655682352923685268720200399379872225 +309937696652401652578874749078259328418468960924412471336333591725866 +610726754040840249257924549575536281471052995818025767742622376415863 +863933450041956239915916349295988704367621957622747170445376882427886 +836368075301197163788171910212466615189214921253457526440240130287642 +877297593187189458587800219270169705119636312020472283690712168123351 +112899925547204701597189413213272345551314154916747140042506436194250 +360536730048179486951946798572111575588675513637404504421938424450645 +872242712239584218866179138498996988181042546504722655259441150264820 +656960415909356582097541742743117114058581773895455908059546695344897 +361883153435808150852670180088922568961467160025338460179243921987976 +441004315513268035989754880873846622730659638610226282538414297658365 +180965342365181518413709147137791131035270552146178096818064703332869 +696209005692384041910395288531887465999649751307270627069753831067510 +2369072243267355810846576590914699... OVERFLOW*! [download]	[reply] [d/l] [select]
Re: Speed Improvement by GotToBTru (Prior) on Dec 01, 2014 at 19:28 UTC
UPDATE - turns out study is a no-op since 5.16.0. Not sure why this is not noted in the documentation. Perhaps because this was always a try-it-and-see kind of thing. So, if you're on versions earlier than that, it might help. In any event, it would not help in this particular application. See if study might help. It is intended to speed searches within a string. 1 Peter 4:10	[reply]
Re^2: Speed Improvement by dave_the_m (Monsignor) on Dec 02, 2014 at 11:24 UTC
Note that study() has been a no-op since 5.16.0 Dave.	[reply]
Re^3: Speed Improvement by GotToBTru (Prior) on Dec 02, 2014 at 15:20 UTC
UPDATE: typo inflates the results here, `study` does not produce the improvement shown here. But it does do something, not nothing, so I'm pretty sure it's not a no-op. `C:\perl>perl -v This is perl 5, version 20, subversion 1 (v5.20.1) built for MSWin32-x +86-multi-thread-64int` [download] #!/usr/bin/perl use strict; use warnings; use 5.010; use Benchmark qw(cmpthese); my @messages = <DATA>; cmpthese(1000000, { 'mca_func' => sub { my $out = mca_substitute($_) foreach (@messages); }, 'toolic_func' => sub { my $out = toolic_substitute($_) foreach (@messages); }, 'study_func' => sub { my $out = study_substitute($_) for each (@messages); }, }); sub toolic_substitute { my $message = shift; $message =~ s/\{\\d(\d+)\}/sprintf "%0${1}d", int(rand(10$1))/ge +; return $message; } sub study_substitute { my $message = shift; study $message; $message =~ s/\{\\d(\d+)\}/sprintf "%0${1}d", int(rand(10$1))/ge +; return $message; } sub mca_substitute { my $message = shift; $message =~ s/\{\\d(\d+)\}/myreplace($1)/ge; return $message; } sub myreplace { return '' unless $_[0]; my $string = ''; $string .= int(rand 10) for (1..$_[0]); return $string; } __DATA__ This is a message! {\d3} --- {\d2} Another {\d3} Message with {\d5} A {\d1} little {\d6} longer string {\d3} [download] `C:\perl>perl testpm.pl Rate mca_func toolic_func study_func mca_func 34744/s -- -32% -78% toolic_func 50795/s 46% -- -68% study_func 157505/s 353% 210% -- C:\perl>` [download] Not bad for a no-op. 1 Peter 4:10	[reply] [d/l] [select]
Re^4: Speed Improvement by BrowserUk (Patriarch) on Dec 02, 2014 at 15:37 UTC
Re^5: Speed Improvement by GotToBTru (Prior) on Dec 02, 2014 at 16:01 UTC
Some notes below your chosen depth have not been shown here
Re^4: Speed Improvement by McA (Priest) on Dec 02, 2014 at 15:39 UTC
Re: Speed Improvement by Laurent_R (Canon) on Dec 01, 2014 at 19:54 UTC
The first thing to do would be to profile your code, to figure out what takes at lot of time. Try Devel::NYTProf, probably the best profiler available under Perl.	[reply]
Re: Speed Improvement by GrandFather (Saint) on Dec 01, 2014 at 20:19 UTC
Given McA's benchmark indicates you can expect to process at least 20,000 of these a second what are you doing where that is a limiting factor? Maybe you should tell us a little more of what you are doing in case we can identify larger bottle necks? Even better would be to profile your code and see for yourself where the issues are. Perl is the programming world's equivalent of English	[reply]