Re: Speed Improvement (Updated!)
by BrowserUk (Patriarch) on Dec 01, 2014 at 20:28 UTC
|
Try this (updated to reflect the error toolic caught below)(Further updated to correct flaws reported by dualafn):
sub substitute{
my $s = shift;
$s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge;
return $s
};;
print substitute( $_ ) for 'A message {d3} of {d4}';;
A message 425 of 3404
print substitute( $_ ) for 'A message {d3} of {d4}';;
A message 860 of 3797
print substitute( $_ ) for 'A message {d3} of {d4}';;
A message 453 of 2262
print substitute( $_ ) for 'A message {d3} of {d4}';;
A message 761 of 5898
print substitute( $_ ) for 'A message {d3} of {d4}';;
A message 321 of 4182
print substitute( $_ ) for 'A message {d3} of {d4}';;
A message 924 of 5054
[14:37:57.99] C:\test>junk
Rate nar_func mca_func toolic_func buk_func
nar_func 2449/s -- -26% -66% -78%
mca_func 3287/s 34% -- -55% -70%
toolic_func 7231/s 195% 120% -- -35%
buk_func 11125/s 354% 238% 54% --
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
|
I think it needs a small change to match the \d, but yours is still fastest:
# vv
$message =~ s[\{\\d(\d+)\}][ substr int( rand 1e9 ), 0, $1 ]ge;
Rate mca_func toolic_func buk_func
mca_func 41051/s -- -36% -56%
toolic_func 64558/s 57% -- -30%
buk_func 92678/s 126% 44% --
| [reply] [d/l] [select] |
|
|
| [reply] |
|
|
Though, OP should note that mca and toolic functions will always produce a string of exactly the requested digits (possibly zero padded). The buk function can (very rarely) produce replacements which are shorter than requested (does not zero-pad). Also, buk's function will favor replacements which start with a non-zero digit (isn't uniform).
| [reply] |
|
|
sub substitute{
my $s = shift;
$s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge;
return $s
}
Examples: A message 031 of 9150
A message 963 of 0677
A message 477 of 1689
A message 565 of 7675
A message 074 of 6471
A message 235 of 6949
A message 724 of 6703
A message 445 of 6304
A message 991 of 3243
A message 100 of 8825
A message 395 of 2102
A message 406 of 2886
A message 534 of 7045
A message 225 of 8037
A message 443 of 5097
A message 539 of 7089
A message 090 of 7777
A message 208 of 9697
A message 432 of 5864
A message 668 of 7766
A message 913 of 0511
A message 718 of 7972
A message 834 of 8521
A message 064 of 1745
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Science is about questioning the status quo. Questioning authority
| [reply] [d/l] [select] |
Re: Speed Improvement
by toolic (Bishop) on Dec 01, 2014 at 19:15 UTC
|
Possibly simpler code, and definitely fewer calls to rand. You can Benchmark:
use warnings;
use strict;
my $out = substitute('This is a message! {\d3} --- {\d2} --- {\d3}');
print "$out\n";
exit;
sub substitute {
my $message = shift;
$message =~ s/\{\\d(\d+)\}/sprintf "%0${1}d", int(rand(10**$1))/ge
+;
return $message;
}
__END__
This is a message! 442 --- 86 --- 885
| [reply] [d/l] |
|
|
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Benchmark qw(cmpthese);
my @messages = <DATA>;
cmpthese(1000000, {
'mca_func' => sub {
my $out = mca_substitute($_) foreach (@messages);
},
'toolic_func' => sub {
my $out = toolic_substitute($_) foreach (@messages);
},
});
sub toolic_substitute {
my $message = shift;
$message =~ s/\{\\d(\d+)\}/sprintf "%0${1}d", int(rand(10**$1))/ge
+;
return $message;
}
sub mca_substitute {
my $message = shift;
$message =~ s/\{\\d(\d+)\}/myreplace($1)/ge;
return $message;
}
sub myreplace {
return '' unless $_[0];
my $string = '';
$string .= int(rand 10) for (1..$_[0]);
return $string;
}
__DATA__
This is a message! {\d3} --- {\d2}
Another {\d3} Message with {\d5}
A {\d1} little {\d6} longer string {\d3}
On my machine:
Rate mca_func toolic_func
mca_func 42955/s -- -35%
toolic_func 66534/s 55% --
McA | [reply] [d/l] [select] |
Re: Speed Improvement
by McA (Priest) on Dec 01, 2014 at 19:11 UTC
|
sub mca_substitute {
my $message = shift;
$message =~ s/\{\\d(\d+)\}/myreplace($1)/ge;
return $message;
}
sub myreplace {
return '' unless $_[0];
my $string = '';
$string .= int(rand 10) for (1..$_[0]);
return $string;
}
You didn't present a runable example, so I can't do a benchmark. So, I just hope to have a faster version.
McA | [reply] [d/l] |
Re: Speed Improvement
by AnomalousMonk (Archbishop) on Dec 01, 2014 at 22:13 UTC
|
A small variation on some other approaches, and has the drawback of needing state (introduced with 5.10), and I'm not going to benchmark it, but...
c:\@Work\Perl\monks>perl -wMstrict -le
"use feature 'state';
;;
my $s = '1 {\d1} 2 {\d2} xx{\d3}yy';
print qq{'$s'};
;;
$s =~ s{ \{ \\d (\d+) \} }{ ran_digs($1) }xmsge;
print qq{'$s'};
;;
sub ran_digs {
state $powers = { map { $_ => 10 ** $_ } 1 .. 7 };
;;
return int rand $powers->{$_[0]};
}
"
'1 {\d1} 2 {\d2} xx{\d3}yy'
'1 7 2 18 xx406yy'
| [reply] [d/l] |
Re: Speed Improvement
by Perlbotics (Archbishop) on Dec 01, 2014 at 23:41 UTC
|
Naive and brutal, but works for me ;-)
use strict;
use warnings;
use Inline 'C';
use constant ZERO_START => 0; # digit-sequence allows 0 start
use constant NON_ZERO_START => 1; # digit-sequence must not start wi
+th a leading 0
msg_subst_seed((time() + $$) % 32000); # just for tests, not a good id
+ea where security matters
do {
my $x = msg_subst("3x: ({\\d4})x({\\d1})x({\\d10})", NON_ZERO_START)
+;
print "[$x]\n";
};
print msg_subst("empty: ({\\d0})\n", ZERO_START);
print msg_subst("$_-dig: ({\\d$_})\n", ZERO_START) for (1..10);
print msg_subst("2x : ({\\d3}) x ({\\d3})\n", NON_ZERO_START);
print msg_subst("huge: ({\\d99})\n", ZERO_START);
print msg_subst("err: ({\\d100})\n", ZERO_START);
print msg_subst("err: ({\\d-10})\n", ZERO_START);
print msg_subst('overflow: {\d99}{\d99}{\d99}{\d99}{\d99}{\d99}{\d99}{
+\d99}{\d99}{\d99}{\d99}'."\n",
ZERO_START);
__END__
__C__
#include <stdint.h>
/* see: http://stackoverflow.com/questions/1167253/implementation-of
+-rand */
static uint32_t z1 = 12345, z2 = 12345, z3 = 12345, z4 = 12345; /* s
+eed */
void msg_subst_seed(int seed) { /* TODO: needs improvement */
if (seed < 0) seed = -seed;
if (!seed) seed = 1;
z1 = seed; /* z1: positive and >= 1 */
}
uint32_t lfsr113_Bits (void) {
uint32_t b;
b = ((z1 << 6) ^ z1) >> 13;
z1 = ((z1 & 4294967294U) << 18) ^ b;
b = ((z2 << 2) ^ z2) >> 27;
z2 = ((z2 & 4294967288U) << 2) ^ b;
b = ((z3 << 13) ^ z3) >> 21;
z3 = ((z3 & 4294967280U) << 7) ^ b;
b = ((z4 << 3) ^ z4) >> 12;
z4 = ((z4 & 4294967168U) << 13) ^ b;
return (z1 ^ z2 ^ z3 ^ z4);
} /* see.. until here */
char get_digit() {/* iterator to get an ASCII digit in range '0'..'
+9', the naive way... */
static int count = 0;
static uint32_t drawn_rand;
char digit;
if (!count) {
drawn_rand = lfsr113_Bits();
count = 8;
}
digit = (char) ('0' + drawn_rand % 10);
drawn_rand /= 10;
count--;
return digit;
}
SV * msg_subst(char* msg, int mode) {
static char buf[1024];
/* keep the result smaller than this, increase the static b
+uffer, or use malloc() */
char *to = buf;
char *from = msg;
while(*from && (to-buf) <= 1000) {
/* for simplicity, we allow {\d0} .. {\d99} only */
if (*from == '{' && from[1] == '\\' && from[2] == 'd' && isdigi
+t( from[3] )) {
int i;
int count = from[3] - '0';
int consumed = 4;
if ( isdigit(from[4]) ) {
count = count * 10 + from[4] - '0';
consumed++;
}
if ( from[consumed] == '}' ) { /* found a valid {\d##} sequenc
+e */
if (count && mode == 1) {
count--;
*to = get_digit();
while (*to == '0') { *to = get_digit(); } /* no leading ze
+ros wanted */
to++;
}
for (i=0; i<count && (to-buf) <= 1000 ; i++) *to++ = get_d
+igit();
from += consumed;
} else { /* err, just copy what we saw */
for (i=0; i<consumed; i++) *to++ = from[i];
from += consumed -1;
}
} else { /* no {\d++} sequence in sight */
*to++ = *from;
}
from++;
}
*to = '\0';
if ( (to-buf) >= 1000 ) strcpy( to, "... *OVERFLOW*!\n");
return newSVpv(buf, strlen(buf));
}
/* further improvements:
- profile / this alternative "can" be faster
- other/better random generator or lookup-table ()
- parse pattern once (if pattern stays constant for a sufficiently l
+ong period of time)
- more than 99 digits
- Modes: HEX, ASCII, SET(...), RANGE(...)
- tweak "use inline C;"
- pre-compute messages if possible
- return a list of results
- find more buffer overflows ;-)
- find mem-leaks?
- set $!
- ...
*/
Sample:
[3x: (5800)x(9)x(4119272153)]
empty: ()
1-dig: (6)
2-dig: (00)
3-dig: (286)
4-dig: (3538)
5-dig: (76912)
6-dig: (152922)
7-dig: (1998601)
8-dig: (46526790)
9-dig: (272269142)
10-dig: (5036180257)
2x : (270) x (295)
huge: (60698211971351646325662438145076882977674264083631891784014981
+6984333515486911733872823592611980610)
err: ({\d100})
err: ({\d-10})
overflow: 388337913014518572079085655682352923685268720200399379872225
+309937696652401652578874749078259328418468960924412471336333591725866
+610726754040840249257924549575536281471052995818025767742622376415863
+863933450041956239915916349295988704367621957622747170445376882427886
+836368075301197163788171910212466615189214921253457526440240130287642
+877297593187189458587800219270169705119636312020472283690712168123351
+112899925547204701597189413213272345551314154916747140042506436194250
+360536730048179486951946798572111575588675513637404504421938424450645
+872242712239584218866179138498996988181042546504722655259441150264820
+656960415909356582097541742743117114058581773895455908059546695344897
+361883153435808150852670180088922568961467160025338460179243921987976
+441004315513268035989754880873846622730659638610226282538414297658365
+180965342365181518413709147137791131035270552146178096818064703332869
+696209005692384041910395288531887465999649751307270627069753831067510
+2369072243267355810846576590914699... *OVERFLOW*!
| [reply] [d/l] [select] |
Re: Speed Improvement
by GotToBTru (Prior) on Dec 01, 2014 at 19:28 UTC
|
UPDATE - turns out study is a no-op since 5.16.0. Not sure why this is not noted in the documentation. Perhaps because this was always a try-it-and-see kind of thing. So, if you're on versions earlier than that, it might help. In any event, it would not help in this particular application.
See if study might help. It is intended to speed searches within a string.
| [reply] |
|
|
Note that study() has been a no-op since 5.16.0
Dave.
| [reply] |
|
|
UPDATE: typo inflates the results here, study does not produce the improvement shown here. But it does do something, not nothing, so I'm pretty sure it's not a no-op.
C:\perl>perl -v
This is perl 5, version 20, subversion 1 (v5.20.1) built for MSWin32-x
+86-multi-thread-64int
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Benchmark qw(cmpthese);
my @messages = <DATA>;
cmpthese(1000000, {
'mca_func' => sub {
my $out = mca_substitute($_) foreach (@messages);
},
'toolic_func' => sub {
my $out = toolic_substitute($_) foreach (@messages);
},
'study_func' => sub {
my $out = study_substitute($_) for each (@messages);
},
});
sub toolic_substitute {
my $message = shift;
$message =~ s/\{\\d(\d+)\}/sprintf "%0${1}d", int(rand(10**$1))/ge
+;
return $message;
}
sub study_substitute {
my $message = shift; study $message;
$message =~ s/\{\\d(\d+)\}/sprintf "%0${1}d", int(rand(10**$1))/ge
+;
return $message;
}
sub mca_substitute {
my $message = shift;
$message =~ s/\{\\d(\d+)\}/myreplace($1)/ge;
return $message;
}
sub myreplace {
return '' unless $_[0];
my $string = '';
$string .= int(rand 10) for (1..$_[0]);
return $string;
}
__DATA__
This is a message! {\d3} --- {\d2}
Another {\d3} Message with {\d5}
A {\d1} little {\d6} longer string {\d3}
C:\perl>perl testpm.pl
Rate mca_func toolic_func study_func
mca_func 34744/s -- -32% -78%
toolic_func 50795/s 46% -- -68%
study_func 157505/s 353% 210% --
C:\perl>
Not bad for a no-op.
| [reply] [d/l] [select] |
|
|
|
|
|
|
|
Re: Speed Improvement
by Laurent_R (Canon) on Dec 01, 2014 at 19:54 UTC
|
The first thing to do would be to profile your code, to figure out what takes at lot of time. Try Devel::NYTProf, probably the best profiler available under Perl.
| [reply] |
Re: Speed Improvement
by GrandFather (Saint) on Dec 01, 2014 at 20:19 UTC
|
Given McA's benchmark indicates you can expect to process at least 20,000 of these a second what are you doing where that is a limiting factor? Maybe you should tell us a little more of what you are doing in case we can identify larger bottle necks? Even better would be to profile your code and see for yourself where the issues are.
Perl is the programming world's equivalent of English
| [reply] |