Re: Finding all substrings
by thelenm (Vicar) on Apr 24, 2002 at 17:54 UTC
|
One small optimization you may do is to keep a temporary variable instead of calling length() at each iteration of each loop:
sub substrings {
my $string = shift;
my @result = ();
my $strlen = length $string;
foreach my $length (1..$strlen) {
foreach my $offset (0..$strlen-$length) {
push @result, substr($string,$offset,$length);
}
}
return @result;
}
| [reply] [d/l] |
|
|
Actually it makes it ever so slightly slower, much to my surprise :-/
use Benchmark qw(cmpthese);
sub substrings_TheHobbit {
my $string = shift;
my @result = ();
foreach my $length (1..length($string)) {
foreach my $offset (0..length($string)-$length) {
push @result,substr($string,$offset,$length);
}
}
return @result;
}
sub substrings_thelenm {
my $string = shift;
my @result = ();
my $strlen = length $string;
foreach my $length (1..$strlen) {
foreach my $offset (0..$strlen-$length) {
push @result, substr($string,$offset,$length);
}
}
return @result;
}
cmpthese(-10, {
TheHobbit => sub { substrings_TheHobbit("Just Another Perl Hacke
+r,") },
thelenm => sub { substrings_thelenm("Just Another Perl Hacker,
+") },
});
__output__
Benchmark: running TheHobbit, thelenm, each for at least 10 CPU second
+s...
TheHobbit: 12 wallclock secs (10.51 usr + 0.03 sys = 10.54 CPU) @ 80
+4.55/s (n=8480)
thelenm: 14 wallclock secs (10.44 usr + 0.02 sys = 10.46 CPU) @ 80
+9.37/s (n=8466)
Rate TheHobbit thelenm
TheHobbit 805/s -- -1%
thelenm 809/s 1% --
Is there any reason why a function call would be quicker accessing a lexical variable?
_________ broquaint | [reply] [d/l] |
|
|
I bet it's because length() is special. All SVs cache their length already, so putting it in a lexical is just a slower way to cache length().
-sam
| [reply] |
|
|
Some repeated benchmarking answers this . . .
my is expensive. Removing the my and making $strlen a global eliminates the diferrence for the small test case. The bench about the same. Multiplying the length of the test string by ten
and leaving $strlen global gives thelenm's sub big gains over the orginal - %6-%10. Finally, making $strlen lexical and retesting reduces those gains by %4-%8, still testing with "Just Another Perl Hacker,"x10.
So, yes, this is definately an optimization.
Cheers,
Erik
| [reply] [d/l] [select] |
|
|
AFAIK the code between parens in the foreach statement is executed just once to obtain a list over which to iterate. So yours is not really an optimization.
Update: i see now your and suaveant point as the inner loop get called multiple times: the original post was not that clear.
$|=$_='1g2i1u1l2i4e2n0k',map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g
| [reply] |
|
|
| [reply] |
Re: Finding all substrings
by suaveant (Parson) on Apr 24, 2002 at 18:05 UTC
|
I guess the real question is... what are you using this for... you could be more efficient by waiting till you need a set of substrings, then generating and caching all the 3 letter substrs when you need a 3 letter substr... but that only works for certain situations...
or you could have better space efficiency by keeping a list of offsets in an array, each index indicating the substring length... like so...
foreach my $length (1..length($string)) {
foreach my $offset (0..length($string)-$length) {
# push @result,substr($string,$offset,$length);
push @{$result[$length]}, $offset;
}
}
Untested, but now @result should hold lists of offsets, with the primary index being the length of the substr... which uses much less space...
really there are endless possibilities... so how can you narrow it down?
- Ant
- Some of my
best work - (1 2 3)
| [reply] [d/l] |
Re: Finding all substrings
by samtregar (Abbot) on Apr 24, 2002 at 18:17 UTC
|
This solution depends on two factors, how many times does the code run and how much do the length of the strings vary? If the answers are "a lot" and "not much" then here's a faster solution:
{
my @cache; # a cache of substring-finding subs
sub substrings {
my $string = shift;
my $length = length $string;
# use cached sub if we have one
return $cache[$length]->($string)
if exists $cache[$length];
# create sub to find substrings for this length
my $sub = 'sub { $_ = shift; return (';
foreach my $length (1..length($string)) {
foreach my $offset (0..length($string)-$length) {
$sub .= "substr(\$_,$offset,$length),";
}
}
$sub .= ")};";
$cache[$length] = eval $sub;
# and use it
return $cache[$length]->($string);
}
}
Using Benchmark.pm and a test case against random words from /usr/dict/words I found this to be 300% faster over 100,000 iterations. Not bad, but I bet we could do better!
-sam
PS: Has anyone noticed that <code> doesn't deal with hard tabs right? Copy-and-paste from Emacs is unpleasant. | [reply] [d/l] |
|
|
Slight improvement, now around 400% faster, at the expense of readability. I got rid of the parameter passing to the cached subs by using $_. Also, I unrolled the last iteration replacing a substr($string,0,length($string)) with just $string. I tried unrolling the first iteration into a split() but that was slower.
{
my @cache;
sub substrings {
$_ = shift;
return &{$cache[length($_)]}
if exists $cache[length($_)];
my $sub = 'sub { return (';
foreach my $len (1..length($_)-1) {
foreach my $off (0..length($_)-$len) {
$sub .= "substr(\$_,$off,$len),";
}
}
$sub .= "\$_)};";
$cache[length($_)] = eval $sub;
return &{$cache[length($_)]};
}
}
The only trick I have left is Inline::C... I'll leave that as an exercise for the reader.
-sam
| [reply] [d/l] |
|
|
Says samtregar:
I found this to be 300% faster over
100,000 iterations. Not bad, but I bet we could do better!
Supposing that the original version took 10 seconds to run,
then yours, 300% faster, completes and delivers
the answer 20 seconds before you even enter the command
that runs it. That
is an impressive achievement indeed. I admire your optimization skills.
Perhaps you meant that yours was three times as fast?
If so, it was 66.67% faster, not 300%.
Hope this helps.
--
Mark Dominus
Perl Paraphernalia
| [reply] |
|
|
Um, just reading the Benchmark output man:
Rate original new
original 7710/s -- -80%
new 38610/s 401% --
If that doesn't mean 400% faster then I guess I need more education.
-sam | [reply] [d/l] |
|
|
|
|
|
|
If it took 10 seconds, and was 300% faster, that would merely mean it was 3 seconds long. I think. Actually i think theyre both interchangeable
| [reply] |
Re: Finding all substrings
by Dominus (Parson) on Apr 24, 2002 at 19:28 UTC
|
I don't know if this is faster (although I suspect it is) but it certainly is cooler:
sub substrings {
my @ss;
$_[0] =~ /.*?(.+?)(?{push @ss, $1})(?!)/;
@ss;
}
--
Mark Dominus
Perl Paraphernalia
| [reply] [d/l] |
|
|
Um, cool! But it leaks memory at an insane rate on 5.6.1. Trying to run 100,000 iterations nearly made my computer explode.
-sam
| [reply] |
|
|
Says samtregar:
Um, cool! But it leaks memory at an insane rate on 5.6.1.
Jeff Pinyan explained this one to me. The code in the regex captures
the first instance of @ss, and keeps
appending to it, 1023 new elements
every time through the loop. You can fix the problem by using a
static variable:
{ my @ss;
sub substrings {
@ss = ();
$_[0] =~ /.*?(.+?)(?{push @ss, $1})(?!)/;
@ss;
}
}
--
Mark Dominus
Perl Paraphernalia
| [reply] [d/l] |
|
|
Re: Finding all substrings
by erikharrison (Deacon) on Apr 24, 2002 at 18:12 UTC
|
The root of optimization is finding a solid algorithm, then implementing it, then speeding the implementation. You algorithm is solid (I don't know of a faster one . . .monks?) and your implementation is simple, meaning not much room for speed gain. Offhand though, C style for loops might be slightly faster (perl does an implicit ++ and check when you call the range operator, hence the rationale that doing it directly might be faster).
Cheers,
Erik
| [reply] |
|
|
I disagree. The root of optimization is finding a solid algorithm, implementing it, tuning it and then finding a way to exploit caching to make it faster. See my solution below for an example. See Memoize for another option.
-sam
| [reply] |
Re: Finding all substrings
by sfink (Deacon) on Apr 24, 2002 at 20:35 UTC
|
This runs a tiny bit faster for me: (5% on short strings, much less for longer strings.) sub substrings {
my $string = shift;
my @result = ();
foreach my $start (0..length($string)-1) {
my $substr = substr($string, $start);
while (length($substr)) {
push @result, $substr;
chop($substr);
}
}
return @result;
}
But forget speed. The fun solution is:sub substrings {
local $_ = shift;
my @result;
do { push @result, /(?=(.+)$)/sg; chop } while (length);
return @result;
}
Now if I could only figure out how to do it all in one regex... anyone?
Update: That durn mjd got there before I even submitted this... though he used embedded code. Is there any way without it? | [reply] [d/l] [select] |