Re: List processing performance
by btrott (Parson) on Jul 11, 2000 at 19:56 UTC
Well, in general you're doing pretty well, I think. You're
using hashes for uniqueness lookups, which is always a good
first step. :)
One thing that came to my mind is why don't you combine
the two steps? So you'd end up w/ something like this:
my($self) = @_;
my(%seen, %r);
my @common = qw(a and at);
@seen{@common} = ();                  # keys exist, values are undef
for my $r (@words) {                  # @words is the original word list
    next if exists $seen{ lc $r };   # skip the common words
    $r{lc $r} = 1;                   # %r collects the unique rest
}
my @sorted = sort keys %r;
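(A note on the hash slice above: assigning the empty list creates the keys with undef values, which is why the lookup has to use exists rather than a plain truth test. A minimal sketch of the difference, using a made-up %stop hash:
my %stop;
@stop{qw(a and at)} = ();                      # keys exist, values all undef
print exists $stop{'at'} ? "in\n" : "out\n";   # in
print $stop{'at'}        ? "in\n" : "out\n";   # out - undef is false
@stop{qw(a and at)} = (1) x 3;                 # true values instead
print $stop{'at'}        ? "in\n" : "out\n";   # in - now truth tests work too
The next reply uses the (1) x @common form, so it can get away with a plain truth test.)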
my @words = ( ... );
my @common = qw|a and at|;
my %seen;
@seen{@common} = (1) x @common;
@words = grep { $_ = lc; ! $seen{$_}++ } sort @words;
:-)
Autark
Oh, is *that* what we're doing then. :) In which case replace
your last line with this:
@words = sort grep !$seen{$_ = lc}++, @words
A couple characters shorter, plus it's faster because
the sort is done after you've filtered out duplicates
and common words.
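(If you want to see the effect of where the sort sits for yourself, a quick Benchmark sketch along these lines should show it; the word list here is made up for illustration, and it uses lc $_ rather than $_ = lc so that @words isn't modified between runs:
use strict;
use Benchmark qw(timethese);
my @words = (qw(Alpha beta BETA gamma a and at)) x 200;  # lots of repeats
my %seen;
@seen{qw(a and at)} = (1) x 3;
timethese(1000, {
    sort_first   => sub { my %s = %seen;
                          my @w = grep { !$s{lc $_}++ } sort @words; },
    filter_first => sub { my %s = %seen;
                          my @w = sort grep { !$s{lc $_}++ } @words; },
});
The more duplicates and common words in the input, the more filter_first should pull ahead, since sort_first pays to order elements that get thrown away.)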
A good point about combining the two parts - the real code is actually two subroutines, but only because I developed it in stages; there's no compelling reason for it. Your method also eliminates the temporary list, which was bugging me. Thanks muchly.
RE: List processing (benchmark results)
by Russ (Deacon) on Jul 12, 2000 at 01:21 UTC
Update: btrott found that I had left out a piece of
the original code. New code and results are shown here.
Being interminably curious, I benchmarked a few of the
suggested snippets of code.
Benchmark: timing 1000 iterations of Odud, PMGurus, btrott, visnu...
Odud: 10 wallclock secs ( 9.61 usr + 0.01 sys = 9.62 CPU)
PMGurus: 6 wallclock secs ( 5.41 usr + 0.01 sys = 5.42 CPU)
btrott: 7 wallclock secs ( 7.17 usr + 0.04 sys = 7.21 CPU)
visnu: 11 wallclock secs (11.31 usr + 0.00 sys = 11.31 CPU)
So, combining the two actions into one (btrott's suggestion)
results in an immediate 25% improvement, and further
refinement by the Perl Monks Guru squad reduced the times
by another 25% over that. The one-liner (and I always lean
toward one-liners in Perl) performed worse than the
original... How depressing! :-)
I am pasting my entire testing code below (since I have been
known to do silly things in the name of benchmarking ;-)):
#!/usr/bin/perl -w
use strict;
use Benchmark;
use vars qw/@words @common/;
@words = qw/Notice the use of an anonymous inner class Here an instance of an
unnamed class derived from Thread is created and a run method that
calls endApp is defined for the class
All of this means that when the application is about to terminate
the JVM starts the thread representing by the passed-in thread
object When the thread starts the run method is called The run
method calls endApp which closes the log file This flushes the
output buffer
To underscore the effect of the shutdown hook comment out the
addShutdownHook lines in ShutdownDemo You'll see that the log
file is empty when the program terminates
You can register multiple shutdown hooks In this case each thread
that represents a hook is started in an unspecified order and the
various threads run simultaneously You cannot register or
unregister a shutdown hook after the shutdown sequence has started
Doing so results in an IllegalStateException
Because shutdown hooks run as threads you must use thread-safe
programming techniques Otherwise you risk having threads interfere
with each other Also it's wise to design your application for
simple and fast shutdown processing For example you might run
into trouble if your application uses services during shutdown that
are themselves in the processing of being shut down
There are cases where shutdown processing does not happen even if
you have registered shutdown hooks One example is corrupted native
methods for example when you dereference a null pointer in C code
This feature is somewhat similar to the atexit library function in
/;
@common = qw|a an and at the to|;
timethese(1000, {
Odud => q{
my (@only, %r);
foreach $r (@words){
$r{lc $r} = 1;
}
my @uniqwords = sort keys %r;
@seen{@common} = ();
foreach $item (@uniqwords) {
push(@only,$item) unless exists $seen{$item};
}
my @newwords = @only;
},
PMGurus => q{
my %seen;
@seen{@common} = (1) x @common;
my @newwords = sort grep !$seen{+lc}++, @words;
},
btrott => q{
my(%seen, %r);
@seen{@common} = ();
for my $r (@words) {
next if exists $seen{ lc $r };
$r{lc $r} = 1;
}
my @newwords = sort keys %r;
},
visnu => q{
my @newwords = sort keys %{+{ map { !$common{$_ = lc $_} ? ($_, 1) : () } @words }};
},
});
Russ
Brainbench 'Most Valuable Professional' for Perl
that is disappointing... the overhead of map creating the temporary array must be too great. hmmm.... now i'm trying to think of a way to pre-allocate so i can skew the benchmark results more in my favor..
I ran this myself, and visnu's code is broken in your benchmarking.
visnu's code is checking the %common hash for the common
words, and you don't have such a hash defined. Since the
other benchmarked code has setup code within the benchmarks
(setting up the common words hash), we should do that for
visnu's, so add this at the beginning:
my %common;
@common{@common} = (1) x @common;
Also, the PMGurus code doesn't work quite right, as it doesn't
lower-case the words in the grep.
After taking out the PMGurus code, and fixing up visnu's,
I got:
Benchmark: timing 1000 iterations of Odud, btrott, visnu...
Odud: 4 secs ( 3.55 usr 0.00 sys = 3.55 cpu)
btrott: 2 secs ( 2.27 usr 0.00 sys = 2.27 cpu)
visnu: 4 secs ( 3.98 usr 0.00 sys = 3.98 cpu)
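(For completeness, one way to make a PMGurus-style entry that does lower-case its output is to map before filtering; this is just a sketch and wasn't part of the benchmark above:
my %seen;
@seen{@common} = (1) x @common;
my @newwords = sort grep { !$seen{$_}++ } map { lc } @words;
Since map returns lower-cased copies, @words itself is left alone, and the grep then dedupes and drops the common words as before.)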
@common = qw|a an and at the to|;
timethese(1000, {
Odud => q{
my (@only, %r, %seen);
foreach $r (@words){
$r{lc $r} = 1;
}
my @uniqwords = sort keys %r;
foreach my $item (@uniqwords) {
push(@only,$item) unless exists $seen{$item};
}
my @newwords = @only;
},
PMGurus => q{
my %seen;
@seen{@common} = (1) x @common;
my @newwords = sort grep !$seen{+lc}++, @words;
},
btrott_OneLiner => q{
my %seen;
@seen{@common} = (1) x @common;
my @newwords = sort grep !$seen{$_=lc}++, @words;
},
btrott_FirstPass => q{
my (%seen, %r);
@seen{@common} = (1) x @common;
for my $r (@words) {
next if exists $seen{ lc $r };
$r{lc $r} = 1;
}
my @newwords = sort keys %r;
},
visnu => q{
my %common;
@common{@common} = (1) x @common;
my @newwords = sort keys %{+{ map { !$common{$_ = lc $_} ? ($_, 1) : () } @words }};
},
});
Benchmark: timing 1000 iterations of Odud, PMGurus, btrott_FirstPass, btrott_OneLiner, visnu...
Odud: 9 wallclock secs ( 9.61 usr + 0.00 sys = 9.61 CPU)
PMGurus: 6 wallclock secs ( 5.40 usr + 0.00 sys = 5.40 CPU)
btrott_FirstPass: 7 wallclock secs ( 7.07 usr + 0.00 sys = 7.07 CPU)
btrott_OneLiner: 6 wallclock secs ( 5.73 usr + 0.00 sys = 5.73 CPU)
visnu: 10 wallclock secs (10.29 usr + 0.00 sys = 10.29 CPU)
Russ
Brainbench 'Most Valuable Professional' for Perl
RE: List processing performance
by visnu (Sexton) on Jul 11, 2000 at 23:22 UTC
#!/usr/bin/perl -w
%common = map { lc $_, 1 } qw/ a and the /;
@words = qw/ hello there Hello hello the a A /;
@words = keys %{+{ map { !$common{lc $_} ? (lc $_, 1) : () } @words }};
$/ = "\n";
print @words;
@words = sort keys %{+{ map { !$common{$_ = lc $_} ? ($_, 1) : () } @words }};
and the $/ = "\n" didn't do what i wanted, but oh well.. i probably wanted $\ = ' ', but that seems broken right now too.. oh well again.
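(For what it's worth, the variable that goes between print's list arguments is $, - the output field separator; $\ is the output record separator and only gets appended once, after the whole print. So a sketch of what was probably wanted:
{
    local $, = ' ';    # printed between each element of the list
    local $\ = "\n";   # printed once, at the very end
    print @words;      # e.g. "hello there\n"
}
$/ is the input record separator, which is why setting it had no visible effect on print.)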
@words = sort keys %{+{ map { !$common{lc $_} ? (lc $_, 1) : () } @words }};
Amazing - I'm going to have to take this off-line to work out what it is doing. map is currently top of my list of functions to try to understand. I've only ever used it based on examples in the Cookbook, and even then (with the explanation) I've never been too sure of how it was working. I'll print this out with the man page and look at it on the train tomorrow.
Cheers, Odud
#!/usr/bin/perl -w
%common = map { lc $_, 1 } qw/ a and the /;
@words = qw/ hello there Hello hello the a A /;
@words = sort keys %{+{ map { !$common{$_ = lc $_} ? ($_, 1) : () } @words }};
print "@words\n";
haha! it's not too bad efficiency-wise. it does get rid of all temporaries and it does everything on one pass through the word list.. the only thing i'm concerned about is the list-flattening that's going on, but in reality, i'm sure it's nothing.
a more readable version would be:
%common = map { lc $_, 1 } qw/ the and a /;
@words = qw / hello Hello hello a The there /;
# lowercase all the words
@words = map { lc $_ } @words;
# shorter: @words = map { lc } @words;
# filter out common words
@words = grep { !$common{$_} } @words;
# make a hash
%words = map { $_, 1 } @words;
# get the sorted keys
@words = sort keys %words;
the only problem here is that we're going through the list 3 or so times. the one-liner goes through it only once. and the key to understanding the %words = map { $_, 1 } @words line is to remember that a hash can be made from a list in (key, value) order. so, map goes through the @words list and creates a new list (which it returns) consisting of (word1, 1, word2, 1, word3, 1, ...).. the %{+{ .. }} funny business was just a way to get keys to play fair.
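(To spell out that %{+{ .. }} funny business with named intermediates - same steps as the one-liner, assuming %common is set up as above:
my @pairs = map { $_, 1 }           # (word1, 1, word2, 1, ...)
            grep { !$common{$_} }   # drop the common words
            map { lc } @words;      # lower-case everything
my $href  = { @pairs };   # { } builds an anonymous hash from the pairs
my @out   = sort keys %$href;
in the one-liner, the leading + inside %{+{ ... }} just tells perl that { ... } is an anonymous hash constructor rather than a code block, so keys has a real hash to chew on.)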
Re: List processing performance
by vijeno (Initiate) on Nov 14, 2001 at 16:13 UTC
I did a little, very basic, test, and I find
the result quite interesting (and also confusing):
#!/usr/bin/perl
use Benchmark;
timethese(10000, {
'foreachmy' => sub {
my ($sum,$avg);
foreach my $i (1..1000) {
$sum += $i;
$avg = $sum/$i;
}
},
'foreachglobal' => sub {
my ($sum,$avg,$i);
foreach $i (1..1000) {
$sum += $i;
$avg = $sum/$i;
}
},
'foreachnovar' => sub {
my ($sum,$avg);
foreach (1..1000) {
$sum += $_;
$avg = $sum/$_;
}
},
'fornovar' => sub {
my ($sum,$avg);
for (1..1000) {
$sum += $_;
$avg = $sum/$_;
}
},
'ctypeFor' => sub {
my ($i,$sum,$avg);
for ($i=1;$i<=1000;$i++) {
$sum += $i;
$avg = $sum/$i;
}
}
}
);
Benchmark: timing 10000 iterations of ctypeFor, foreachglobal, foreachmy, foreachnovar, fornovar...
ctypeFor: 26 wallclock secs (26.27 usr + 0.00 sys = 26.27 CPU)
foreachglobal: 22 wallclock secs (22.50 usr + 0.00 sys = 22.50 CPU)
foreachmy: 23 wallclock secs (22.42 usr + 0.00 sys = 22.42 CPU)
foreachnovar: 23 wallclock secs (23.33 usr + 0.00 sys = 23.33 CPU)
fornovar: 24 wallclock secs (23.44 usr + 0.03 sys = 23.47 CPU)
While I'm not so sure about the little difference between
the foreachglobal and foreachmy results, the construct
foreach $i (.....)
does seem to be a bit faster than
foreach (......)
... or am I doing something terrible to $_ IN the
foreach block?
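(As for the $_ worry: foreach (LIST) implicitly localizes $_ for the duration of the loop, so nothing outside the loop is harmed. A quick check:
$_ = "outer";
for (qw(a b c)) {
    # $_ is aliased to each element here, and automatically
    # restored when the loop ends
}
print "$_\n";    # still prints "outer"
So the foreach-without-a-variable form is safe; any speed difference comes from elsewhere, not from clobbering $_.)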