in reply to grep help
First, I would say that it might be a good idea to get out of the habit of doing $$ for non simple refs. They are confusing to read and dont work for more complicated structures. Use -> as a dereference operator and your code will be more readable and there are only a very few scenarios where they cant be used.
Second you are sorting your data prior to the grep, which is ok for small data sets but doesnt scale well. (Infact a close inspection of the way you are doing this indicates you actually pass over your entire data set 3 times, with the sort being treated as 1 pass which is an underestimate.) Instead sort your data AFTER the grep, this is better because there should be less records to sort. Unfortunately with your current data structure this isnt too easy, you might for instance consider putting the id inside the record as well as being the key to the hash containing the records.
Third, assuming you do want the list sorted by key you might want to be aware that currently you are getting a lexicographical sort order and not a numeric order (ie lexicographical:1<10<2<20 numeric 1<2<10<20)
Fourth minor point but, /^j.*/ is the same as /^j/. While its not bad (optimizer will sort it out I bet, er maybe not, after all) errors are minimized by saying what you mean precisely.
Anyway, ill give you a couple of variants, along with some benchmarking code to show what I mean.
Running this code you will see that for trivially small data sets your algorithm is the fastest with the internal_id variant marginally slower. However as the data set scales up the performance difference becomes more and more marked. At 20k records internal_id is around 80% faster and even no_internal_id() surpassing your original. Anyway, the benchmark results are as follows, and I think its pretty clear which variant I would use...use strict; use warnings; use Data::Dumper; use Benchmark 'cmpthese'; our %friends; our @results; my $counter=0; my @DATA=<DATA>; chomp @DATA; sub setup_data { my $repeat=shift || 1; for (1..$repeat) { foreach (@DATA) { $friends{++$counter}={}; @{$friends{$counter}}{qw(ID FIRSTNAME LASTNAME HAIR)} = ($c +ounter,split(/,/, $_)); } } print Dumper(\%friends) unless $repeat>1; } sub sickboy { @results=grep ($$_{HAIR}=~ /brown/i && $$_{FIRSTNAME}=~/^j.*/i, @friends{sort keys %friends}); } sub sickboy_rewrite { # Minor rewrite using -> and non superfluous regex, with /o modifier a +s well @results=grep ($_->{HAIR}=~ /brown/io && $_->{FIRSTNAME}=~/^j/i, @friends{sort keys %friends}); } sub internal_id { # assuming that the ID is contained in the record, as well as being th +e key # uncomment the next line to see how this could work with your existin +g code # $friends{$_}->{ID}=$_ foreach keys %friends; @results=sort { $a->{ID} <=> $a->{ID} } #numeric sort grep { $_->{HAIR}=~/brown/i && $_->{FIRSTNAME}=~/^j/i } values %friends; } sub no_internal_id { # assuming there is _no_ way to add an ID to each record, but sort is +still after the grep @results=map { $friends{$_} } sort { $a <=> $b } grep { $friends{$_}->{HAIR}=~/brown/i && $friends{$_}->{FIRSTNAME}=~/^j/i } keys %friends; } sub test_and_display { local $\="\n"; sickboy; print Dumper(\@results); sickboy_rewrite; print Dumper(\@results); internal_id; print Dumper(\@results); no_internal_id; print Dumper(\@results); } # for debugging # setup_data; # test_and_display; for my $x (1,10,100,1_000,10_000,10_000) { setup_data($x); print "\n\nBenchmarking hash of ",scalar keys %friends," records.\ +n"; cmpthese(-30, {sickboy=>\&sickboy, sickboy_rewrite=>\&sickboy_rewrite, internal_id=>\&internal_id, no_internal_id=>\&no_internal_id, }); } __DATA__ John,Brown,red Fred,Flintstone,black Jane,Brown,blond Betty,Rubble,black John,Kennedy,brown Jodi,Brown,brown Bill,Black,blond June,Green,brown John,McEnroe,brown Erin,White,brown Sean,Comb,green Lexius,Luser,black Monty,Python,balding Joe,Bloggs,white Ian,Flemming,red Zoe,Morte,blonde Peter,Phillips,chrome
Benchmarking hash of 17 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 32 wallclock secs (31.25 usr + 0.00 sys = 31.25 CPU) @ 12040.13/s (n=376254)
no_internal_id: 33 wallclock secs (31.66 usr + 0.02 sys = 31.67 CPU) @ 8681.97/s (n=274984)
sickboy: 31 wallclock secs (31.38 usr + 0.00 sys = 31.38 CPU) @ 13151.01/s (n=412613)
sickboy_rewrite: 32 wallclock secs (31.59 usr + 0.00 sys = 31.59 CPU) @ 13379.41/s (n=422709)
Rate no_internal_id internal_id sickboy sickboy_rewrite
no_internal_id 8682/s -- -28% -34% -35%
internal_id 12040/s 39% -- -8% -10%
sickboy 13151/s 51% 9% -- -2%
sickboy_rewrite 13379/s 54% 11% 2% --
Benchmarking hash of 187 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 32 wallclock secs (31.64 usr + 0.00 sys = 31.64 CPU) @ 1240.35/s (n=39246)
no_internal_id: 33 wallclock secs (31.48 usr + 0.00 sys = 31.48 CPU) @ 832.23/s (n=26202)
sickboy: 32 wallclock secs (31.80 usr + 0.00 sys = 31.80 CPU) @ 1116.87/s (n=35513)
sickboy_rewrite: 32 wallclock secs (31.66 usr + 0.00 sys = 31.66 CPU) @ 1143.45/s (n=36197)
Rate no_internal_id sickboy sickboy_rewrite internal_id
no_internal_id 832/s -- -25% -27% -33%
sickboy 1117/s 34% -- -2% -10%
sickboy_rewrite 1143/s 37% 2% -- -8%
internal_id 1240/s 49% 11% 8% --
Benchmarking hash of 1887 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 33 wallclock secs (31.76 usr + 0.00 sys = 31.76 CPU) @ 77.54/s (n=2463)
no_internal_id: 32 wallclock secs (31.61 usr + 0.00 sys = 31.61 CPU) @ 51.47/s (n=1627)
sickboy: 33 wallclock secs (31.77 usr + 0.00 sys = 31.77 CPU) @ 61.61/s (n=1957)
sickboy_rewrite: 33 wallclock secs (31.48 usr + 0.00 sys = 31.48 CPU) @ 62.16/s (n=1957)
Rate no_internal_id sickboy sickboy_rewrite internal_id
no_internal_id 51.5/s -- -16% -17% -34%
sickboy 61.6/s 20% -- -1% -21%
sickboy_rewrite 62.2/s 21% 1% -- -20%
internal_id 77.5/s 51% 26% 25% --
Benchmarking hash of 18887 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 32 wallclock secs (31.45 usr + 0.00 sys = 31.45 CPU) @ 7.28/s (n=229)
no_internal_id: 33 wallclock secs (31.95 usr + 0.00 sys = 31.95 CPU) @ 4.54/s (n=145)
sickboy: 31 wallclock secs (30.23 usr + 0.00 sys = 30.23 CPU) @ 4.04/s (n=122)
sickboy_rewrite: 31 wallclock secs (31.33 usr + 0.00 sys = 31.33 CPU) @ 4.05/s (n=127)
Rate sickboy sickboy_rewrite no_internal_id internal_id
sickboy 4.04/s -- -0% -11% -45%
sickboy_rewrite 4.05/s 0% -- -11% -44%
no_internal_id 4.54/s 12% 12% -- -38%
internal_id 7.28/s 80% 80% 60% --
Benchmarking hash of 188887 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 31 wallclock secs (30.73 usr + 0.00 sys = 30.73 CPU) @ 0.72/s (n=22)
no_internal_id: 31 wallclock secs (31.00 usr + 0.02 sys = 31.01 CPU) @ 0.42/s (n=13)
sickboy: 32 wallclock secs (31.24 usr + 0.00 sys = 31.24 CPU) @ 0.29/s (n=9)
sickboy_rewrite: 32 wallclock secs (31.11 usr + 0.00 sys = 31.11 CPU) @ 0.29/s (n=9)
s/iter sickboy sickboy_rewrite no_internal_id internal_id
sickboy 3.47 -- -0% -31% -60%
sickboy_rewrite 3.46 0% -- -31% -60%
no_internal_id 2.39 45% 45% -- -41%
internal_id 1.40 148% 147% 71% --
Benchmarking hash of 358887 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 32 wallclock secs (30.25 usr + 0.00 sys = 30.25 CPU) @ 0.36/s (n=11)
no_internal_id: 34 wallclock secs (32.20 usr + 0.17 sys = 32.37 CPU) @ 0.22/s (n=7)
sickboy: 35 wallclock secs (33.37 usr + 0.02 sys = 33.39 CPU) @ 0.15/s (n=5)
sickboy_rewrite: 34 wallclock secs (33.41 usr + 0.00 sys = 33.41 CPU) @ 0.15/s (n=5)
s/iter sickboy_rewrite sickboy no_internal_id internal_id
sickboy_rewrite 6.68 -- -0% -31% -59%
sickboy 6.68 0% -- -31% -59%
no_internal_id 4.62 44% 44% -- -41%
internal_id 2.75 143% 143% 68% --
Anyway, hope that helps.
PS:I realize that much of this has already been said by japhy, originally this post was going to be to the other thread, but it seems more topical here (er, at least now that I know this thread exists).
BTW:I have the feeling that you should adopt a slightly less antagonistic attitude. You'd get less hassle and much more respect.
Yves / DeMerphq
--
When to use Prototypes?
|
|---|