sickboy has asked for the wisdom of the Perl Monks concerning the following question:
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: grep help
by japhy (Canon) on Jan 22, 2002 at 09:15 UTC | |
Your code, as you posted, does not work. Your hash, %friends has the same hash reference in all its values. Why? Because you didn't declare $sucker as a lexical; if you understand the consequences, you'd know why you'd have to. Your use of regexes can be totally wiped out, and in particular, your regex /^j.*/i goes overboard and could merely have been /^j/. You're sorting the keys of %friends, which all happen to be numbers, but would end up being sorted ASCIIbetically instead of numerically, so the keys would be (1,10,11,12, ... ,2,20,21, ... ,3,4, ...). You probably don't want that. You should be using an array instead of a hash. Here is how I would write your code. Don't insult my friends. That's a ground rule here.
_____________________________________________________ | [reply] [d/l] |
by demerphq (Chancellor) on Jan 23, 2002 at 00:20 UTC | |
But, it turns out the lc stuff is bad advice. Its actually faster with a regex. Anyway, see below...
Yves / DeMerphq | [reply] |
| |
|
(ar0n) Re: grep help
by ar0n (Priest) on Jan 22, 2002 at 04:19 UTC | |
It's an okay way, but you don't really need regular expressions to match the haircolor or the firstname, since you're just matching string constants ("brown" and "j"). And since the keys are just indices, it's probably better to use an array:
The new line in the grep block, is, of course, only a style decision. Update: Yes, I have tried it, and it works for me. What error is it giving?
Update 2: You've changed the format of the input file. [ ar0n -- want job (boston) ] | [reply] [d/l] |
| |
| |
|
Re: grep help
by demerphq (Chancellor) on Jan 23, 2002 at 00:47 UTC | |
First, I would say that it might be a good idea to get out of the habit of doing $$ for non simple refs. They are confusing to read and dont work for more complicated structures. Use -> as a dereference operator and your code will be more readable and there are only a very few scenarios where they cant be used. Second you are sorting your data prior to the grep, which is ok for small data sets but doesnt scale well. (Infact a close inspection of the way you are doing this indicates you actually pass over your entire data set 3 times, with the sort being treated as 1 pass which is an underestimate.) Instead sort your data AFTER the grep, this is better because there should be less records to sort. Unfortunately with your current data structure this isnt too easy, you might for instance consider putting the id inside the record as well as being the key to the hash containing the records. Third, assuming you do want the list sorted by key you might want to be aware that currently you are getting a lexicographical sort order and not a numeric order (ie lexicographical:1<10<2<20 numeric 1<2<10<20) Fourth minor point but, /^j.*/ is the same as /^j/. While its not bad (optimizer will sort it out I bet, er maybe not, after all) errors are minimized by saying what you mean precisely. Anyway, ill give you a couple of variants, along with some benchmarking code to show what I mean. Running this code you will see that for trivially small data sets your algorithm is the fastest with the internal_id variant marginally slower. However as the data set scales up the performance difference becomes more and more marked. At 20k records internal_id is around 80% faster and even no_internal_id() surpassing your original. Anyway, the benchmark results are as follows, and I think its pretty clear which variant I would use...
Benchmarking hash of 17 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 32 wallclock secs (31.25 usr + 0.00 sys = 31.25 CPU) @ 12040.13/s (n=376254)
no_internal_id: 33 wallclock secs (31.66 usr + 0.02 sys = 31.67 CPU) @ 8681.97/s (n=274984)
sickboy: 31 wallclock secs (31.38 usr + 0.00 sys = 31.38 CPU) @ 13151.01/s (n=412613)
sickboy_rewrite: 32 wallclock secs (31.59 usr + 0.00 sys = 31.59 CPU) @ 13379.41/s (n=422709)
Rate no_internal_id internal_id sickboy sickboy_rewrite
no_internal_id 8682/s -- -28% -34% -35%
internal_id 12040/s 39% -- -8% -10%
sickboy 13151/s 51% 9% -- -2%
sickboy_rewrite 13379/s 54% 11% 2% --
Benchmarking hash of 187 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 32 wallclock secs (31.64 usr + 0.00 sys = 31.64 CPU) @ 1240.35/s (n=39246)
no_internal_id: 33 wallclock secs (31.48 usr + 0.00 sys = 31.48 CPU) @ 832.23/s (n=26202)
sickboy: 32 wallclock secs (31.80 usr + 0.00 sys = 31.80 CPU) @ 1116.87/s (n=35513)
sickboy_rewrite: 32 wallclock secs (31.66 usr + 0.00 sys = 31.66 CPU) @ 1143.45/s (n=36197)
Rate no_internal_id sickboy sickboy_rewrite internal_id
no_internal_id 832/s -- -25% -27% -33%
sickboy 1117/s 34% -- -2% -10%
sickboy_rewrite 1143/s 37% 2% -- -8%
internal_id 1240/s 49% 11% 8% --
Benchmarking hash of 1887 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 33 wallclock secs (31.76 usr + 0.00 sys = 31.76 CPU) @ 77.54/s (n=2463)
no_internal_id: 32 wallclock secs (31.61 usr + 0.00 sys = 31.61 CPU) @ 51.47/s (n=1627)
sickboy: 33 wallclock secs (31.77 usr + 0.00 sys = 31.77 CPU) @ 61.61/s (n=1957)
sickboy_rewrite: 33 wallclock secs (31.48 usr + 0.00 sys = 31.48 CPU) @ 62.16/s (n=1957)
Rate no_internal_id sickboy sickboy_rewrite internal_id
no_internal_id 51.5/s -- -16% -17% -34%
sickboy 61.6/s 20% -- -1% -21%
sickboy_rewrite 62.2/s 21% 1% -- -20%
internal_id 77.5/s 51% 26% 25% --
Benchmarking hash of 18887 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 32 wallclock secs (31.45 usr + 0.00 sys = 31.45 CPU) @ 7.28/s (n=229)
no_internal_id: 33 wallclock secs (31.95 usr + 0.00 sys = 31.95 CPU) @ 4.54/s (n=145)
sickboy: 31 wallclock secs (30.23 usr + 0.00 sys = 30.23 CPU) @ 4.04/s (n=122)
sickboy_rewrite: 31 wallclock secs (31.33 usr + 0.00 sys = 31.33 CPU) @ 4.05/s (n=127)
Rate sickboy sickboy_rewrite no_internal_id internal_id
sickboy 4.04/s -- -0% -11% -45%
sickboy_rewrite 4.05/s 0% -- -11% -44%
no_internal_id 4.54/s 12% 12% -- -38%
internal_id 7.28/s 80% 80% 60% --
Benchmarking hash of 188887 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 31 wallclock secs (30.73 usr + 0.00 sys = 30.73 CPU) @ 0.72/s (n=22)
no_internal_id: 31 wallclock secs (31.00 usr + 0.02 sys = 31.01 CPU) @ 0.42/s (n=13)
sickboy: 32 wallclock secs (31.24 usr + 0.00 sys = 31.24 CPU) @ 0.29/s (n=9)
sickboy_rewrite: 32 wallclock secs (31.11 usr + 0.00 sys = 31.11 CPU) @ 0.29/s (n=9)
s/iter sickboy sickboy_rewrite no_internal_id internal_id
sickboy 3.47 -- -0% -31% -60%
sickboy_rewrite 3.46 0% -- -31% -60%
no_internal_id 2.39 45% 45% -- -41%
internal_id 1.40 148% 147% 71% --
Benchmarking hash of 358887 records.
Benchmark: running internal_id, no_internal_id, sickboy, sickboy_rewrite,
each for at least 30 CPU seconds...
internal_id: 32 wallclock secs (30.25 usr + 0.00 sys = 30.25 CPU) @ 0.36/s (n=11)
no_internal_id: 34 wallclock secs (32.20 usr + 0.17 sys = 32.37 CPU) @ 0.22/s (n=7)
sickboy: 35 wallclock secs (33.37 usr + 0.02 sys = 33.39 CPU) @ 0.15/s (n=5)
sickboy_rewrite: 34 wallclock secs (33.41 usr + 0.00 sys = 33.41 CPU) @ 0.15/s (n=5)
s/iter sickboy_rewrite sickboy no_internal_id internal_id
sickboy_rewrite 6.68 -- -0% -31% -59%
sickboy 6.68 0% -- -31% -59%
no_internal_id 4.62 44% 44% -- -41%
internal_id 2.75 143% 143% 68% --
Anyway, hope that helps.
PS:I realize that much of this has already been said by japhy, originally this post was going to be to the other thread, but it seems more topical here (er, at least now that I know this thread exists).
Yves / DeMerphq | [reply] [d/l] |
|
Re: grep help
by goldclaw (Scribe) on Jan 22, 2002 at 04:08 UTC | |
The grep is otherwise perfectly OK. goldclaw | [reply] [d/l] [select] |