lupey has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to find an efficient way to predeclare a hash with a set of keys like 5,6,8-53,62-106 (see below). My set of keys are basically members of a set where the value is not important. For example, it is only important that exists($set{67}) is true.
I'm having trouble finding appropriate documentation on this issue. Does anybody have advice on my problem?
Thank you.
Paul
$set{5}++; $set{6}++; $set{8}++; $set{9}++; $set{10}++; $set{11}++; $set{12}++; ... ...
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: How do I efficiently predeclare a hash with a set of keys
by jdporter (Paladin) on May 18, 2007 at 19:05 UTC | |
The most efficient way to initialize a hash (other than directly, i.e. %h = ( key => value, ...)) is to assign hash slices:
(Note that this makes the values all undef.) This technique works particularly well when using hashes as sets, as you are.
A word spoken in Mind will reach its own level, in the objective world, by its own weight
| [reply] [d/l] [select] |
|
Re: How do I efficiently predeclare a hash with a set of keys
by BrowserUk (Patriarch) on May 18, 2007 at 19:19 UTC | |
This is concise
Update: This used to make a difference at some point in the past, but it no longer does (circa. 5.8.8 and possibly before). The null list assignment method others have described my %set; @set{ ... } = (); is equal in memory usage and actually slightly faster. Thanks to bart++ for bringing this to my attention. If you haven't voted on this node yet, please downvote it. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
by bart (Canon) on May 19, 2007 at 09:30 UTC | |
? I'd expect them to be comparable. | [reply] [d/l] |
by BrowserUk (Patriarch) on May 19, 2007 at 10:07 UTC | |
Good call. At some point in the past, the undef version used less memory and was faster than the null list assignment version. If I remember correctly, it was tilly that pointed this out to me. However, I just tried it with 5.8.8 and that is no longer the case. They both use the same amount of memory, and the null list assignment is marginally (~4%) faster. It's still faster and uses less memory than the typical method.
Another case of How do I know what I 'know' is right?. :( I'll update my node above to reflect this. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
by lupey (Monk) on May 18, 2007 at 19:42 UTC | |
| [reply] [d/l] |
|
Re: How do I efficiently predeclare a hash with a set of keys
by TGI (Parson) on May 18, 2007 at 22:12 UTC | |
I was going to suggest that depending on the sparseness of your keys, and whether they are all positive integers, it might be a faster to use exists to test an array. Before I made that assertion, I thought I should write a benchmark. The results were that the theoretical optimization is pretty useless. The speeds were basically identical. So, unless someone cares to tell me how badly I screwed up my benchmark, (proper benchmarking is hard, and I don't write them often) I'd have to say "Don't bother with my silly array idea." Update: BrowserUK was kind enough to demonstrate that my benchmarks were flawed and that the arrays are indeed about 37% faster.
And the results
| [reply] [d/l] [select] |
by BrowserUk (Patriarch) on May 18, 2007 at 23:29 UTC | |
But the main problem with your benchmark is that by pushing the code under test inside a subroutine, the mechanics of calling the subroutine, unpacking the parameters, and allocating memory to accumulate the results; all combine to entirely swamp the actual thing you are trying to test. Adding a direct test of each of the two possibilities to your benchmark dramatically shows up the cost of all that indirection, as well as confirming your original suspicion that arrays are 37% faster than hashes:
The results:
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
by BrowserUk (Patriarch) on May 18, 2007 at 23:17 UTC | |
The first problem is that your arrays are lexical, but you are using strings not sub for your tests. That means that they cannot not be captured as closures. So, you are effectively testing empty arrays and hashes which is why the iteration counts are so high. To test this, insert a print statement inside the first test and run the test for a count of 1:
If you are going to use strings, then your external variables have to be declared with our not my. Doing that and re-running the above sanity check gives:
Running the benchmark once that change is made gives:
Not a whole heap of difference in the percentages, but take a close look at the two orders of magnitude drop in the number of iterations/second. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
Re: How do I efficiently predeclare a hash with a set of keys
by Fletch (Bishop) on May 18, 2007 at 19:34 UTC | |
You may be interested in Set::IntSpan from CPAN which will read your sample data format ("5,6,8-53,62-106") directly. | [reply] |
|
Re: How do I efficiently predeclare a hash with a set of keys
by FunkyMonk (Bishop) on May 18, 2007 at 21:42 UTC | |
TIMTOWTDI | [reply] [d/l] |