Re^3: In-place sort with order assignment
by Marshall (Canon) on Sep 19, 2010 at 23:17 UTC
It appears to me (perhaps wrongly so) that the problem reduces to this: how do you print all the keys of a hash without Perl building a new list of all those keys just so the print can work? The assumption is that doubling the storage needed for the key values would exceed physical memory.
In a more general sense: how to call a sub for each hash entry as the hash table is traversed.
If that can be done, then output all keys to a file (call a print routine for each entry). Call a system sort routine for that file. The Perl program may be paged out. Many files may be created and lots of memory may be consumed, but when that sort finishes, there is a file with the keys in sorted order, and all the physical memory the system sort used is free again.
Perl program reads that big file (millions of lines) and assigns sequential numbers to each entry.
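A minimal sketch of that pipeline, assuming the keys contain no embedded newlines, that an external sort(1) is on the path, and with file names of my own choosing:

use strict;
use warnings;

my %hash;   # the existing huge hash; only its keys matter here

# 1) Stream the keys out one at a time; each() walks the hash's own
#    iterator, so no full list of keys is ever built in memory.
open my $out, '>', 'keys.txt' or die "open keys.txt: $!";
while (defined(my $key = each %hash)) {
    print {$out} $key, "\n";
}
close $out or die "close: $!";

# 2) Let the system sort do the heavy lifting in a separate process;
#    whatever memory it uses is freed when it exits.
system('sort', '-o', 'keys.sorted', 'keys.txt') == 0
    or die "external sort failed: $?";

# 3) Read the sorted file back and assign sequential numbers.
open my $in, '<', 'keys.sorted' or die "open keys.sorted: $!";
my $n = 0;
while (my $key = <$in>) {
    chomp $key;
    $hash{$key} = ++$n;
}
close $in;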
Why wouldn't that work?
I've never really tried to optimize code to minimize memory usage, so my thoughts here might be stupid and/or crazy. I'll go ahead and risk being ridiculed and toss out my ideas in case they might spark a better idea from a more experienced programmer.
Marshall, your idea of sorting from a file is close to an idea that I had but was very hesitant to put in a post. However, it seems to me that sorting the file(s) as you suggest could potentially eat up a lot of memory. I admit that I could be dead wrong about that.
Here's my stupid/crazy idea that's close to what Marshall suggested:
- Loop through the unsorted keys of the hash.
- For each key:
  - Open a file (the same one each time) for in-place editing.
  - Loop through its lines and insert the new key at the proper line (one hash key per line), based on the desired sort order.
- After doing this for every hash key, the file should hold the keys in sorted order. Reopen the file. While looping through that file, you'll be progressing through a sorted list of the keys.
In other words, instead of doing the sorting after populating the file with all of the hash keys, do the sorting one element at a time as each hash key is added to the file.
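A rough sketch of that per-key insertion, using Tie::File purely as one convenient way to edit the file a line at a time (the module choice and the file name are my guesses, not part of the suggestion itself):

use strict;
use warnings;
use Tie::File;

my %hash;   # the existing huge hash

# Tie the (initially empty) file to an array; reads and writes go
# through a small cache rather than loading the whole file.
tie my @sorted, 'Tie::File', 'keys.sorted' or die "tie: $!";

while (defined(my $key = each %hash)) {
    # Walk the existing lines until we find where $key belongs ...
    my $i = 0;
    $i++ while $i < @sorted && $sorted[$i] lt $key;
    # ... and splice it in; Tie::File rewrites the file for us.
    splice @sorted, $i, 0, $key;
}

untie @sorted;
# keys.sorted now holds every key, one per line, in sorted order.

Every insertion makes Tie::File rewrite the rest of the file after that point, which is exactly where the execution-time worry comes in.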
I believe that this would sort the keys with minimal memory usage. However, execution time might not be great, and the whole thing might simply take too long. Since BrowserUk said that "Speed is not a huge priority here", this might be acceptable depending on how long it takes.
As I said, I have no experience optimizing for minimal memory usage, which means that this could be a horrible idea. I'm open to constructive criticism on this idea, which will help me learn more about optimizing.
In other words, instead of doing the sorting after populating the file with all of the hash keys, do the sorting one element at a time as each hash key is added to the file.
From what I understand, a huge hash structure already exists, and foreach (keys %hash) makes a list of the hash keys, which essentially doubles the amount of memory required. My question is how to spew all of the keys into a file without making an intermediate structure that contains all of the keys. I suspect that there is a way to do that. If so, the sort part belongs to another process that will release its memory when done. The Perl hash table assignments of 1, 2, 3, 4, ... will cause %hash to grow, but only as much as needed, and presumably by less than twice the storage required for the keys.
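As far as I know, each is exactly that way: it returns one key at a time from the hash's internal iterator, so the full key list never has to exist at once. A tiny sketch of the contrast (the file name is just an example):

# for my $key (keys %hash) { ... }        # keys builds the complete list first
open my $fh, '>', 'keys.txt' or die "open: $!";
while (defined(my $key = each %hash)) {    # one key at a time, no copy
    print {$fh} $key, "\n";
}
close $fh;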
Why wouldn't that work?
It would.
But whilst memory rather than speed was the focus, solutions that avoid writing millions of lines to a file, sorting it (which itself re-reads and re-writes all those lines, often several times), and then re-reading them all again are likely to be considerably faster. Hence my preference for an 'internal' solution.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: In-place sort with order assignment
by Limbic~Region (Chancellor) on Sep 19, 2010 at 14:48 UTC
BrowserUk,
You didn't comment on my multi-pass each approach. It is an ugly solution but there is no reason I can think of that it wouldn't work.
BrowserUk,
Hrm. It sounds really close to, if not identical to, the one described by Corion. If I didn't have kids running around screaming "Daddy Daddy", I would code up a solution, but a better description will have to do.
I am confident that, in addition to my hash, I can have two additional arrays of 1000 items each. On the first pass through the hash using each, you check whether the current key sorts before (lt) the first element of the array. If it does, you unshift it; if not, you check the next element until you find the proper location and use splice. If you reach the end of the array and it holds fewer than 1000 items, you push. After each insertion, if the array has grown to 1001 items, you pop the last one. At the end of the first pass, you now know the first 1000 keys. You begin your second pass. This time, when you encounter a key that is in your first array, you assign it its appropriate value, while simultaneously populating the 2nd array with the next batch (keys that sort after the last element of the first array).
Now of course this is silly - there are much better data structures than an array but I hope you get the idea.
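In that spirit, here is a rough sketch of a simplified single-window variant of the same multi-pass idea, assuming plain string (lt) ordering and the window size of 1000 mentioned above; the linear scan of the array stands in for whatever better structure you'd really use:

use strict;
use warnings;

my %hash;      # the big hash; values will receive the sort order 1..N

my $window_size = 1000;
my $rank  = 0;
my $floor;     # highest key already ranked in earlier passes

while (1) {
    my @window;    # up to $window_size smallest keys above $floor

    # One full pass over the hash with each(); no copy of the key list.
    while (defined(my $key = each %hash)) {
        next if defined $floor && $key le $floor;    # already ranked
        # Find where $key belongs in the (sorted) window ...
        my $i = 0;
        $i++ while $i < @window && $window[$i] lt $key;
        next if $i >= $window_size;                  # falls beyond this window
        splice @window, $i, 0, $key;                 # ... and insert it
        pop @window if @window > $window_size;       # keep at most 1000 keys
    }
    last unless @window;                             # nothing left to rank

    # Assign the next block of sequential numbers; remember where we stopped.
    $hash{$_} = ++$rank for @window;
    $floor = $window[-1];
}

With N keys that is roughly N/1000 full passes over the hash, which is where the "speed is not a huge priority" trade-off comes in.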