in reply to Re^2: In-place sort with order assignment
in thread In-place sort with order assignment

It appears to me (perhaps wrongly so) that the problem reduces to this: how to print all keys of a hash without Perl building a new list of those keys just so the print can work. The assumption is that doubling the storage for the key values would exceed physical memory.

In a more general sense: how to call a sub for each hash entry as the hash table is traversed.

If that can be done, then output all keys to a file (calling a print routine for each entry) and run a system sort utility on that file. The Perl program may be paged out, many temporary files may be created, and lots of memory may be consumed, but when that sort finishes there is a file in hash-key order, and all the physical memory the system sort used is free again.

The Perl program then reads that big file (millions of lines) and assigns a sequential number to each entry.

Why wouldn't that work?
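A minimal sketch of that pipeline, assuming the keys are already in %hash and the external sort(1) utility is on the path (the file names and the sample data are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash = map { $_ => undef } qw( pear apple mango kiwi );    # stand-in data

# 1. Stream the keys to disk one at a time. each() walks the hash table
#    in place, so no full list of keys is ever built in memory.
open my $out, '>', 'keys.unsorted' or die $!;
while ( defined( my $key = each %hash ) ) {
    print {$out} "$key\n";
}
close $out;

# 2. Let the external sort do the heavy lifting; its memory goes back
#    to the OS when the child process exits.
system( 'sort -o keys.sorted keys.unsorted' ) == 0 or die "sort failed: $?";

# 3. Read the sorted file back and assign sequential numbers.
open my $in, '<', 'keys.sorted' or die $!;
my $rank = 0;
while ( my $key = <$in> ) {
    chomp $key;
    $hash{$key} = ++$rank;
}
close $in;
```

With real data you would point the sort at a scratch directory with enough space (and GNU sort's -S option can cap its buffer), but the shape of the pipeline is the same.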


Re^4: In-place sort with order assignment
by dasgar (Priest) on Sep 20, 2010 at 00:48 UTC

    I've never really tried to optimize code to minimize memory usage, so my thoughts here might be stupid and/or crazy. I'll go ahead and risk being ridiculed and toss out my ideas in case they might spark a better idea from more experienced programmers.

    Marshall, your idea of sorting from a file is close to an idea that I had, but was very hesitant to put it in a post. However, it seems to me that sorting the file(s) as you suggest could potentially eat up a lot of memory. I admit that I could be dead wrong about that.

    Here's my stupid/crazy idea that's close to what Marshall suggested:

    • Loop through the unsorted keys of the hash.
    • For each key in the hash:
      • Open a file in inline edit mode
      • Loop through each line and insert the new key in the proper line (one hash key per line) based on desired sort method
    • After doing this for each hash key, the file above should have the keys in sorted order. Reopen the file. While looping through that file, you'll be progressing through a sorted list of the keys.

    In other words, instead of doing the sorting after populating the file with all of the hash keys, do the sorting one element at a time as each hash key is added to the file.
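    As a rough sketch of that one-key-at-a-time scheme (using the core Tie::File module to treat the file's lines as an array; the file name and sample data are made up, and note that every insertion rewrites the tail of the file, so this is quadratic in disk I/O):

```perl
use strict;
use warnings;
use Tie::File;

my %hash = map { $_ => undef } qw( pear apple mango kiwi );    # stand-in data

unlink 'keys.sorted';    # start from an empty file for the demo

# Present the file as an array of lines, one hash key per line.
tie my @lines, 'Tie::File', 'keys.sorted' or die $!;

while ( defined( my $key = each %hash ) ) {
    # Scan for the first existing line that sorts after the new key...
    my $pos = 0;
    $pos++ while $pos < @lines && $lines[$pos] lt $key;
    # ...and splice the key in at that position.
    splice @lines, $pos, 0, $key;
}
untie @lines;    # keys.sorted now holds the keys in sorted order
```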

    I believe that this would sort the keys with minimal memory usage. However, the execution time might not be great and could even be prohibitively long. Since BrowserUK said that "Speed is not a huge priority here", this might be acceptable depending on how long it takes.

    As I said, I have no experience optimizing for minimal memory usage, which means that this could be a horrible idea. I'm open to constructive criticism on this idea, which will help me learn more about optimizing.

      In other words, instead of doing the sorting after populating the file with all of the hash keys, do the sorting one element at a time as each hash key is added to the file.
      From what I understand, a huge hash structure already exists, and foreach keys %hash makes a list of the hash keys, which essentially doubles the amount of memory required. My question is how to spew all of the keys into a file without making an intermediate structure that contains all of the keys. I suspect that there is a way to do that. If so, the sort part belongs to another process that will release its memory when done. The Perl hash table assignments of 1,2,3,4 will cause %hash to grow, but only as much as needed, and presumably by less than twice the storage required for the keys.
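      For anyone following along, the difference in question is keys() versus each():

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );

# keys() materialises the complete list of keys before the loop body runs --
# that list is the memory-doubling step for a huge hash:
for my $key ( keys %hash ) { print "$key\n"; }

# each() advances the hash's internal iterator one entry per call,
# so no intermediate list is ever built:
while ( defined( my $key = each %hash ) ) { print "$key\n"; }
```

      The usual caveat applies: don't add keys to the hash while an each() loop is in progress (deleting the key most recently returned by each() is the one safe exception).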

        Ah, figured there was something that I either missed or didn't understand. I've never really tried looking under the hood to understand fully what Perl is doing behind the scenes. I didn't realize that the foreach command would make a copy of the data structure.

        Now that you've enlightened me about this, I realize how foolish my idea was. And that leads me back to thinking along the lines of what you had said in the post that I responded to. (By the way, thanks for teaching me something new. I sincerely appreciate you kindly pointing out something that I overlooked.)

           My question is how to spew all of the keys into a file without making an intermediate structure that contains all of the keys.

        Well, here's a thought on that. I'm assuming that there had to have been some Perl code that created the keys in the hash. If you have access to modify that code, modify it so that it's printing to a file instead of creating the hash. That leaves the unsorted keys in a file and no initial hash. That in turn frees up more memory for a sorting method, which can be applied as the keys are written to the file and/or after all keys have been written.

        In other words, "extract" the keys before the hash is created and then do the sort. Then after the sorting is complete, create the hash.

        Of course, if the code where the hash keys are added cannot be modified for some reason, the above idea can't be implemented.

        Does that sound like a reasonable idea or have I missed something else due to my lack of knowledge and experience?
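        A sketch of that rearrangement, with a made-up make_next_key() standing in for whatever code currently populates the hash:

```perl
use strict;
use warnings;

my @source = qw( pear apple mango );           # stand-in key producer
sub make_next_key { shift @source }

# Original (hypothetical) shape:
#     while ( my $key = make_next_key() ) { $hash{$key} = undef; }
#
# Modified: skip the hash and write each key straight to disk instead.
open my $out, '>', 'keys.unsorted' or die $!;
while ( my $key = make_next_key() ) {
    print {$out} "$key\n";
}
close $out;

# An external sort can now run against keys.unsorted, and the hash can
# be built afterwards from the sorted file.
```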

Re^4: In-place sort with order assignment
by BrowserUk (Patriarch) on Sep 20, 2010 at 07:53 UTC
    Why wouldn't that work?

    It would.

    But whilst memory rather than speed was the focus, solutions that avoid writing millions of lines to a file, sorting them (which itself re-reads and re-writes all those lines, often several times), and then re-reading them all again are likely to be considerably faster. Hence my preference for an 'internal' solution.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.