in reply to Efficiency and Large Arrays

A side note to those who recommend sorting first - yes, it will help immensely if you are able to do so, but keep in mind that the sorting itself will be fairly expensive, especially for files of this size. In addition, a simple unix sort will not be up to the task (without some further trickery), as each record spans multiple lines (and 1+ files) and the sort keys may be in different spots in each record. Not a challenge for perl, of course, but it probably cancels out any speed benefit of having the data pre-sorted.
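
For what it's worth, the "further trickery" need not be elaborate: one common approach is to flatten each record onto a single line with its key up front, sort that, then unfold it afterward. A minimal sketch - the blank-line record separator and the "ID:" key line are assumptions on my part, not the actual format:

    #!/usr/bin/perl -w
    use strict;

    # Flatten each multi-line record onto a single line, key first, so a
    # plain sort (unix or perl) can handle it. Blank-line-separated
    # records and an "ID:" key line are assumptions for the example.
    $/ = "";                    # paragraph mode: one record per read
    while (my $rec = <>) {
        my ($key) = $rec =~ /^ID:\s*(\S+)/m
            or next;            # skip records with no key line
        $rec =~ s/\n+$//;       # drop the record separator
        $rec =~ s/\n/\x01/g;    # fold internal newlines onto one line
        print "$key\t$rec\n";   # now each record is one sortable line
    }
    # After sorting, translate \x01 back to newlines to restore records.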

Presort's Cost
by gryng (Hermit) on Jul 26, 2000 at 17:46 UTC
    Yes, it would be "expensive", but if you don't have the memory to keep all the records in, then sorting is not only the faster way to go, it's the only sane way to go. Sorting doesn't require large amounts of memory (though a simple in-memory sort would), and perl could do the presorting itself. Anyway, for small files presorting would be a waste of both your time and CPU time. However, as the input files grow, simply sorting your input beforehand can do wonders.
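
    To make "perl could do the presorting itself" concrete, here is a rough sketch of an external (merge) sort: sort fixed-size chunks in memory, spill each sorted run to a temp file, then merge the runs. The chunk size and the one-record-per-line input (e.g. the flattened form sketched above) are my assumptions:

        #!/usr/bin/perl -w
        use strict;
        use File::Temp qw(tempfile);

        my $CHUNK = 100_000;        # records per in-memory run (assumption)
        my (@buf, @runs);

        # Pass 1: sort fixed-size chunks in memory, spill each run to disk.
        while (my $line = <>) {
            push @buf, $line;
            spill(\@buf, \@runs) if @buf >= $CHUNK;
        }
        spill(\@buf, \@runs) if @buf;

        # Pass 2: merge the sorted runs, emitting the smallest head each time.
        my @fh   = map { open my $f, '<', $_ or die "open: $!"; $f } @runs;
        my @head = map { scalar <$_> } @fh;
        while (grep { defined } @head) {
            my ($i) = sort { $head[$a] cmp $head[$b] }
                      grep { defined $head[$_] } 0 .. $#head;
            print $head[$i];
            $head[$i] = readline $fh[$i];
        }

        sub spill {
            my ($buf, $runs) = @_;
            my ($fh, $name) = tempfile(UNLINK => 1);
            print $fh sort @$buf;   # only one chunk in memory at a time
            close $fh or die "close: $!";
            push @$runs, $name;
            @$buf = ();
        }

    This way memory use is bounded by the chunk size no matter how large the input grows.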

    Ciao,
    Gryn

      Yes, simple sorting requires a lot of memory, but even a more complex sort requires that at the very least you read each record once. Why not just grab the information while you are there, as in my example program above? I agree that a sorted file is far better, and you'd probably want to get it sorted at some point, but I would not call it the only "sane" way! :)
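
      In that spirit, a small sketch of grabbing the keys during the one read you have to do anyway: index each key to its byte offset, then seek() back for whole records on demand. The single input file (the name records.dat is made up), blank-line record separator, and "ID:" key line are all assumptions here:

          #!/usr/bin/perl -w
          use strict;

          # Build a key -> byte-offset index during the single read pass,
          # then seek() back for whole records on demand. The file name,
          # record separator, and "ID:" key line are assumptions.
          $/ = "";                      # paragraph mode: one record per read
          my (%offset, $pos);
          $pos = 0;
          open my $in, '<', 'records.dat' or die "open: $!";
          while (defined(my $rec = <$in>)) {
              if (my ($key) = $rec =~ /^ID:\s*(\S+)/m) {
                  $offset{$key} = $pos; # keys stay small even if records don't
              }
              $pos = tell $in;          # start of the next record
          }
          # Later: seek $in, $offset{$key}, 0; and read one paragraph back.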

        Oh yeah, of course sorting is an expensive operation (but not too expensive). However, what I said was that if you don't have enough memory to keep all of the records in, then sorting is the only sane way to do it, because otherwise you will either start thrashing or repeating work. I never do the sorting unless I start processing 1+ meg files, and even then it depends on whether I care :)

        Cheers,
        Gryn