in reply to Re: Re: Re: Re: Re: many to many join on text files
in thread many to many join on text files

Well, it doesn't have to lose any data you need to keep - I originally wrote it as $hash->{$key} = 1, and changed it to an increment to preserve the count; it could just as easily store an arrayref of line numbers or file positions for each key. The core concept, though, is that the primary super-slow operation being performed is searching through a file repeatedly for matches on a key; if there is any way to turn that into a hash lookup, it is likely to improve things as long as there is memory left.
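For instance, here is a minimal sketch of what I mean - the filenames, the tab-separated format, and the join key sitting in the first column are all assumptions for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Index the smaller file once: join key => arrayref of the file
    # offsets at which that key's rows start.
    my %index;
    open my $small, '<', 'small.txt' or die "small.txt: $!";
    while (1) {
        my $pos  = tell $small;            # offset of the line about to be read
        my $line = <$small>;
        last unless defined $line;
        my ($key) = split /\t/, $line, 2;  # join key assumed to be field 1
        push @{ $index{$key} }, $pos;
    }

    # Stream the larger file once: each row now costs one hash lookup
    # plus a seek per match, instead of a rescan of the other file.
    open my $big, '<', 'big.txt' or die "big.txt: $!";
    while (my $line = <$big>) {
        chomp $line;
        my ($key) = split /\t/, $line, 2;
        next unless $index{$key};
        for my $pos (@{ $index{$key} }) {  # many-to-many: emit every pairing
            seek $small, $pos, 0;
            my $match = <$small>;
            chomp $match;
            print "$line\t$match\n";
        }
    }

Storing offsets rather than whole rows keeps the hash small: only the keys plus a few integers per key ever live in memory.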

I think we'd need to see some information about the actual task at hand to know whether there's some reason an in-core hash cannot be used, but unless memory is unusually restricted for a modern-day computer it seems unlikely.
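And if memory did turn out to be the constraint, the same lookup pattern survives with a tied on-disk hash; nothing about the algorithm changes, only the storage. A sketch using DB_File - the filename is invented, and DB_File values must be flat strings, so a list of offsets has to be serialized by hand:

    use strict;
    use warnings;
    use Fcntl;     # for O_RDWR and O_CREAT
    use DB_File;   # ties a hash to an on-disk hash table

    # Same shape as the in-core index, but backed by a file.
    tie my %index, 'DB_File', 'index.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "cannot tie index.db: $!";

    # Values must be plain strings, so join/split the offset list.
    $index{somekey} = join ',', 1_024, 52_480;
    my @offsets = split /,/, $index{somekey};

    untie %index;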

Hugo

Re: Re^6: many to many join on text files
by tilly (Archbishop) on Apr 15, 2004 at 02:09 UTC
    Two files with a million rows each, and dozens of fields each, suggest that we are likely to be throwing around a few million objects, many of which will take up a few hundred bytes apiece.

    Even with modern-day computers, that is starting to push the limits of what I feel comfortable assuming is available.
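    To put rough numbers on that: two million keys at, say, 300 bytes of Perl overhead apiece is already around 600 MB. The per-entry figure is a guess and varies by perl build, but the CPAN module Devel::Size can measure a representative sample directly:

        use strict;
        use warnings;
        use Devel::Size qw(total_size);   # CPAN module

        # Build a 100_000-entry sample shaped like the proposed index
        # (key => arrayref of a couple of file offsets) and extrapolate.
        my %sample;
        for my $i (1 .. 100_000) {
            push @{ $sample{"key$i"} }, $i * 64, $i * 64 + 32;
        }

        my $mb = total_size(\%sample) / 2**20;
        printf "sample: %.0f MB; scaled to 2 million keys: ~%.0f MB\n",
            $mb, $mb * 20;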

    As for not having to lose data you intend to keep, I agree that it is possible to avoid that. I was just pointing out that your code, as posted, would lose it in the circumstances described by the original question.

      I'm sorry if I was unclear: I was not at any point suggesting putting the whole records (or whole objects derived from those records) in the hash.

      Hugo