in reply to Re: Re: Re: Re: Re: many to many join on text files
in thread many to many join on text files

Well, it doesn't have to lose any data you need to keep - I originally wrote it as $hash->{$key} = 1, and changed it to an increment to preserve the count; it could just as easily store an arrayref of line numbers or file positions for each key. The core concept, though, is that the primary super-slow operation being performed is searching through a file repeatedly for matches on a key; if there is any way to turn that into a hash lookup, it is likely to improve things as long as there is memory left.
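For instance, here is a minimal sketch of what I mean - the filenames, the tab-separated format, and the join key sitting in the first column are all assumptions for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Index the smaller file once: join key => arrayref of the file
    # offsets at which that key's rows start.
    my %index;
    open my $small, '<', 'small.txt' or die "small.txt: $!";
    while (1) {
        my $pos  = tell $small;            # offset of the line about to be read
        my $line = <$small>;
        last unless defined $line;
        my ($key) = split /\t/, $line, 2;  # join key assumed to be field 1
        push @{ $index{$key} }, $pos;
    }

    # Stream the larger file once: each row now costs one hash lookup
    # plus a seek per match, instead of a rescan of the other file.
    open my $big, '<', 'big.txt' or die "big.txt: $!";
    while (my $line = <$big>) {
        chomp $line;
        my ($key) = split /\t/, $line, 2;
        next unless $index{$key};
        for my $pos (@{ $index{$key} }) {  # many-to-many: emit every pairing
            seek $small, $pos, 0;
            my $match = <$small>;
            chomp $match;
            print "$line\t$match\n";
        }
    }

Storing offsets rather than whole rows keeps the hash small: only the keys plus a few integers per key ever live in memory.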

I think we'd need to see some information about the actual task at hand to know whether there's some reason an in-core hash cannot be used, but unless memory is unusually restricted for a modern-day computer it seems unlikely.
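And if memory did turn out to be the constraint, the same lookup pattern survives with a tied on-disk hash; nothing about the algorithm changes, only the storage. A sketch using DB_File - the filename is invented, and DB_File values must be flat strings, so a list of offsets has to be serialized by hand:

    use strict;
    use warnings;
    use Fcntl;     # for O_RDWR and O_CREAT
    use DB_File;   # ties a hash to an on-disk hash table

    # Same shape as the in-core index, but backed by a file.
    tie my %index, 'DB_File', 'index.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "cannot tie index.db: $!";

    # Values must be plain strings, so join/split the offset list.
    $index{somekey} = join ',', 1_024, 52_480;
    my @offsets = split /,/, $index{somekey};

    untie %index;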

Hugo

Re: Re^6: many to many join on text files
by tilly (Archbishop) on Apr 15, 2004 at 02:09 UTC
    Two files with a million rows each, and dozens of fields each, suggest that we are likely to be throwing around a few million objects, many of which will take up a few hundred bytes apiece.

    Even with modern-day computers, that is starting to push the limits of what I feel comfortable assuming is available.
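    To put rough numbers on that: two million keys at, say, 300 bytes of Perl overhead apiece is already around 600 MB. The per-entry figure is a guess and varies by perl build, but the CPAN module Devel::Size can measure a representative sample directly:

        use strict;
        use warnings;
        use Devel::Size qw(total_size);   # CPAN module

        # Build a 100_000-entry sample shaped like the proposed index
        # (key => arrayref of a couple of file offsets) and extrapolate.
        my %sample;
        for my $i (1 .. 100_000) {
            push @{ $sample{"key$i"} }, $i * 64, $i * 64 + 32;
        }

        my $mb = total_size(\%sample) / 2**20;
        printf "sample: %.0f MB; scaled to 2 million keys: ~%.0f MB\n",
            $mb, $mb * 20;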

    As for not having to lose data you intend to keep, I agree that it is possible to avoid that. I was just pointing out that your code, as posted, would lose it in the circumstances described by the original question.

      I'm sorry if I was unclear: I was not at any point suggesting putting the whole records (or whole objects derived from those records) in the hash.

      Hugo