I don't know how big file2 is; I'm guessing file1 is the big one with all the data. An alternate approach would be to process file2 a bit more: build a hash with those IDs as keys, where each value is a hash (or array) holding the cluster number plus a slot for the data coming from file1. Then read file1 sequentially, saving the data you need into the structure that describes file2, and print the structure once you've finished reading file1.
With your current approach you read file1 all the way through and then seek around in it, and a structure describing all those seek positions is, I assume, going to be pretty big. Anyway, if the data you need from file1 (the stuff file2 wants to know about) fits into memory, this approach is faster because you don't have to do all the seeking around, which is a fairly slow operation. Just an idea.
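A minimal sketch of the idea, in Python for illustration (the line formats are assumptions, since the actual layouts of file1 and file2 aren't shown here: file2 lines as "id cluster" and file1 lines as "id data...", whitespace-separated):

```python
def merge_files(file1_lines, file2_lines):
    """Index file2 in memory, then stream file1 once -- no seeking."""
    # Build a hash keyed by ID; each value records the cluster number
    # and an empty slot for the data we still need from file1.
    wanted = {}
    for line in file2_lines:
        id_, cluster = line.split()
        wanted[id_] = {"cluster": cluster, "data": None}

    # Single sequential pass over file1: keep a line only if its ID
    # appears in the structure built from file2.
    for line in file1_lines:
        id_, data = line.split(None, 1)
        if id_ in wanted:
            wanted[id_]["data"] = data.strip()

    return wanted

if __name__ == "__main__":
    file2 = ["a1 3", "b2 7"]
    file1 = ["a1 alpha data", "c9 ignored", "b2 beta data"]
    # "Print the structure when finished reading file1."
    for id_, rec in merge_files(file1, file2).items():
        print(id_, rec["cluster"], rec["data"])
```

The same shape works naturally as a Perl hash of hashes; the point is that the per-line lookup is O(1), so the big file is read exactly once, front to back.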