in reply to Re^2: modification of the script to consume less memory with higher speed
in thread modification of the script to consume less memory with higher speed

You appear to keep the first record that is seen, in full, while subsequent matching records are only tallied by their count. Is that right?

Now, the remaining question is: do you want the output records to keep the order in which they are processed, or is it acceptable if they appear in arbitrary order?

If any output order will do, then the simplest way to process your job is to divide it up into parts. For example, you can dump the records into temporary files, according to the first few letters of the key. Let's say the intermediate files are TACA.tmp, CATT.tmp, AGAT.tmp, etc. After that, process each temp file individually, appending the output to the final result. Questions?
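The partitioning step could be sketched roughly as follows. This is only an illustration under assumptions the thread does not confirm: that each record is four lines long, that the key is the record's second line, and that keys contain only characters safe for file names. The names bucket_file and partition_records, and the prefix length of 4 (chosen to mirror TACA.tmp), are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: name of the temp file for a given key.
# A prefix length of 4 mirrors the TACA.tmp example; tune it as needed.
sub bucket_file {
    my ($key, $prefix_len) = @_;
    return substr($key, 0, $prefix_len) . '.tmp';
}

# ASSUMPTION: every record is 4 lines and the key is the 2nd line.
# Reads records from $in and writes each one, unchanged, to the
# temp file in $dir selected by the start of its key.
sub partition_records {
    my ($in, $dir, $prefix_len) = @_;
    my %out;    # temp file name => open filehandle

    until (eof($in)) {
        my @rec = map { scalar readline($in) } 1 .. 4;
        last if grep { !defined } @rec;    # skip a truncated final record

        chomp(my $key = $rec[1]);
        my $name = bucket_file($key, $prefix_len);

        unless ($out{$name}) {
            open my $fh, '>', "$dir/$name"
                or die "Cannot open $dir/$name: $!";
            $out{$name} = $fh;
        }
        print { $out{$name} } @rec;
    }

    for my $fh (values %out) {
        close $fh or die "Cannot close temp file: $!";
    }
    return sort keys %out;    # names of the temp files that were written
}
```

It might be called as partition_records(\*STDIN, '.', 4), after which the existing script is run on each .tmp file in turn and its output appended to the final result. All temp files stay open at once, so a longer prefix means more open filehandles; with four-letter DNA prefixes that is a few hundred at most.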

Re^4: modification of the script to consume less memory with higher speed
by Anonymous Monk on Jul 30, 2016 at 06:22 UTC

    I am sorry, but I am unable to follow your suggestion as I am a beginner in Perl. It would be helpful if you could explain it with an example, or with a modification of my script if possible. Output records in random order are acceptable, but the complete second line should match in all files, with the count given accordingly.

      PerlMonks is not a code-writing service.

      The script you have is fine as it is (if it works). What you need is another script that first divides the job so that it becomes manageable. Keys that begin differently can never match; therefore, partitioning the records by their start effectively reduces the job to many smaller, independent jobs.

      What part of the suggestion are you struggling with? For starters, you could try to work out a script that reads the records and dumps them on the screen, together with a note "this record must go in that file". The problem you have is a good learning opportunity, as it can be readily broken down into smaller sub-tasks that a beginner can handle.
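As a starting point for that first exercise, a dry run might look something like the sketch below. It writes no files and only reports where each record would go. It assumes, without confirmation from the thread, that records are four lines long with the key on the second line; routing_note, describe_routing, and the prefix length are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the note for one record's key.
sub routing_note {
    my ($key, $prefix_len) = @_;
    my $file = substr($key, 0, $prefix_len) . '.tmp';
    return "record with key $key must go in $file";
}

# ASSUMPTION: 4-line records, key on the 2nd line of each record.
# Returns one note per record read from $in; nothing is written to disk.
sub describe_routing {
    my ($in, $prefix_len) = @_;
    my @notes;
    my $line_no = 0;
    while (my $line = <$in>) {
        $line_no++;
        next unless $line_no % 4 == 2;    # only the key line of each record
        chomp $line;
        push @notes, routing_note($line, $prefix_len);
    }
    return @notes;
}
```

To dump the notes on the screen, something like print "$_\n" for describe_routing(\*STDIN, 4); should do. Once the notes look right, replacing the note with an actual write to the named file is the next small sub-task.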