data1.txt @NS500278 AGATCNGAAGAGCACACGTCTGAACTCCAGTCACAACGTGATATCTCGTATGCCGTCTTC + =CCGGGCGGG1GGJJCGJJCJJJCJJGGGJJGJGJJJCG8JGJJJJ1JGG8=JGCJGG$G 1 :da +ta1.txt @NS500278 CATTGNACCAAATGTAATCAGCTTTTTTCGTCGTCATTTTTCTTCCTTTTGCGCTCAGGC + CCCGGGGGGGGGGJGJJJJJJJJJJJJJGJG$JJJJ$GGJ>JJJGGG8$CGJJGGCJ8JJ 3 :da +ta1.txt @NS500278 TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGG + CCCGGGGGGGGGGJGJJJJJJJJJJJJJGJG$JJJJ$GGJJJJJGGG8$CGJJGGCJ8JJ 2 :da +ta1.txt
I want to sum the occurrence of read in all the files if the second line of each read matches i.e the output for the above two files should come like this:data2.txt @NS500278 AGATCNGAAGAGCACACGTCTGAACTCCAGTCACAACGTGATATCTCGTATGCCGTCTTC + AAAAA#EEEEEEEEEEEEEEEE6EEEEEAEEEAE/AEEEEEEEAE<EEEEA</AE<EE 1 :data +2.txt @NS500278 CATTGNACCAAATGTAATCAGCTTTTTTCGTCGTCATTTTTCTTCCTTTTGCGCTCAGGC + AAAAA#E/<EEEEEEEEEEAEEEEEEEEA/EAAEEEEEEEEEEEE/EEEE/A6<E<EEE 2 :dat +a2.txt @NS500278 TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGG + AAAAA#EEEEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEE/EEEAE6AE<EAEEAE 2 :da +ta2.txt
My code is working with file upto 10GB but if the file exceed this size it hangs. I want my script to run any size of file. Any help will be appreciated.@NS500278 AGATCNGAAGAGCACACGTCTGAACTCCAGTCACAACGTGATATCTCGTATGCCGTCTTC + =CCGGGCGGG1GGJJCGJJCJJJCJJGGGJJGJGJJJCG8JGJJJJ1JGG8=JGCJGG$G 1 :da +ta1.txt 1 :data2.txt count:2 @NS500278 CATTGNACCAAATGTAATCAGCTTTTTTCGTCGTCATTTTTCTTCCTTTTGCGCTCAGGC + CCCGGGGGGGGGGJGJJJJJJJJJJJJJGJG$JJJJ$GGJ>JJJGGG8$CGJJGGCJ8JJ 3 :da +ta1.txt 2 :data2.txt count:5 @NS500278 TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGG + CCCGGGGGGGGGGJGJJJJJJJJJJJJJGJG$JJJJ$GGJJJJJGGG8$CGJJGGCJ8JJ 2 :da +ta1.txt 2 :data2.txt count:4
In reply to Re^2: modification of the script to consume less memory with higher speed
by Anonymous Monk
in thread modification of the script to consume less memory with higher speed
by Anonymous Monk
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |