umaykulsum has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that is meant to calculate the total number of occurrences of each read across several FASTQ-format files:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my %compare;
    $/ = "";
    while (<>) {
        chomp;
        my ( $key, $value ) = split( '\t\s', $_ );
        push( @{ $compare{$key} }, $value );
    }

    foreach my $key ( sort keys %compare ) {
        my $tot = 0;
        for my $val ( @{ $compare{$key} } ) {
            $tot += $val;
        }
        if ( @{ $compare{$key} } >= @ARGV ) {
            print join( "\t", $key, $tot, @{ $compare{$key} } ), "\n\n";
        }
    }

For example, I have three files (data.txt, data1.txt and data2.txt), each of which holds the number of occurrences of each read. The output should give the total number of occurrences across all the files, as shown in output.txt.

data.txt @NS500278 AGATCNGAAGAGCACACGTCTGAACTCCAGTCACAACGTGATATCTCGTATGCCGTCTTCTGCTTGAAAA +AAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG +GGGGGGGGGGGG + AAAAA#EEEEEEEEEEEEEEEE6EEEEEAEEEAE/AEEEEEEEAE<EEEEA<A/AE<EE/EEAEEAEEAE +EEEA///EEEEEEEEEAEEEEEEEEEEEEEEEEEEEE/EEEAEEEAEEEEEEEEEAEAEEEEEEEEEEE +EAEEEEEAEEAA 1:data.txt @NS500278 CATTGNACCAAATGTAATCAGCTTTTTTCGTCGTCATTTTTCTTCCTTTTGCGCTCAGGCGCGGATTTGT +TGTGATGTGGCAGCGCTCTGGCAGATTGCTACATGCGCAACATCTACCAGTTTACTTAACTGACTAAAC +AGTAAGTCGACC + AAAAA#E/<EEEEEEEEEEAEEEEEEEEA/EAAEEEEEEEEEEEE/EEEE/A6<E<EEE/E6EEEEEEEE +E6EEEE<EAEEEE<</EAEE6<<EEEEEA/AEAE<AA<E6A<E/EEE<EAEEEEAAEEE<AAE<EEEE6 +A6AEA<A6//66 3:data.txt @NS500278 TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGGATGGCGCTGT +TAATCGCAGCAATGGTGTATCCGCAGGGGATTTTTCCGGTACTGGCAGCGTCCGGCGTTTGGGTAGAGA +TCGGAAGAGCAC + AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAEEEAEEEEAEEAEEEE +AEEEA//EEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE +EEAAAEAEEEEA 2:data.txt
data1.txt @NS500278 AGATCNGAAGAGCACACGTCTGAACTCCAGTCACAACGTGATATCTCGTATGCCGTCTTCTGCTTGAAAA +AAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG +GGGGGGGGGGGG + AAAAA#EEEEEEEEEEEEEEEE6EEEEEAEEEAE/AEEEEEEEAE<EEEEA<A/AE<EE/EEAEEAEEAE +EEEA///EEEEEEEEEAEEEEEEEEEEEEEEEEEEEE/EEEAEEEAEEEEEEEEEAEAEEEEEEEEEEE +EAEEEEEAEEAA 1:data1.txt @NS500278 CATTGNACCAAATGTAATCAGCTTTTTTCGTCGTCATTTTTCTTCCTTTTGCGCTCAGGCGCGGATTTGT +TGTGATGTGGCAGCGCTCTGGCAGATTGCTACATGCGCAACATCTACCAGTTTACTTAACTGACTAAAC +AGTAAGTCGACC + AAAAA#E/<EEEEEEEEEEAEEEEEEEEA/EAAEEEEEEEEEEEE/EEEE/A6<E<EEE/E6EEEEEEEE +E6EEEE<EAEEEE<</EAEE6<<EEEEEA/AEAE<AA<E6A<E/EEE<EAEEEEAAEEE<AAE<EEEE6 +A6AEA<A6//66 3:data1.txt @NS500278 TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGGATGGCGCTGT +TAATCGCAGCAATGGTGTATCCGCAGGGGATTTTTCCGGTACTGGCAGCGTCCGGCGTTTGGGTAGAGA +TCGGAAGAGCAC + AAAAA#EEEEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEE/EEEAE6AE<EAEEAEAAEEAEEEEE +EEAE/EEAEEAEEE6EEEEEAE6A/E<EEEEEEEEAE<EEEEEA/AEEAAEEEEEE//AEE/<<<EEAE +<66/</AE<<A6 2:data1.txt
data2.txt @NS500278 AGATCNGAAGAGCACACGTCTGAACTCCAGTCACAACGTGATATCTCGTATGCCGTCTTCTGCTTGAAAA +AAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG +GGGGGGGGGGGG + AAAAA#EEEEEEEEEEEEEEEE6EEEEEAEEEAE/AEEEEEEEAE<EEEEA<A/AE<EE/EEAEEAEEAE +EEEA///EEEEEEEEEAEEEEEEEEEEEEEEEEEEEE/EEEAEEEAEEEEEEEEEAEAEEEEEEEEEEE +EAEEEEEAEEAA 1:data2.txt @NS500278 CATTGNACCAAATGTAATCAGCTTTTTTCGTCGTCATTTTTCTTCCTTTTGCGCTCAGGCGCGGATTTGT +TGTGATGTGGCAGCGCTCTGGCAGATTGCTACATGCGCAACATCTACCAGTTTACTTAACTGACTAAAC +AGTAAGTCGACC + AAAAA#E/<EEEEEEEEEEAEEEEEEEEA/EAAEEEEEEEEEEEE/EEEE/A6<E<EEE/E6EEEEEEEE +E6EEEE<EAEEEE<</EAEE6<<EEEEEA/AEAE<AA<E6A<E/EEE<EAEEEEAAEEE<AAE<EEEE6 +A6AEA<A6//66 2:data2.txt @NS500278 TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGGATGGCGCTGT +TAATCGCAGCAATGGTGTATCCGCAGGGGATTTTTCCGGTACTGGCAGCGTCCGGCGTTTGGGTAGAGA +TCGGAAGAGCAC + AAAAA#EEEEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEE/EEEAE6AE<EAEEAEAAEEAEEEEE +EEAE/EEAEEAEEE6EEEEEAE6A/E<EEEEEEEEAE<EEEEEA/AEEAAEEEEEE//AEE/<<<EEAE +<66/</AE<<A6 2:data2.txt
output.txt @NS500278 AGATCNGAAGAGCACACGTCTGAACTCCAGTCACAACGTGATATCTCGTATGCCGTCTTCTGCTTGAAAA +AAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG +GGGGGGGGGGGG + AAAAA#EEEEEEEEEEEEEEEE6EEEEEAEEEAE/AEEEEEEEAE<EEEEA<A/AE<EE/EEAEEAEEAE +EEEA///EEEEEEEEEAEEEEEEEEEEEEEEEEEEEE/EEEAEEEAEEEEEEEEEAEAEEEEEEEEEEE +EAEEEEEAEEAA 3 1:data.txt 1:data1.txt 1:data2.txt @NS500278 CATTGNACCAAATGTAATCAGCTTTTTTCGTCGTCATTTTTCTTCCTTTTGCGCTCAGGCGCGGATTTGT +TGTGATGTGGCAGCGCTCTGGCAGATTGCTACATGCGCAACATCTACCAGTTTACTTAACTGACTAAAC +AGTAAGTCGACC + AAAAA#E/<EEEEEEEEEEAEEEEEEEEA/EAAEEEEEEEEEEEE/EEEE/A6<E<EEE/E6EEEEEEEE +E6EEEE<EAEEEE<</EAEE6<<EEEEEA/AEAE<AA<E6A<E/EEE<EAEEEEAAEEE<AAE<EEEE6 +A6AEA<A6//66 8 3:data.txt 3:data1.txt 2:data2.txt @NS500278 TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGGATGGCGCTGT +TAATCGCAGCAATGGTGTATCCGCAGGGGATTTTTCCGGTACTGGCAGCGTCCGGCGTTTGGGTAGAGA +TCGGAAGAGCAC + AAAAA#EEEEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEE/EEEAE6AE<EAEEAEAAEEAEEEEE +EEAE/EEAEEAEEE6EEEEEAE6A/E<EEEEEEEEAE<EEEEEA/AEEAAEEEEEE//AEE/<<<EEAE +<66/</AE<<A6 4 2:data1.txt 2:data2.txt @NS500278 TACAGNGAGCAAACTGAAATGAAAAAGAAATTAATCAGCGGACTGTTTCTGATGTTATGGATGGCGCTGT +TAATCGCAGCAATGGTGTATCCGCAGGGGATTTTTCCGGTACTGGCAGCGTCCGGCGTTTGGGTAGAGA +TCGGAAGAGCAC + AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEAEEEAEEEEAEEAEEEE +AEEEA//EEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE +EEAAAEAEEEEA 2 2:data.txt

Replies are listed 'Best First'.
Re: count total number of occurrence in all files
by BillKSmith (Monsignor) on May 07, 2016 at 13:21 UTC

    Your input loop is exactly like the example for the special case of <> in I/O Operators. You must change your code to handle multiple files differently. We cannot help you until you tell us exactly what you do expect.

    UPDATE: In your existing code, @ARGV will be empty (so its count will be zero) by the time your output loop runs, regardless of how many files you specify. The file names are all shifted off by <> as it opens them. You probably want to save the count before you start the loop.
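
    A minimal sketch of that point, not a fix for the whole script: it creates two throwaway files with File::Temp (an assumption made only to keep the example self-contained), records the file count before reading, and then checks what is left in @ARGV once <> has reached the end of the input. The sub name count_before_and_after is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small throwaway files so the sketch is self-contained.
my @files;
for my $text ( "a\n", "b\n" ) {
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} $text;
    close $fh;
    push @files, $name;
}

# Returns (number of files before reading, number left in @ARGV afterwards).
sub count_before_and_after {
    local @ARGV = @_;
    my $n_files = @ARGV;    # save the count before the read loop starts
    while ( my $line = <> ) {
        # <> shifts each file name off @ARGV as it opens that file
    }
    return ( $n_files, scalar @ARGV );
}

my ( $before, $after ) = count_before_and_after(@files);
print "files given: $before, left in \@ARGV after reading: $after\n";
```

    In the original script the equivalent change would be to copy the count into a variable before the while loop and compare against that variable in the output loop.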

    Bill
Re: count total number of occurrence in all files
by Laurent_R (Canon) on May 07, 2016 at 08:27 UTC
    Hi umaykulsum,

    I am sorry, but I do not really understand what your code is doing or trying to do. Can you please show some input data? And also the expected output for that input?

    And please explain the content of the %compare hash which never appears to get populated in the code you are showing.
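
    For reference, the push onto @{ $compare{$key} } in the posted loop is what fills %compare: dereferencing a missing hash element as an array creates the array reference on the fly (autovivification). The sketch below shows this on made-up records; the keys, the values and the helper sub total are hypothetical, and the records assume a tab followed by a space between key and value, which is what the split pattern in the question would require.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical records in the form "key<TAB><SPACE>value".
my @records = ( "readA\t 1", "readB\t 3", "readA\t 2" );

my %compare;
for my $record (@records) {
    my ( $key, $value ) = split /\t\s/, $record;

    # On the first sighting of a key, $compare{$key} does not exist yet;
    # using it as an array reference creates the array automatically.
    push @{ $compare{$key} }, $value;
}

# Sum a list of numbers.
sub total {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}

# Print key, total, then the individual values, tab-separated.
for my $key ( sort keys %compare ) {
    print join( "\t", $key, total( @{ $compare{$key} } ), @{ $compare{$key} } ), "\n";
}
```

    If the real input does not contain a tab followed by whitespace, the split returns the whole record as the key and an undefined value, which may be part of what makes the script's behaviour hard to follow.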