in reply to Re: count total number of occurrence in all files
in thread Reaped: count total number of occurrence in all files

Sorry friends, the problem is that each record in my files consists of 4 lines, but only the second line should be used for matching when computing the total count. For example, take the following two files:

data.txt:

@gi
AGATC
+
E/AA# 1

@gi1
ACCTA
+
/66AE 3

data1.txt:

@gi
AGATC
+
//AA# 2

@gi1
ACCTA
+
#66AE 5

The output should be:

@gi
AGATC
+
E/AA# 3

@gi1
ACCTA
+
/66AE 8

It should sum the second column of both files only if the second line matches, but it is giving this output instead:

@gi
AGATC
+
E/AA# 1

@gi1
ACCTA
+
/66AE 3

@gi
AGATC
+
//AA# 2

@gi1
ACCTA
+
#66AE 5

It is comparing all four lines. My script is the same as before:
my %compare;
$/ = "";
while (<>) {
    chomp;
    my ( $key, $value ) = split( '\t\s', $_ );
    push( @{ $compare{$key} }, $value );
}
foreach my $key ( sort keys %compare ) {
    my $tot        = 0;
    my $file_count = @ARGV;
    for my $val ( @{ $compare{$key} } ) {
        $tot += ( split /:/, $val )[0];
    }
    if ( @{ $compare{$key} } >= $file_count ) {
        print join( "\t", $key, $tot, @{ $compare{$key} } ), "\n\n";
    }
}

Re^3: count total number of occurrence in all files
by BillKSmith (Monsignor) on May 10, 2016 at 11:58 UTC

    Thank you for accepting Discipulus's advice about posting readable examples.

    Your new question implies that it is possible for two records to have matching sequences (second line) but different IDs (first line). Neither your example nor your code tells us what output you expect in this case. (If this is not possible, you should match only on the much shorter first line.)

    I see that your code uses my suggestion for parsing the fourth line. Unfortunately, this does not work for your new data files, because there is no colon to split on. We are unlikely to be able to tell you what is wrong with your code until you post code and data that allow us to reproduce your results.
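
    For illustration only, here is a minimal sketch of keying on the second line and summing a trailing count. It rests on assumptions the thread does not confirm: records are four lines long and separated by a blank line, the count is the last whitespace-separated field of the fourth line, and when IDs or quality strings differ, the first one seen is kept. The name sum_by_sequence is hypothetical.

```perl
use strict;
use warnings;

# Sum the trailing count of each record, keyed on the sequence (second line) only.
# ASSUMPTIONS (not confirmed by the thread): four-line records separated by a
# blank line; the count is the last whitespace-separated field of line four.
sub sum_by_sequence {
    my @files = @_;
    my ( %first, %total, @order );
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        local $/ = '';    # paragraph mode: one blank-line-separated record per read
        while ( my $record = <$fh> ) {
            my ( $id, $seq, $plus, $qual_line ) = split /\n/, $record;
            next unless defined $qual_line;    # skip incomplete records

            # Split line four into the quality string and the trailing count.
            my ( $qual, $count ) = $qual_line =~ /^(.*\S)\s+(\d+)\s*$/ or next;

            if ( !exists $total{$seq} ) {
                push @order, $seq;             # remember first-seen order
                $first{$seq} = [ $id, $plus, $qual ];
            }
            $total{$seq} += $count;
        }
        close $fh;
    }

    # Rebuild each record with the first-seen ID and quality, plus the total.
    return map {
        my ( $id, $plus, $qual ) = @{ $first{$_} };
        join "\n", $id, $_, $plus, "$qual $total{$_}";
    } @order;
}

# Usage: perl script.pl data.txt data1.txt
print join( "\n\n", sum_by_sequence(@ARGV) ), "\n" if @ARGV;
```

    If the two sample files really are laid out as assumed, this gives totals of 3 for AGATC and 8 for ACCTA. Note that it also prints sequences found in only one file; counting the files in which each sequence appears would be needed to restrict the output to sequences present in all of them.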

    Bill