Re: count total number of occurrence in all files

When I downloaded your data files, I found a title and a blank line at the start of each, and I removed them. Your processing requires a blank line after every record, including the last record of every file, so I added a blank line to the end of each file that did not have one. With these changes, your code created the hash correctly.
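The clean-up described above could be scripted rather than done by hand. The following is a minimal sketch, assuming every file really does begin with a one-line title followed by a blank line; `normalise_records` is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: tidy the text of one data file so that every
# record, including the last, is followed by a blank line.
sub normalise_records {
    my ($text) = @_;

    # Assumption: the first line is a title and the second line is blank.
    # Remove both. If the text does not start that way, nothing is removed.
    $text =~ s/\A[^\n]*\n[ \t]*\n//;

    # Replace whatever newlines end the text with exactly one blank line.
    $text =~ s/\n*\z/\n\n/;

    return $text;
}

# Example: a title, a blank line, and two records with no trailing newline.
print normalise_records("Title\n\nrec1\t1:x\n\nrec2\t2:y");
```

The function works on a string, so the caller decides whether to rewrite the files or to clean each one in memory before parsing it.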

Your code does not extract the numeric field from the hash value. With the following change, your code produced the output that you expect.

#       $tot += $val;
        $tot += ( split /:/, $val )[0];
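To see what the changed line does in isolation, here is a small sketch. The sample value `'3:some-other-field'` and the name `count_of` are made up for illustration; the only assumption is that the hash value starts with the count, followed by a colon.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the expression from the fix above.
# split returns a list of fields; the [0] slice keeps only the first one.
sub count_of {
    my ($val) = @_;
    return ( split /:/, $val )[0];
}

my $tot = 0;
$tot += count_of('3:some-other-field');    # made-up sample value
print "$tot\n";                            # prints 3
```

If the value contains no colon, `split` returns the whole string as its only field, so the slice still yields something, but it is numeric only when the whole value is.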

Bill

Re^2: count total number of occurrence in all files by Anonymous Monk on May 10, 2016 at 05:59 UTC
Thank you, everyone. Now the problem is that the script works perfectly well with small files, as shown in the example, but when I run it with larger files of 4GB, it does not give the total count; it gives the count of only the first file. Why is this happening?
Re^2: count total number of occurrence in all files by Anonymous Monk on May 10, 2016 at 07:35 UTC
Sorry, friends. The problem is that each record in my file consists of 4 lines, but only the second line should be matched to give the total count. For example, take the following two files:

data.txt
    @gi AGATC + E/AA# 1 @gi1 ACCTA + /66AE 3

data1.txt
    @gi AGATC + //AA# 2 @gi1 ACCTA + #66AE 5

The output should be:

    @gi AGATC + E/AA# 3 @gi1 ACCTA + /66AE 8

It should sum the second column of both files only if the second line matches, but it gives this output instead:

    @gi AGATC + E/AA# 1 @gi1 ACCTA + /66AE 3 @gi AGATC + //AA# 2 @gi1 ACCTA + #66AE 5

It is comparing all four lines. My script is the same:

    my %compare;
    $/ = "";
    while (<>) {
        chomp;
        my ( $key, $value ) = split( '\t\s', $_ );
        push( @{ $compare{$key} }, $value );
    }
    foreach my $key ( sort keys %compare ) {
        my $tot        = 0;
        my $file_count = @ARGV;
        for my $val ( @{ $compare{$key} } ) {
            $tot += ( split /:/, $val )[0];
        }
        if ( @{ $compare{$key} } >= $file_count ) {
            print join( "\t", $key, $tot, @{ $compare{$key} } ), "\n\n";
        }
    }
Re^3: count total number of occurrence in all files by BillKSmith (Monsignor) on May 10, 2016 at 11:58 UTC
Thank you for accepting Discipulus's advice about posting readable examples. Your new question implies that two records can have matching sequences (second line) but different IDs (first line). Neither your example nor your code tells us what output you expect in this case. (If this is not possible, you should match only on the much shorter first line.) I see that your code uses my suggestion for parsing the fourth line. Unfortunately, this does not work for your new data files, because there is no colon to split on. It is unlikely that we can tell you what is wrong with your code until you post code and data that allow us to reproduce your results.

Bill
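One possible reading of the requirement, offered only as a sketch, is to key each record on its sequence line and add up the counts. Everything about the layout is an assumption the thread has not confirmed: four-line records separated by blank lines, the count as the last whitespace-separated field of the fourth line, and the ID and quality string of the first record seen being kept when sequences match. `sum_by_sequence` is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch only. Assumed layout (not confirmed by the thread):
#   line 1: ID, line 2: sequence, line 3: '+',
#   line 4: quality string, whitespace, count.
# Records are assumed to be separated by blank lines.
sub sum_by_sequence {
    my (@filehandles) = @_;
    my ( %total, %first_seen, @order );

    local $/ = "";    # paragraph mode: each read returns one record
    for my $fh (@filehandles) {
        while ( my $record = <$fh> ) {
            chomp $record;    # in paragraph mode this strips all trailing newlines
            my ( $id, $seq, $plus, $last ) = split /\n/, $record;
            next unless defined $last;    # skip records with fewer than 4 lines

            # Separate the quality string from the count; skip if no count.
            my ( $qual, $count ) = $last =~ /^(\S+)\s+(\d+)\s*$/ or next;

            if ( !exists $total{$seq} ) {
                push @order, $seq;        # remember first-seen order
                $first_seen{$seq} = join "\n", $id, $seq, $plus, $qual;
            }
            $total{$seq} += $count;       # the key is the sequence line only
        }
    }

    # One string per distinct sequence: the first record seen, then the total.
    return map { "$first_seen{$_}\t$total{$_}" } @order;
}
```

The function takes open filehandles rather than reading `<>` directly, so it can be tried on in-memory strings before being pointed at large files. It stores one entry per distinct sequence instead of every value from every file. Whether keeping the first ID is the right choice when IDs differ is exactly the open question raised above.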