Re: count total number of occurrence in all files
by Discipulus (Canon) on May 09, 2016 at 06:58 UTC
|
hello umaykulsum
As you do not ask a specific question nor point to particular errors, I can only guess.
First, I cannot understand how your data sample can be split using the \t\s pattern (ah ok, you are setting $/ for this...).
Second, I do not see where you are opening the files (maybe the script is run under -nl?).
If each file just contains 2 lines you can simply do:
my $key = <$filehandle>; # first line
my $value = <$filehandle>; # second line
You also have no need to iterate over an array to get the count: @array in scalar context returns the number of elements.
Also $#array contains the index of the last element ( so $#array + 1 == scalar @array ).
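A minimal sketch of the two ways to get the size, in case it helps. The array name and its contents are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data, short on purpose.
my @reads = qw(AGATC ACCTA GGTCA);

my $count      = @reads;     # array in scalar context: number of elements
my $last_index = $#reads;    # index of the last element

print "count: $count\n";
print "last index: $last_index\n";
print "consistent\n" if $#reads + 1 == scalar @reads;
```

No loop is needed in either case; the assignment to a scalar is what forces scalar context.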
You also want the name of the file to be preserved somehow: you can read about $ARGV, or you can write a sub that, given the filename as an argument, opens the file, processes it, and stores the filename along with the results.
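The sub approach could look roughly like this. It is only a sketch: read_pair is a hypothetical name, and it assumes each file holds exactly the two lines mentioned above (key first, value second):

```perl
use strict;
use warnings;

# Hypothetical helper: read a two-line file and return a record
# that keeps the filename next to the data it came from.
sub read_pair {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Cannot open $filename: $!";
    my $key   = <$fh>;    # first line
    my $value = <$fh>;    # second line
    close $fh;
    die "$filename: expected two lines\n"
        unless defined $key && defined $value;
    chomp( $key, $value );
    return { file => $filename, key => $key, value => $value };
}

# Process every file named on the command line.
for my $filename (@ARGV) {
    my $rec = read_pair($filename);
    print "$rec->{file}: $rec->{key} => $rec->{value}\n";
}
```

If you stay with a while (<>) loop instead, $ARGV holds the name of the file currently being read, so you can store that along with each result.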
A plea to you and to ALL BIOINFORMATICIANS around here: can sample data be AGATC instead of a line of hundreds of chars? Readable sample data helps my eyes a lot!
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
| [reply] [d/l] [select] |
|
|
Hello Discipulus :). Actually there is no need for -n, because that's what while (<>) { } does (and -n just adds that loop around your code, as can be seen with -MO=Deparse). It opens each file passed as an argument on the filehandle ARGV, or reads from STDIN if @ARGV == 0.
| [reply] [d/l] |
Re: count total number of occurrence in all files
by Eily (Monsignor) on May 09, 2016 at 09:31 UTC
|
Hello umaykulsum, one problem I see with your code is that while (<>) shifts (removes) the arguments from @ARGV. So when you are done reading, the size of @ARGV is 0. You can save the count before reading: my $file_count = @ARGV;
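Here is a small self-contained sketch of that effect. The two temporary files and their contents are invented just so the script can run on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two throwaway files so the sketch does not depend on real data.
my @names;
for my $text ( "a\n", "b\n" ) {
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} $text;
    close $fh;
    push @names, $name;
}

@ARGV = @names;
my $file_count = @ARGV;    # save the count before <> consumes @ARGV

my $lines = 0;
while (<>) {
    $lines++;
}

print "files given: $file_count\n";
print "lines read: $lines\n";
print "left in \@ARGV: ", scalar(@ARGV), "\n";
```

After the loop @ARGV is empty, which is why a count taken at that point cannot be used to compare against the number of files.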
Another issue is that single quotes prevent interpolation. Try print '\t\s';, you'll see that you don't get a tab and a space, but the string \t\s. You want to use double quotes instead. You either want to split on a literal string, like "\t ", or use a pattern in a regex, like /\t\s/.
You can optimize your code by doing more work in one loop (untested):
while (<>)
{
    chomp;
    my ( $key, $value ) = split /\t\s/, $_;    # Edited thanks to AnomalousMonk
    $compare{$key}{count}++;
    $compare{$key}{sum} += $value;
    if ($file_count == $compare{$key}{count})
    {
        print "$key: ", $compare{$key}{sum};
    }
}
Edit: "\t\s" doesn't work, see AnomalousMonk's answer below :)
|
|
Another issue is that single quotes prevent interpolation. ... You want to use double quotes instead.
I don't understand the point you're making here. Using double-quote interpolation produces the string "\ts", which doesn't seem to be what the OPer wants to split on at all and which earns you an "Unrecognized escape ..." warning into the bargain. Indeed, in your example code, you use the '\t\s' (a single tab character followed by any single whitespace character) as the split pattern. (Personally, I prefer to use qr// or m// instead. Potayto, potahto.)
c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my $line = qq{foo\t bar\t\tbaz\t\nboff};
print qq{>>$line<<};
dd $line;
;;
my @fields = split '\t\s', $line;
dd \@fields;
"
>>foo bar baz
boff<<
"foo\t bar\t\tbaz\t\nboff"
["foo", "bar", "baz", "boff"]
c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my $line = qq{foo\t bar\t\tbaz\t\nboff};
print qq{>>$line<<};
dd $line;
;;
my @fields = split qq{\t\s}, $line;
dd \@fields;
"
Unrecognized escape \s passed through at -e line 1.
>>foo bar baz
boff<<
"foo\t bar\t\tbaz\t\nboff"
["foo\t bar\t\tbaz\t\nboff"]
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] [select] |
|
|
You're right, \s doesn't make sense outside a regex, I read it as a single space character, which it actually is not. The fact that I split on '\t\s' is just a copy/paste gone wrong because I did mean to write "\t\s", which would have been incorrect anyway. I'll edit my post, thanks!
| [reply] |
Re: count total number of occurrence in all files
by BillKSmith (Monsignor) on May 09, 2016 at 15:22 UTC
|
When I download your data files, I find a title and blank line at the start of each. I removed them. Your processing requires a blank line after every record. This includes the last record of every file. I inserted a blank line at the end of each file which did not have one. With these changes, your code created the hash correctly.
Your code does not extract the numeric field from the hash value. With the following change, your code produced the output that you expect.
# $tot += $val;
$tot += ( split /:/, $val )[0];
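To show what the list slice does in isolation, here is a minimal sketch. The values are made up, and the "number:rest" layout is an assumption about the original data, not something taken from the posted files:

```perl
use strict;
use warnings;

# Invented values in an assumed "number:rest" layout.
my @values = ( '3:E/AA#', '5:/66AE' );

my $tot = 0;
for my $val (@values) {
    # split returns a list; the [0] slice keeps only the first field
    $tot += ( split /:/, $val )[0];
}
print "total: $tot\n";
```

The parentheses around split are required, because the subscript applies to the list that split returns.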
| [reply] [d/l] |
|
|
data.txt
@gi
AGATC
+
E/AA# 1
@gi1
ACCTA
+
/66AE 3
data1.txt
@gi
AGATC
+
//AA# 2
@gi1
ACCTA
+
#66AE 5
The output should be:
@gi
AGATC
+
E/AA# 3
@gi1
ACCTA
+
/66AE 8
It should sum the second column of both files only if the second line matches.
But it gives this output:
@gi
AGATC
+
E/AA# 1
@gi1
ACCTA
+
/66AE 3
@gi
AGATC
+
//AA# 2
@gi1
ACCTA
+
#66AE 5
It is comparing all the four lines. My script is the same:
my %compare;
$/="";
while (<>) {
    chomp;
    my ( $key, $value ) = split('\t\s', $_);
    push( @{ $compare{$key} }, $value );
}
foreach my $key ( sort keys %compare ) {
    my $tot = 0;
    my $file_count = @ARGV;
    for my $val ( @{$compare{$key}} ) {
        $tot += ( split /:/, $val )[0];
    }
    if ( @{ $compare{$key} } >= $file_count) {
        print join( "\t", $key, $tot, @{ $compare{$key} } ), "\n\n";
    }
}
| [reply] [d/l] [select] |
|
|
Thank you for accepting Discipulus's advice about posting readable examples.
Your new question implies that it is possible for two records to have matching sequences (second line) but different IDs (first line). Neither your example nor your code tells us what output you expect in this case. (If this is not possible, you should match only on the much shorter first line.)
I see that your code uses my suggestion for parsing the fourth line. Unfortunately, this does not work for your new data files (There is no colon to split on). It is not likely that we can tell you what is wrong with your code until you post code and data that allow us to reproduce your results.
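For what it is worth, here is one possible sketch of the stated goal (summing the trailing number of records that share the same second line), using the short sample posted above. Several things are assumptions, not facts from the thread: merge_counts is a hypothetical name, records are taken to be exactly four lines, the count is taken to be the last whitespace-separated field of the fourth line, and when sequences match the ID and quality string of the first record seen are kept:

```perl
use strict;
use warnings;

# Hypothetical helper: takes a flat list of chomped lines (4 per record)
# and returns merged records, keyed on the sequence line, in first-seen order.
sub merge_counts {
    my @lines = @_;
    my ( %by_seq, @order );
    while ( my ( $id, $seq, $plus, $last ) = splice @lines, 0, 4 ) {
        next unless defined $last;    # incomplete trailing record
        my ( $qual, $count ) = $last =~ /^(.*\S)\s+(\d+)\s*$/
            or next;                  # no trailing count on the 4th line
        if ( !$by_seq{$seq} ) {
            # Assumption: keep the ID and quality of the first record seen.
            $by_seq{$seq} =
                { id => $id, plus => $plus, qual => $qual, count => 0 };
            push @order, $seq;
        }
        $by_seq{$seq}{count} += $count;
    }
    return map {
        my $r = $by_seq{$_};
        join "\n", $r->{id}, $_, $r->{plus}, "$r->{qual} $r->{count}";
    } @order;
}

# The sample records from data.txt and data1.txt as posted.
my @data  = ( '@gi', 'AGATC', '+', 'E/AA# 1', '@gi1', 'ACCTA', '+', '/66AE 3' );
my @data1 = ( '@gi', 'AGATC', '+', '//AA# 2', '@gi1', 'ACCTA', '+', '#66AE 5' );

print "$_\n" for merge_counts( @data, @data1 );
```

Keying on the sequence alone means two records with the same sequence but different IDs are merged under the first ID, which may or may not be what is wanted; that is exactly the case the question leaves open.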
| [reply] |