in reply to Count Matches in a File
Many errors; let's treat them piecemeal.
open (CYT, "C:/Work/ING_Occurrences_Companies/CytokineArrays.txt");
What happens if the file can't be opened (the file vanished, or there's a typo in the file name)? perl doesn't tell you unless you tell it to. If the file can't be opened, it is sensible to stop processing, since nothing that follows makes sense without it. Use die for that:
open (CYT, "C:/Work/ING_Occurrences_Companies/CytokineArrays.txt")
    or die "can't read 'C:/Work/ING_Occurrences_Companies/CytokineArrays.txt': $!\n";
Now the filename in the open statement and the one in the die message could get out of sync; it is better to use a variable. It is good practice to declare a variable in the scope where it is needed (here, the current file), so my is used here. Declaring your variables lets you use strict, which complains about undeclared variables, e.g. typos. Always use it.
use strict;
my $cytok_file = "C:/Work/ING_Occurrences_Companies/CytokineArrays.txt";
open CYT, '<', $cytok_file or die "can't read '$cytok_file': $!\n";
The 3-argument form of open lets you see the open mode at a glance, and a filename that happens to begin with < or > is never mistaken for a mode. See open.
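To see what the or die idiom actually reports, here is a minimal sketch that tries to open a file that does not exist. The path no_such_dir/CytokineArrays.txt is hypothetical and assumed to be missing; the eval is there only so the message can be printed instead of ending the script. In a real script you would leave out the eval and let die stop the program.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist, so open is bound to fail.
my $cytok_file = "no_such_dir/CytokineArrays.txt";

# eval traps the die so the message can be shown instead of ending the script.
my $opened = eval {
    open CYT, '<', $cytok_file or die "can't read '$cytok_file': $!\n";
    1;
};
print "stopped early: $@" unless $opened;
```

The text of $! comes from the operating system, so the exact wording after the colon varies between platforms.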
while (<CYT>){
    chomp;
    @cytokine=split(/\t/,$_);
}
Here you are assigning the result of split to an array, overwriting it on each pass through the loop and losing the previous information. You want to count occurrences of IDs; use a hash for that. See perldata. Again, use my to declare your lexically scoped variables.
my %occurrences;
while (<CYT>) {
    chomp;
    my ($name, $description, $ID) = split /\t/;
    $occurrences{$ID} = 0;   # initial count
}
split operates on $_ by default, so that argument can be omitted. And since $name and $description are never used, there is no need to capture them in the first place. You are interested in the third element of the list that split returns, so grab just that (indices start at 0, so it is index 2):
while (<CYT>) {
    chomp;
    my $ID = (split /\t/)[2];
    $occurrences{$ID} = 0;   # initial count
}
or even
while (<CYT>) {
    chomp;
    $occurrences{ (split /\t/)[2] } = 0;   # initial count
}
although that may be too terse, since it no longer gives any clue about what the third element is.
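The difference between the array and the hash versions above can be checked with a small sketch. It reads two made-up tab-separated records (name, description, ID) from an in-memory file, a scalar reference passed to open, so it needs no file on disk. The records and the CKnnn IDs are hypothetical; only the layout with the ID in the third field is taken from the code above.

```perl
use strict;
use warnings;

# Hypothetical tab-separated records: name, description, ID.
my $data = "IL6\tinterleukin 6\tCK001\n"
         . "TNF\ttumor necrosis factor\tCK002\n";
open CYT, '<', \$data or die "can't open in-memory data: $!\n";

my @cytokine;      # array: overwritten on every pass through the loop
my %occurrences;   # hash: one entry per ID, so nothing is lost
while (<CYT>) {
    chomp;                          # otherwise the ID would end in "\n"
    @cytokine = split /\t/;         # all three fields
    my $ID    = (split /\t/)[2];    # list slice: third field only
    $occurrences{$ID} = 0;          # initial count
}
close CYT or die "can't close CYT: $!\n";

print "array after loop: @cytokine\n";
print "hash keys:        ", join(', ', sort keys %occurrences), "\n";
```

After the loop the array holds only the fields of the last record, while the hash still has both IDs.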
close CYT;
Again, it is sensible to check the return value of a system call:
close CYT or die "can't close filehandle CYT properly: $!\n";
In the next 2 lines, you open one file for writing and one for reading:
open (OUT, ">C:/Work/ING_Occurrences_Companies/ING_Count.txt");
open (IN, "C:/Work/ING_Occurrences_Companies/ING.txt");
Perl can't tell your intention from the filehandle name; only the open mode does that, so state it explicitly and check the result. Again, use variables for your file names.
my $outfile = "C:/Work/ING_Occurrences_Companies/ING_Count.txt";
my $infile  = "C:/Work/ING_Occurrences_Companies/ING.txt";
open OUT, '>', $outfile or die "can't write '$outfile': $!\n";
open IN,  '<', $infile  or die "can't read '$infile': $!\n";
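That the mode, not the name, decides the direction of a handle can be shown with a minimal sketch. It uses an in-memory file (a scalar reference passed to open) so nothing touches the disk; the record written is hypothetical. Reopening the handle named OUT with mode '<' makes it read-only, and print to it then fails.

```perl
use strict;
use warnings;

my $buffer = '';   # an in-memory "file", so the sketch needs no disk access

# The mode argument '>' makes OUT writable.
open OUT, '>', \$buffer or die "can't write to buffer: $!\n";
print OUT "CK001\t2\n";                 # hypothetical output record
close OUT or die "can't close buffer: $!\n";

# The same name with mode '<' gives a read-only handle.
open OUT, '<', \$buffer or die "can't read buffer: $!\n";
my $printed = do {
    no warnings 'io';                   # silence "Filehandle OUT opened only for input"
    print OUT "never written\n";
};
my $line = <OUT>;
close OUT or die "can't close buffer: $!\n";

print "print to read-only OUT ", ($printed ? "succeeded" : "failed"), "\n";
print "read back: $line";
```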
Vertically aligning common elements on consecutive lines makes your code more readable (as does proper indentation). From the next block
while (<IN>){
    chomp;
    @ING=split(/\t/,$_);
    $count = 0;
    if($ING[2] =~ /@cytokine[1]/){
        $count++;
        print OUT "$ING[0]\t$ING[1]\t$cytokine{$ING[2]}\t$count\t\n";
    }
}
I deduce that the input file has the same format as the first file read, with the ID in the third field. Just use the ID as a key into the hash %occurrences and increment the value stored there. Store the line in another hash, also keyed on the ID:
my %lines;
while (<IN>) {
    chomp;
    my $ID = (split /\t/)[2];
    $occurrences{$ID}++;
    $lines{$ID} = $_;   # keeps the last line seen for each ID
}
Then sort the keys of %occurrences, iterate over them, and output your data:
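foreach my $ID (sort keys %occurrences) {
    # IDs from the first file that never appeared in the second have no stored line
    my $line = defined $lines{$ID} ? $lines{$ID} : $ID;
    print OUT $line, "\t", $occurrences{$ID}, "\n";
}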
Depending on the type of your IDs, you might want to sort them numerically (sort { $a <=> $b } keys %occurrences). See sort.
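Putting the pieces together, here is a sketch of the whole program. It is an illustration under assumptions, not a drop-in replacement: both inputs are assumed to be tab-separated with the ID in the third field, the sample records and CKnnn IDs are hypothetical, and in-memory files (scalar references passed to open) stand in for CytokineArrays.txt, ING.txt and ING_Count.txt so the sketch runs without them. Printing the bare ID for a zero count is also an assumption, since such IDs have no stored line.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for CytokineArrays.txt and ING.txt.
my $cytok_data = "IL6\tinterleukin 6\tCK001\n"
               . "TNF\ttumor necrosis factor\tCK002\n";
my $ing_data   = "ING\tarray A\tCK001\n"
               . "ING\tarray B\tCK001\n";
my $out_data   = '';   # stands in for ING_Count.txt

my %occurrences;   # ID => number of lines seen in the second file
my %lines;         # ID => last line seen for that ID

open CYT, '<', \$cytok_data or die "can't read cytokine data: $!\n";
while (<CYT>) {
    chomp;
    my $ID = (split /\t/)[2];
    $occurrences{$ID} = 0;   # initial count
}
close CYT or die "can't close CYT: $!\n";

open IN, '<', \$ing_data or die "can't read ING data: $!\n";
while (<IN>) {
    chomp;
    my $ID = (split /\t/)[2];
    $occurrences{$ID}++;
    $lines{$ID} = $_;
}
close IN or die "can't close IN: $!\n";

open OUT, '>', \$out_data or die "can't write output: $!\n";
foreach my $ID (sort keys %occurrences) {   # sort { $a <=> $b } for numeric IDs
    my $line = defined $lines{$ID} ? $lines{$ID} : $ID;
    print OUT $line, "\t", $occurrences{$ID}, "\n";
}
close OUT or die "can't close OUT: $!\n";

print $out_data;
```

With the sample data, CK001 is reported with a count of 2 next to the last line that mentioned it, and CK002 with a count of 0.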