de2425 has asked for the wisdom of the Perl Monks concerning the following question:
I am desperately trying to figure out how to accomplish a match count between two files. I'm sure that this is something that is very simple but as a novice I am failing miserably at it.
What I have is two files of data, and I need to count matches of specific values between them. The first file can be thought of as a master list. I want to take the numeric IDs from that file and count how many times each one occurs in the second file. I would then like to print the alphanumeric data from the master list that is associated with each ID, along with a count of how many times that ID occurs in the second file.
The data from the first file looks approximately like:
Name, description, ID#
The data from the second file has the same type of data, but the lists are shorter and there are several lists. It looks like:
Name, description, ID#, Name, description, ID#, Name, description, ID#
The Output I'm looking for is:
Name, ID#, # of times matched
I have tried several different things but have not had any success. My sample code is below. If anyone could offer any constructive help, I would very much appreciate it.
    #!/usr/bin/perl -w
    open (IN, "c:/work/Cytokine_By_Company/ING_cytokines_20080805.txt");
    while (<IN>){
        chomp;
        @t=split(/\t/,$_);
        $ING{$t[9]}=$t[1];
        #print %ING;
    }
    close IN;
    open (OUT, ">c:/work/GeneID_Count/Cytokine.txt")||die "I'm not dead yet";
    open (IN, "c:/work/Cytokine_By_Company/CytokineArrays.txt")||die "I'm Dead!!!!";
    while(<IN>){
        chomp;
        @cytokine=split(/\t/,$_);
        while (/\d+/ and exists $ING{$cytokine}){
            $count++;}
        foreach $cytokine (sort{$ING{$b}<=>$ING{$a}} keys $ING){
            print OUT "$ING{cytokine}\t$count\n";}
    }
    close IN;
    close OUT;
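For the layout described above (several Name, description, ID# triples on one line of the second file), the counting step could be sketched roughly as follows. This is only a sketch under assumptions: the fields are taken to be comma-separated, every third field is taken to be an ID, and the names and IDs in `@master_lines` and `@array_lines` are made-up sample data standing in for the two real files.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the two files (hypothetical values).
my @master_lines = (
    "NameA, first item, 101",
    "NameB, second item, 102",
    "NameC, third item, 103",
);
my @array_lines = (
    "NameA, first item, 101, NameB, second item, 102",
    "NameA, first item, 101,NameD, fourth item, 104",
);

# Pass 1: count how often each ID occurs in the second file.
my %count;
for my $line (@array_lines) {
    my @fields = split /\s*,\s*/, $line;
    # Assumption: fields come in triples, so indexes 2, 5, 8, ... hold IDs.
    for (my $i = 2; $i < @fields; $i += 3) {
        $count{ $fields[$i] }++;
    }
}

# Pass 2: walk the master list and report name, ID and count.
my @report;
for my $line (@master_lines) {
    my ($name, $desc, $id) = split /\s*,\s*/, $line;
    my $times = exists $count{$id} ? $count{$id} : 0;
    push @report, join(", ", $name, $id, $times);
}

print "$_\n" for @report;
```

The hash `%count` is keyed by ID, so each line of the second file is read only once and the master list is then a simple lookup. Master entries that never occur are reported with a count of 0 here; skipping them instead would be an equally reasonable choice.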
Thanks to everyone for their help. Between all of your comments and some thinking, I finally got the code to generate what I needed. I also understand your comments about declaring my variables; however, the person I'm working under gets very frustrated with me when I do this. Please don't ask me why. Therefore, I leave the declarations out. Anyway, my new code looks like this:
    #!/usr/bin/perl -w
    #open(OUT, ">c:/work/new_list.txt");
    open (IN, "c:/work/GeneID_Count/CytokineList.txt")||die "Could not open Cytokine Arrays.txt";
    %seen = ();
    while(<IN>){
        chomp;
        @cytokine=split(/\t/,$_);
        $seen{$cytokine[0]}++;
    }
    close IN;
    open (OUT, ">c:/work/GeneID_Count/Cytokine.txt")||die "Could not create Cytokine.txt";
    open (IN, "c:/work/Cytokine_By_Company/ING_cytokines_20080805.txt")||die "Could not open ING_cytokines_20080805.txt";
    while (<IN>){
        chomp;
        @ING=split(/\t/,$_);
        if ($ING[9]=~/\d/ and exists $seen{$ING[9]}){
            print OUT "$ING[1]\t$ING[9]\t$seen{$ING[9]}\n";
        }
    }
    close IN;
    close OUT;
Thank you all again for all of your help.
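For readers who do want declared variables, the same two-pass idea (count the first column of the list file, then look up column 9 of the master file) might be written along these lines. It is a sketch, not the poster's code: the subroutine names `count_ids` and `matched_rows` are hypothetical, and the sample text read through in-memory filehandles is invented so that the example runs without the real files.

```perl
use strict;
use warnings;

# Count how often the first tab-separated column occurs in a filehandle.
sub count_ids {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        my @cytokine = split /\t/, $line;
        next unless @cytokine;    # skip empty lines
        $seen{ $cytokine[0] }++;
    }
    return \%seen;
}

# Return "name<TAB>id<TAB>count" for master rows whose column 9 was seen.
sub matched_rows {
    my ($fh, $seen) = @_;
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        my @ING = split /\t/, $line;
        next unless defined $ING[9] && $ING[9] =~ /\d/;
        next unless exists $seen->{ $ING[9] };
        push @rows, join("\t", $ING[1], $ING[9], $seen->{ $ING[9] });
    }
    return @rows;
}

# Invented sample data; in-memory filehandles stand in for the real files.
my $list_text = "101\tfoo\n102\tbar\n101\tbaz\n";
my $master_text = join("\n",
    join("\t", 'r1', 'NameA', ('x') x 7, '101'),
    join("\t", 'r2', 'NameB', ('x') x 7, '103'),
    join("\t", 'r3', 'NameC', ('x') x 7, '102'),
) . "\n";

open my $list_fh, '<', \$list_text or die "Could not open list: $!";
my $seen = count_ids($list_fh);
close $list_fh;

open my $master_fh, '<', \$master_text or die "Could not open master: $!";
my @rows = matched_rows($master_fh, $seen);
close $master_fh;

print "$_\n" for @rows;
```

With real data, the two `open` calls would take file paths instead of scalar references. The three-argument form of `open` with lexical filehandles also avoids reusing the bareword `IN` for two different files.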
Replies are listed 'Best First'.

Re: Match Count
by shmem (Chancellor) on Sep 08, 2008 at 15:37 UTC

Re: Match Count
by moritz (Cardinal) on Sep 08, 2008 at 14:35 UTC

Re: Match Count
by apl (Monsignor) on Sep 08, 2008 at 15:10 UTC

Re: Match Count
by dwm042 (Priest) on Sep 08, 2008 at 16:15 UTC