
I am desperately trying to figure out how to accomplish a match count between two files. I'm sure that this is something that is very simple but as a novice I am failing miserably at it.

What I have is two files with data in them, and I need to count matches of specific values between them. The first file could be considered a master list of sorts. I want to take the numeric list from that file and count how many times each number occurs in the second file. I would then like to print the alphanumeric data from the master list that is associated with each number, along with a count of how many times that item occurs in the second file.

The data from the first file looks approximately like:

Name, description, ID#

The data from the second file has the same type of data, but each list is shorter and there are several lists. It looks like:

Name, description, ID#, Name, description, ID#,Name, description, ID#

The output I'm looking for is:

Name, ID#, # of times matched

I have tried several different things but have not had any success. My sample code is below. If anyone could offer any constructive help, I would very much appreciate it.


#!/usr/bin/perl -w

open (IN, "c:/work/Cytokine_By_Company/ING_cytokines_20080805.txt");

     while (<IN>){
            chomp;
           @t=split(/\t/,$_);
           $ING{$t[9]}=$t[1];
           #print %ING;
     }
close IN;

open (OUT, ">c:/work/GeneID_Count/Cytokine.txt")||die "I'm not dead yet";
open (IN, "c:/work/Cytokine_By_Company/CytokineArrays.txt")||die "I'm Dead!!!!";

     while(<IN>){
     chomp;
     @cytokine=split(/\t/,$_);
          
     while (/\d+/ and exists $ING{$cytokine}){
            $count++;}
             
     foreach $cytokine (sort{$ING{$b}<=>$ING{$a}} keys $ING){
            print OUT "$ING{cytokine}\t$count\n";}
           
      }       
close IN;
close OUT;

Thanks to everyone for their help. Between all of your comments and some thinking, I finally got the code to generate what I needed. I also understand your comments about declaring my variables; however, the person I'm working under gets very frustrated with me when I do this. Please don't ask me why. Therefore, I leave the declarations out. Anyway, my new code looks like this:

#!/usr/bin/perl -w

#open(OUT, ">c:/work/new_list.txt");
open (IN, "c:/work/GeneID_Count/CytokineList.txt")||die "Could not open CytokineList.txt";

# First pass: count how often each ID appears in the first column of the list.
%seen = ();
while(<IN>){
     chomp;
     @cytokine=split(/\t/,$_);
     $seen{$cytokine[0]}++;
}
close IN;

open (OUT, ">c:/work/GeneID_Count/Cytokine.txt")||die "Could not create Cytokine.txt";
open (IN, "c:/work/Cytokine_By_Company/ING_cytokines_20080805.txt")||die "Could not open ING_cytokines_20080805.txt";

# Second pass: for each master row whose ID (column 10) was seen,
# print name (column 2), ID and the number of times it was seen.
while (<IN>){
      chomp;
      @ING=split(/\t/,$_);
      if ($ING[9]=~/\d/ and exists $seen{$ING[9]}){
           print OUT "$ING[1]\t$ING[9]\t$seen{$ING[9]}\n";
      }
}

close IN;
close OUT;
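
For anyone who is free to declare variables, the same two-pass idea might be sketched roughly as follows. This is only an illustration, not a replacement for the code above: the sub name match_counts and the sample data are made up, and the in-memory strings merely stand in for CytokineList.txt and the ING file so the sketch runs on its own. It assumes, as above, that the ID is in the first column of the list and that the master file keeps the name in column 2 and the ID in column 10.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the two input files.
my $list_data = "3558\n3569\n3558\nn/a\n";
my $ing_data  = join "", map { join("\t", @$_) . "\n" } (
    [ 'r1', 'IL2', ('x') x 7, '3558' ],
    [ 'r2', 'IL6', ('x') x 7, '3569' ],
    [ 'r3', 'TNF', ('x') x 7, '7124' ],
);

# Takes two sources (file names, or references to strings for in-memory data)
# and returns "name<TAB>id<TAB>count" rows for master IDs found in the list.
sub match_counts {
    my ($list_src, $ing_src) = @_;

    # First pass: count each ID in the first column of the list.
    open my $list_fh, '<', $list_src or die "Could not open list: $!";
    my %seen;
    while (my $line = <$list_fh>) {
        chomp $line;
        my @fields = split /\t/, $line;
        $seen{$fields[0]}++ if defined $fields[0];
    }
    close $list_fh;

    # Second pass: report master rows whose numeric ID was seen.
    open my $ing_fh, '<', $ing_src or die "Could not open master file: $!";
    my @rows;
    while (my $line = <$ing_fh>) {
        chomp $line;
        my @ing = split /\t/, $line;
        next unless defined $ing[9] && $ing[9] =~ /\d/ && exists $seen{$ing[9]};
        push @rows, "$ing[1]\t$ing[9]\t$seen{$ing[9]}";
    }
    close $ing_fh;
    return @rows;
}

print "$_\n" for match_counts(\$list_data, \$ing_data);
```

Passing plain file names instead of string references should make the same sub read real files. The three-argument open with lexical filehandles and $! in the die message reports why an open failed, which the fixed messages above cannot do.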

Thank you all again for all of your help.
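
P.S. In case someone has data in the layout I first described, with several Name, description, ID# groups on one line, here is a rough sketch of how the counting might look. It assumes tab-separated fields with an ID in every third position and a three-column master list, which is a simplification of my real files. The sub names count_ids and report and all of the sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two files (tab-separated).
my @master_lines = (
    "IL2\tinterleukin 2\t3558",
    "IL6\tinterleukin 6\t3569",
    "TNF\ttumor necrosis factor\t7124",
);
my @array_lines = (
    "IL2\tinterleukin 2\t3558\tIL6\tinterleukin 6\t3569\tIL2\tinterleukin 2\t3558",
    "IL6\tinterleukin 6\t3569\tIL2\tinterleukin 2\t3558",
);

# Count every ID in the second data set; IDs are assumed to sit in
# fields 3, 6, 9, ... of each line (indexes 2, 5, 8, ...).
sub count_ids {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my @fields = split /\t/, $line;
        for (my $i = 2; $i < @fields; $i += 3) {
            $count{ $fields[$i] }++;
        }
    }
    return %count;
}

# Build one "Name<TAB>ID<TAB>count" row per master entry,
# using 0 for IDs that never occur in the second data set.
sub report {
    my ($master, $count) = @_;
    my @rows;
    for my $line (@$master) {
        my ($name, undef, $id) = split /\t/, $line;
        push @rows, join("\t", $name, $id, $count->{$id} || 0);
    }
    return @rows;
}

my %count = count_ids(@array_lines);
print "$_\n" for report(\@master_lines, \%count);
```

Unlike my code above, this version also prints master entries with a count of 0. Skipping those rows would give the same kind of output as the exists test.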

In reply to Match Count by de2425
