I am desperately trying to figure out how to do a match count between two files. I'm sure this is very simple, but as a novice I am failing miserably at it.

I have two data files, and I need to count matches of specific values between them. The first file can be thought of as a master list. I want to take the numeric IDs from that file and count how many times each one occurs in the second file. I would then like to print the alphanumeric data (the name) from the master list that belongs to each ID, along with the number of times that item occurs in the second file.
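A common first step for this kind of task is a lookup hash from the master list, keyed by ID with the name as the value. The sketch below assumes the simplified comma-separated "Name, description, ID#" layout described in the question (the real files are tab-separated with the name and ID in other columns, so `split` and the field positions would need adjusting). The sub `build_master` and the sample rows are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash (ID => name) from master-list lines.
# ASSUMPTION: each line is "Name, description, ID#" separated by commas.
sub build_master {
    my @lines = @_;
    my %name_for;
    for my $line (@lines) {
        chomp $line;
        my ($name, $desc, $id) = split /\s*,\s*/, $line;
        # Keep only rows whose ID field is purely numeric
        next unless defined $id && $id =~ /^\d+$/;
        $name_for{$id} = $name;
    }
    return %name_for;
}

# Made-up sample rows standing in for the master file
my %name_for = build_master("CytoA, sample A, 101\n", "CytoB, sample B, 202\n");
print "$_ => $name_for{$_}\n" for sort keys %name_for;
# prints:
# 101 => CytoA
# 202 => CytoB
```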

The data from the first file looks approximately like:

Name, description, ID#

The second file has the same type of data, but the lists are shorter and there are several of them on each line. It looks like:

Name, description, ID#, Name, description, ID#, Name, description, ID#
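Because each line of the second file repeats the "Name, description, ID#" triple, one way to count IDs is to split the line and look at every third field, incrementing a hash entry per ID. This is a sketch that assumes comma-separated fields with the ID always in the third slot of each triple; `count_ids` and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count ID occurrences in lines holding several
# "Name, description, ID#" triples.
# ASSUMPTION: fields are comma-separated and fields 2, 5, 8, ... are IDs.
sub count_ids {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        chomp $line;
        my @fields = split /\s*,\s*/, $line;
        for (my $i = 2; $i < @fields; $i += 3) {
            # Only count fields that look like numeric IDs
            $count{ $fields[$i] }++ if $fields[$i] =~ /^\d+$/;
        }
    }
    return %count;
}

my %count = count_ids(
    "CytoA, a, 101, CytoB, b, 202, CytoA, a, 101\n",
    "CytoC, c, 303\n",
);
print "$_: $count{$_}\n" for sort { $a <=> $b } keys %count;
# prints:
# 101: 2
# 202: 1
# 303: 1
```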

The output I'm looking for is:

Name, ID#, # of times matched
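Once there is a name lookup (ID => name) and a count hash (ID => count), producing this output is a matter of walking the master IDs and printing only those that were matched. A minimal sketch, assuming both hashes already exist; `report_lines` and the sample values are hypothetical, and skipping unmatched IDs mirrors the `exists` check used later in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: join the name lookup with the counts and format
# "Name, ID#, # of times matched". IDs with no matches are skipped.
sub report_lines {
    my ($name_for, $count) = @_;   # both are hash references
    my @out;
    for my $id (sort { $a <=> $b } keys %$name_for) {
        next unless exists $count->{$id};
        push @out, join(', ', $name_for->{$id}, $id, $count->{$id});
    }
    return @out;
}

my %name_for = (101 => 'CytoA', 202 => 'CytoB', 303 => 'CytoC');
my %count    = (101 => 2, 303 => 1);
print "$_\n" for report_lines(\%name_for, \%count);
# prints:
# CytoA, 101, 2
# CytoC, 303, 1
```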

I have tried several different things without any success. My sample code is below. If anyone could offer constructive help, I would very much appreciate it.

#!/usr/bin/perl -w
open (IN, "c:/work/Cytokine_By_Company/ING_cytokines_20080805.txt");
while (<IN>){
    chomp;
    @t=split(/\t/,$_);
    $ING{$t[9]}=$t[1];
    #print %ING;
}
close IN;
open (OUT, ">c:/work/GeneID_Count/Cytokine.txt")||die "I'm not dead yet";
open (IN, "c:/work/Cytokine_By_Company/CytokineArrays.txt")||die "I'm Dead!!!!";
while(<IN>){
    chomp;
    @cytokine=split(/\t/,$_);
    while (/\d+/ and exists $ING{$cytokine}){
        $count++;
    }
    foreach $cytokine (sort{$ING{$b}<=>$ING{$a}} keys $ING){
        print OUT "$ING{cytokine}\t$count\n";
    }
}
close IN;
close OUT;
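The loop above tries to order its output with `sort{$ING{$b}<=>$ING{$a}} keys $ING`. As written, `keys` is given the scalar `$ING` rather than the hash `%ING`, and `%ING` holds names, so the numeric `<=>` would be comparing names rather than match counts. If the goal was "most matches first", a sketch of that ordering over a count hash might look like this; `ids_by_count` and its sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return IDs ordered by match count, highest first,
# breaking ties by ascending ID. Sorts "keys %count" (a hash) and compares
# the count values, not the names.
sub ids_by_count {
    my %count = @_;
    return sort { $count{$b} <=> $count{$a} or $a <=> $b } keys %count;
}

my @ids = ids_by_count(101 => 2, 202 => 5, 303 => 2);
print "@ids\n";   # prints: 202 101 303
```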

Thanks to everyone for their help. Between all of your comments and some thinking, I finally got the code to produce the output I needed. I also understand your comments about declaring my variables; however, the person I'm working under gets very frustrated with me when I do this. Please don't ask me why. Therefore, I leave them out. Anyway, my new code looks like this:

#!/usr/bin/perl -w
# Count how often each ID appears in the first column of CytominList.txt
open (IN, "c:/work/GeneID_Count/CytokineList.txt") || die "Could not open CytokineList.txt: $!";
%seen = ();
while (<IN>){
    chomp;
    @cytokine = split(/\t/, $_);
    $seen{$cytokine[0]}++;
}
close IN;
open (OUT, ">c:/work/GeneID_Count/Cytokine.txt") || die "Could not create Cytokine.txt: $!";
# Walk the master list: column 1 is the name, column 9 is the ID
open (IN, "c:/work/Cytokine_By_Company/ING_cytokines_20080805.txt") || die "Could not open ING_cytokines_20080805.txt: $!";
while (<IN>){
    chomp;
    @ING = split(/\t/, $_);
    # Print name, ID and count for numeric IDs that were seen in the list
    if ($ING[9] =~ /\d/ and exists $seen{$ING[9]}){
        print OUT "$ING[1]\t$ING[9]\t$seen{$ING[9]}\n";
    }
}
close IN;
close OUT;
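If declarations ever become acceptable, the same logic can be sketched under `use strict` with lexical filehandles and three-argument `open`. Wrapping it in a hypothetical `match_counts` sub that takes two already-open filehandles makes it easy to try on in-memory strings (Perl's `open my $fh, '<', \$string`) instead of the real files. The column positions follow the posted code; the sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical sub: same steps as the final script, but reading from two
# filehandles passed in. List file: ID in column 0. Master file: name in
# column 1, ID in column 9 (as in the posted code).
sub match_counts {
    my ($list_fh, $master_fh) = @_;
    my %seen;
    while (my $line = <$list_fh>) {
        chomp $line;
        my @cytokine = split /\t/, $line;
        $seen{ $cytokine[0] }++;
    }
    my @out;
    while (my $line = <$master_fh>) {
        chomp $line;
        my @ING = split /\t/, $line;
        next unless defined $ING[9] && $ING[9] =~ /\d/ && exists $seen{ $ING[9] };
        push @out, "$ING[1]\t$ING[9]\t$seen{$ING[9]}";
    }
    return @out;
}

# Made-up data; in-memory filehandles stand in for the real files.
my $list   = "101\n101\n202\n";
my $master = join("\t", 'x', 'CytoA', ('-') x 7, 101) . "\n"
           . join("\t", 'x', 'CytoC', ('-') x 7, 303) . "\n";
open my $list_fh,   '<', \$list   or die "Cannot open list: $!";
open my $master_fh, '<', \$master or die "Cannot open master: $!";
print "$_\n" for match_counts($list_fh, $master_fh);
# prints one tab-separated line: CytoA, 101, 2
```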

Thank you all again for all of your help.


In reply to Match Count by de2425