To continue on (untested):

    open( my $FH1, '<', $file1 ) or die "Couldn't open file \"$file1\": $!";
    open( my $FH2, '<', $file2 ) or die "Couldn't open file \"$file2\": $!";

    my %ids;
    while ( my $id = <$FH2> ) {
        chomp $id;          #remove line ending
        $ids{$id} = 1;
    }

    my $line = <$FH1>;      #throw away first header line
    while ( $line = <$FH1> ) {
        #get 'rs2342349' from: "chr1 11223 11224 rs2342349\n"
        my ($id) = (split /\s+/, $line)[3];   #whitespace chars also includes tabs
        print $line if defined $id && exists $ids{$id};   #skip short or blank lines
    }
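Update: You will notice that I removed the "\n" from the "die" statement. When the message does not end in a newline, die appends the script name and line number (followed by a newline) by default. If you explicitly put in an "\n", that location information is suppressed, which changes what the "die" prints! Whoa! Here is a short demo: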

    #open IN, '<', 'somename' or die "xxx $!\n";
    #prints: xxx No such file or directory

    open IN, '<', 'somename' or die "xxx $!";
    #prints: xxx No such file or directory at C:\Projects_Perl\testing\junk.pl line 4.
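The same behaviour can be checked without a missing file by trapping the message in an eval. This is only a minimal sketch: the helper name died_with and the message text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code block and return the message it died with
# (empty string if it did not die).
sub died_with {
    my ($code) = @_;
    eval { $code->(); 1 } or return $@;
    return '';
}

# Message ends in "\n": die prints it exactly as given.
my $with_newline = died_with( sub { die "xxx failed\n" } );

# No trailing "\n": die appends " at <script> line <N>." and a newline.
my $without_newline = died_with( sub { die "xxx failed" } );

print "with newline:    $with_newline";
print "without newline: $without_newline";
```

For messages aimed at an end user, the "\n" form is usually what you want; while debugging, leaving it off gives you the line number for free.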
Update 2: RE: the statistics

If "Number_pos_1st_file" is just the line count, then that is easy. If these pos values are not unique, then I see problems because the file is so large that it is likely that a hash to count them won't fit into memory. In that case, I would do a system sort on the file and then read through it to find the unique pos values.
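Once the values are sorted, counting the unique ones needs only the previous value in memory. Here is a rough sketch of that idea. It assumes one pos value per line, and it uses an in-memory string as a stand-in for the output of the system sort; the sub name count_unique_sorted is made up.

```perl
use strict;
use warnings;

# Count distinct values in an already-sorted stream, one value per line.
# Only the previous value is remembered, so memory use stays constant
# no matter how large the file is.
sub count_unique_sorted {
    my ($fh) = @_;
    my ( $count, $prev ) = ( 0, undef );
    while ( my $value = <$fh> ) {
        chomp $value;
        next if defined $prev && $value eq $prev;   # same as the previous line: a repeat
        $count++;
        $prev = $value;
    }
    return $count;
}

# Stand-in data (assumption): in real use this would be the sorted file.
my $sorted = "100\n100\n250\n250\n250\n975\n";
open( my $fh, '<', \$sorted ) or die "Couldn't open in-memory file: $!";
print count_unique_sorted($fh), "\n";   # prints 3
close $fh;
```

The comparison uses eq, so it only requires that identical values end up on adjacent lines; it does not matter whether the sort was numeric or lexical.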

"Number_pos_2nd_file" is just scalar(keys %ids), i.e. the number of unique IDs? Or perhaps it is the line count? The two differ if the second file contains duplicate IDs.
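To make the difference between the two candidates concrete, here is a tiny sketch with made-up IDs, one of which is repeated:

```perl
use strict;
use warnings;

# Made-up sample lines (assumption): "rs1" appears twice.
my @lines = ( "rs1\n", "rs2\n", "rs1\n" );

my %ids;
my $line_count = 0;
for my $raw (@lines) {
    chomp( my $id = $raw );   # chomp a copy so @lines is left untouched
    $ids{$id} = 1;            # duplicates collapse onto the same hash key
    $line_count++;
}

my $unique_count = scalar keys %ids;   # keys in scalar context gives the key count
print "lines: $line_count, unique ids: $unique_count\n";
```

With this data the line count is 3 and the unique ID count is 2, so which one to report depends on what the statistic is meant to say.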


In reply to Re^5: Query large tab delimited file by a list by Marshall
in thread Query large tab delimited file by a list by Elninh05
