I have a very large tab-delimited file (about 6 GB) with four columns (pos, start, end, id), and a second file of about 200 MB that contains only ids, one per line. I want to write a new file containing the rows of the first file whose ids appear in the second file, together with their positions. I also need a second output file that reports the number of lines in the first and second input files, and the number of ids that matched and did not match. Thanks a lot in advance. I appreciate your help very much.
First file format:

    pos   start  end    id
    chr1  11223  11224  rs2342349
    chr2  23423  23424  rs6345435
    chr3  64564  64565  rs3432456
    chr4  56456  56457  rs7979979

Second file format:

    id
    rs2342349   #only match
    rs3274234
    rs2342344

Output1:

    chr1  11223  11224  rs2342349

Output2:

    Number_pos_1st_file  4
    Number_pos_2nd_file  3
    Nr_Matching          1
    Nr_Non_matching      2
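One possible approach, sketched here in Python rather than as a finished solution: load the smaller id file into an in-memory set, then stream the large file line by line so that it is never held in memory as a whole. The function name `filter_by_ids` and all four path parameters are hypothetical. The sketch assumes that both files start with a header line, that the id is the last tab-separated column of the first file, that anything after the first token on an id line (such as `#only match`) is an annotation, and that `Nr_Non_matching` counts ids from the second file that never occur in the first.

```python
def filter_by_ids(data_path, ids_path, matches_path, stats_path):
    """Write rows of data_path whose id is listed in ids_path; write counts to stats_path."""
    wanted = set()
    n_ids = 0
    with open(ids_path) as ids_fh:
        next(ids_fh, None)                # assumption: first line is the "id" header
        for line in ids_fh:
            fields = line.split()
            if not fields:
                continue                  # skip blank lines
            n_ids += 1
            wanted.add(fields[0])         # ignore trailing annotations on the line

    n_rows = 0
    found = set()
    with open(data_path) as data_fh, open(matches_path, "w") as out_fh:
        next(data_fh, None)               # assumption: first line is the column header
        for line in data_fh:              # streamed, so the 6 GB file is never fully loaded
            row = line.rstrip("\r\n")
            if not row:
                continue
            n_rows += 1
            row_id = row.rsplit("\t", 1)[-1]   # id is the last tab-separated column
            if row_id in wanted:
                out_fh.write(row + "\n")
                found.add(row_id)

    stats = {
        "Number_pos_1st_file": n_rows,
        "Number_pos_2nd_file": n_ids,
        "Nr_Matching": len(found),
        "Nr_Non_matching": len(wanted) - len(found),
    }
    with open(stats_path, "w") as stats_fh:
        for key, value in stats.items():
            stats_fh.write(f"{key}\t{value}\n")
    return stats
```

A call such as `filter_by_ids("positions.tsv", "ids.txt", "output1.tsv", "output2.txt")` (file names made up for illustration) would produce both output files in a single pass. Memory use is governed by the id set rather than by the large file, so whether the 200 MB id list fits comfortably depends on the machine. If the id list contains duplicates, `Nr_Matching` and `Nr_Non_matching` count distinct ids and may not add up to `Number_pos_2nd_file`.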

In reply to Query large tab delimited file by a list by Elninh05
