Despite your requirement that this be "on the command line," you might solve this yourself by understanding and then extending the example already offered, or by adapting an answer from one of the many other SOPWs asking about essentially the same chore.

And, yes, it's more likely the latter, since you want to test the content of a specific column (you didn't say which one) in each line of the first file against the content of a specific column in any line of a second file...

... or is that not what you meant? The phrase "where the column from file 1 matches somewhere in file 2" makes me wonder if you're looking for any column in a given (same line number) line in file 2 that matches the content of the specified column in a particular line in file 1. Your reply to the first answer would appear to rule that out were it not for the terminal punctuation -- a question mark!

The first step to solving your problem is probably re-stating it to yourself, in a clear, precise and unambiguous manner.

Update: Upon posting this reply, I discovered that ZWcarp had made major, unacknowledged revisions to the OP. Meh!
Added: (and his code doesn't compile under strict: at line 15, Global symbol "@file2" requires explicit package name.)
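
For anyone unfamiliar with that message: under strict, every variable has to be declared before it is used. The OP's script isn't reproduced here, so what follows is only a hypothetical stand-in (the data and everything except the name @file2 are my assumptions), showing the usual fix of declaring the array with my.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the OP's script; only the name @file2
# comes from the error message. Without the "my", compilation stops with:
#   Global symbol "@file2" requires explicit package name
my @file2 = ( "4 3 2 1", "a b c d" );    # declared, so strict is satisfied

print scalar(@file2), " lines\n";
```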

Re-updated. (Yech): OP's first update (prior to adding the reference to "a gene identifier number or a CG number. These are always numbers and letter delimited somehow.") left the requirement ambiguous (at least to me) so I prepped this, seeking clarification. Clearly, it's not characteristic of the new spec, but, FTR:

    File 1                     File 2
    Col1  Col2  Col3  Col4     Col1  Col2  Col3  Col4
    1     2     3     4        4     3     2     1
    4     3     2     1        a     b     c     d
    10    11    12    13       12    11    13    10
    a1    b     c     d        a4    b4    c     d4

    Line 1: no matches
    Line 2: # F1, L2 matches F2, L1
    Line 3: # F1, L3, Col2 matches F2, L3, Col2
    Line 4: # F1, L4, Cols 2, 3 & 4 match F2, L2, Cols 2, 3 & 4
            # and also matches contents of F2, L4, Col3
            # Do both satisfy your criteria?

Where "F1" (in the data sample) means File1, "L2" means Line 2 and "Col" and "Cols" are -- I hope -- self explanatory.
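
To make the ambiguity concrete, here is a minimal sketch of the two readings, run against the sample data. It is not the OP's code: the sub names match_any_line and match_same_line are mine, and comparing column 2 of each file is an arbitrary assumption, since the column was never specified.

```perl
use strict;
use warnings;

# Sample data from the table, one array ref per line.
my @file1 = ( [qw(1 2 3 4)], [qw(4 3 2 1)], [qw(10 11 12 13)], [qw(a1 b c d)] );
my @file2 = ( [qw(4 3 2 1)], [qw(a b c d)], [qw(12 11 13 10)], [qw(a4 b4 c d4)] );

# Assumption: compare column 2 of each file (index 1).
my ( $col1, $col2 ) = ( 1, 1 );

# Reading A: the chosen column in file 1 matches the chosen column
# on ANY line of file 2. Returns the matching file 1 line numbers.
sub match_any_line {
    my ( $f1, $f2, $c1, $c2 ) = @_;
    my %seen;
    $seen{ $_->[$c2] } = 1 for @$f2;
    return grep { $seen{ $f1->[ $_ - 1 ][$c1] } } 1 .. @$f1;
}

# Reading B: the chosen column in file 1 matches ANY column
# on the SAME line number of file 2.
sub match_same_line {
    my ( $f1, $f2, $c1 ) = @_;
    return grep {
        my $n    = $_;
        my $want = $f1->[ $n - 1 ][$c1];
        $n <= @$f2 && grep { $_ eq $want } @{ $f2->[ $n - 1 ] };
    } 1 .. @$f1;
}

my @any  = match_any_line( \@file1, \@file2, $col1, $col2 );
my @same = match_same_line( \@file1, \@file2, $col1 );

print "Reading A, file 1 lines: @any\n";     # Reading A, file 1 lines: 2 3 4
print "Reading B, file 1 lines: @same\n";    # Reading B, file 1 lines: 1 3
```

The two readings pick different lines from the same data, which is why the spec needs to say which one is wanted.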


In reply to Re^3: Command Line Hash to print things in common between two files by ww
in thread Command Line Hash to print things in common between two files by ZWcarp
