Matching column in different files to create a third one

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to match column 3 in file1 to column 1 in file2 and create file3 with column 1 from file1, column 2 from file1, and column 2 from file2.

file 1 sample

SNDK    80004C101       AT
XLNX    983919101       BB
NETL    64118B100       BS
AMD     007903107       CC
KLAC    482480100       DC
TER     880770102       KATS
ATHR    04743P108       KATS
RBCN    78112T107       JT
TXN     882508104       KATS
STM     861012102       KATS

file 2 sample

AT      AU
AU      AU
AV      AT
BB      BE
BS      BR
BSE     HU
BZ      BR
CC      CL
CD      CZ
CG      CN

File1 column 3 will need to match one value in file2 column 1. When a match is found, I need to overwrite file1 column 3 with the corresponding value from file2 column 2.

So for the above samples, the output for lines 1 and 2 of file3 will be:

SNDK    80004C101       AU
XLNX    983919101       BE

Any solutions using Perl? Thanks for your help!

Re: Matching column in different files to create a third one by jethro (Monsignor) on Jan 19, 2011 at 16:15 UTC
In addition to what toolic said, you should realize this isn't a code-writing-for-hire website. We want to help you write Perl, not do all the work for you. UPDATE: As usual, the "we" is really "some of us" ;-) To get you started, you might check out perldata for information about hashes (good for matching data from one file against another). Just read in one file (here, file 2, the lookup table), put its columns into the hash, and check the hash while reading the other file.
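To make the hash hint concrete, here is a minimal sketch, assuming whitespace-separated columns. It loads file 2 into a hash first and then streams file 1, so the output keeps file 1's row order. The helper names `load_mapping` and `remap_line` are hypothetical, and the sample data is read through in-memory filehandles so the sketch runs without any files. Leaving unmatched codes (such as KATS) unchanged is also an assumption, since the question doesn't say what should happen to them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a lookup table from file2 lines: column 1 => column 2.
sub load_mapping {
    my ($fh) = @_;
    my %map;
    while (my $line = <$fh>) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;    # skip blank or one-column lines
        $map{$key} = $value;
    }
    return \%map;
}

# Rewrite one file1 line: keep columns 1 and 2, replace column 3 with its
# mapped value. Unmatched codes are kept as-is (an assumption).
sub remap_line {
    my ($line, $map) = @_;
    my @cols = split ' ', $line;
    return join("\t", @cols) if @cols < 3;    # pass short lines through
    $cols[2] = $map->{ $cols[2] } if exists $map->{ $cols[2] };
    return join("\t", @cols[0 .. 2]);
}

# Part of the sample data, read via in-memory filehandles; open real
# files by name instead to use this on actual data.
my $file2 = "AT AU\nAU AU\nAV AT\nBB BE\nBS BR\n";
my $file1 = "SNDK 80004C101 AT\nXLNX 983919101 BB\nTER 880770102 KATS\n";

open my $map_fh,  '<', \$file2 or die "Cannot open mapping: $!";
open my $data_fh, '<', \$file1 or die "Cannot open data: $!";

my $map = load_mapping($map_fh);
while (my $line = <$data_fh>) {
    my $out = remap_line($line, $map);
    print "$out\n";
}
```

Checking with `exists` rather than testing the value lets an unmatched code fall through without an "uninitialized value" warning.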
Re: Matching column in different files to create a third one by kennethk (Abbot) on Jan 19, 2011 at 16:06 UTC
With well-formatted input, this sort of task is fairly simple. However, as toolic points out, your lack of formatting makes this unnecessarily difficult. If I assume your files are:

file1.txt:

```
SNDK    80004C101       AT
XLNX    983919101       BB
NETL    64118B100       BS
AMD     007903107       CC
KLAC    482480100       DC
TER     880770102       KATS
ATHR    04743P108       KATS
RBCN    78112T107       JT
TXN     882508104       KATS
STM     861012102       KATS
```

file2.txt:

```
AT      AU
AU      AU
AV      AT
BB      BE
BS      BR
BSE     HU
BZ      BR
CC      CL
CD      CZ
CG      CN
```

and I assume the mapping from the 1st to the 2nd column in file 2 is many-to-one (so I can use a hash), then you'll likely want something like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %mapping;
open my $key_handle, '<', 'file2.txt' or die "Open fail: $!";
while (<$key_handle>) {
    my ($key, $value) = split;
    $mapping{$key} = $value;
}

open my $data_handle, '<', 'file1.txt' or die "Open fail: $!";
open my $out_handle,  '>', 'file3.txt' or die "Open fail: $!";
while (<$data_handle>) {
    my @columns = split;
    print $out_handle "$columns[0]\t$columns[1]\t$mapping{$columns[2]}\n";
}
```

However, my guess at your file 1 contains the string 'KATS' several times in what I take to be column 3, and 'KATS' never appears in your file 2. Given the distribution of 'KATS' in your specified file 1, this mapping is impossible unless file 1 contains only one line.
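kennethk's hash relies on each code appearing only once in column 1 of file 2; a repeated key would silently overwrite the earlier value. Also, codes such as KATS, DC and JT in the sample file 1 have no entry in the sample file 2, so `$mapping{$columns[2]}` is undefined for them: `use warnings` reports a "Use of uninitialized value" warning and the third output column comes out empty. The sketch below, using the hypothetical sub names `build_mapping` and `unmatched_codes`, reports both problems before any output is written. It takes plain lists of lines so it can run without files.

```perl
use strict;
use warnings;

# Build the code mapping from file2 lines, recording any key that appears
# more than once (the later value overwrites the earlier one in the hash).
sub build_mapping {
    my @lines = @_;
    my (%map, %dupes);
    for my $line (@lines) {
        my ($key, $value) = split ' ', $line;
        next unless defined $value;
        $dupes{$key}++ if exists $map{$key};
        $map{$key} = $value;
    }
    return (\%map, [ sort keys %dupes ]);
}

# Return the distinct column-3 codes from file1 lines that have no mapping.
sub unmatched_codes {
    my ($map, @lines) = @_;
    my %missing;
    for my $line (@lines) {
        my @cols = split ' ', $line;
        next if @cols < 3;
        $missing{ $cols[2] } = 1 unless exists $map->{ $cols[2] };
    }
    return sort keys %missing;
}

# The sample data from the question.
my @file2 = ("AT AU", "AU AU", "AV AT", "BB BE", "BS BR",
             "BSE HU", "BZ BR", "CC CL", "CD CZ", "CG CN");
my @file1 = ("SNDK 80004C101 AT", "XLNX 983919101 BB", "NETL 64118B100 BS",
             "AMD 007903107 CC",  "KLAC 482480100 DC", "TER 880770102 KATS",
             "ATHR 04743P108 KATS", "RBCN 78112T107 JT", "TXN 882508104 KATS",
             "STM 861012102 KATS");

my ($map, $dupes) = build_mapping(@file2);
print "Duplicate keys in file2: @$dupes\n" if @$dupes;
print "No mapping for: ", join(', ', unmatched_codes($map, @file1)), "\n";
```

On the sample data there are no duplicate keys, and the unmatched codes reported are DC, JT and KATS.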
Re: Matching column in different files to create a third one by toolic (Bishop) on Jan 19, 2011 at 15:51 UTC
You didn't hit the Preview button before you posted. If you had, you would have seen that your post is unformatted, and there is no way to tell your column 1 data from your column 2 data, etc. Read Writeup Formatting Tips, then ~~repost~~ reply to your node, placing your data inside "code" tags. Update: planetscape pointed out that I misspoke.
Re^2: Matching column in different files to create a third one by planetscape (Chancellor) on Jan 19, 2011 at 16:06 UTC
Read Writeup Formatting Tips, then repost, placing your data inside "code" tags. Unfortunately, there is no way for Anonymous Monk to edit his/her own posts. Reposting will just create a duplicate node, and one will need to be reaped. At this point, the "best" solution is to consider the node for `<code>` tags, as kennethk has done. HTH, planetscape
Re: Matching column in different files to create a third one by JavaFan (Canon) on Jan 19, 2011 at 16:29 UTC
```
perl -lape'BEGIN{%$=map/\S+/g,`cat file2`}$F[2]=${$}{$F[2]}||$F[2];$_="@F"' file1
```

HTH. HAND.
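For readers unfamiliar with the switches, here is a longhand sketch of roughly what the one-liner does: `-p` wraps the code in a loop that reads each input line and prints `$_` afterward, `-l` chomps each line and appends a newline on output, and `-a` splits each line on whitespace into `@F`. The sketch reads file 2 with a filehandle instead of shelling out to `cat`, and uses an ordinary lexical `%map` in place of the punctuation hash `%$`. The sub name `remap` and the in-memory demo data are illustrative assumptions. Like the original, it joins output columns with single spaces and keeps the original code when there is no mapping.

```perl
use strict;
use warnings;

# Longhand sketch of the one-liner. Reads the mapping from $map_fh,
# rewrites each line of $in_fh, and prints the result to $out_fh.
sub remap {
    my ($map_fh, $in_fh, $out_fh) = @_;

    # BEGIN{...}: every whitespace-separated token of file2, taken in order,
    # becomes part of a key/value pair, so "AT AU" yields AT => AU.
    my %map = map /\S+/g, <$map_fh>;

    while (my $line = <$in_fh>) {          # -p: loop over every input line
        chomp $line;                       # -l: strip the trailing newline
        my @F = split ' ', $line;          # -a: autosplit on whitespace
        # Use the mapped code if it is true, otherwise keep the original.
        $F[2] = $map{ $F[2] } || $F[2] if defined $F[2];
        print {$out_fh} "@F\n";            # $_ = "@F"; -p prints, -l adds "\n"
    }
}

# Demo with in-memory copies of part of the sample data.
my $file2 = "AT AU\nAU AU\nBB BE\n";
my $file1 = "SNDK 80004C101 AT\nXLNX 983919101 BB\nTER 880770102 KATS\n";
open my $map_fh, '<', \$file2 or die "Cannot open mapping: $!";
open my $in_fh,  '<', \$file1 or die "Cannot open input: $!";
remap($map_fh, $in_fh, \*STDOUT);
```

One design detail: `||` also falls back when the mapped value is a false string such as `0`; on Perl 5.10 or later, `//` would fall back only when the key is missing.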