Merging data

lkenefic has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Merging data by Limbic~Region (Chancellor) on May 29, 2012 at 19:50 UTC
lkenefic, Let's break the problem down in pieces. For ease of explanation, I am assuming CSV files rather than .xls/.xlsx. `#!/usr/bin/perl use strict; use warnings; use Text::CSV_XS; my $file = $ARGV[0] or die "Usage: $0 <input_file>"; my $csv = Text::CSV_XS->new({binary => 1}) or die "Unable to instantia +te CSV object: ", Text::CSV_XS->error_diag(); open(my $fh, '<', $file) or die "Unable to open '$file' for reading: $ +!\n"; while(my $row = $csv->getline($fh)) { # Do something with $row }` [download] That code doesn't do anything useful by itself, but it lays the foundation for processing a CSV row by row. Now consider the following incomplete snippet. `my %lookup; # The hash that maps "in-house" to "different" and vice v +ersa while (my $in_house_row = $csv->getline($fh)) { my $sample_name = $in_house_row->[0]; next if ! defined $lookup{$sample_name}; }` [download] The above snippet shows how to skip rows that do not have a common value between the two files. I do not expect this to solve your problem. I have completely left out the "merge" portion because I do not know what you mean. This should help you articulate exactly what problem you are having as well as demonstrate it with code and real data. You might even want to hand construct the simplest of examples with the expected output. Cheers - L~R	[reply] [d/l] [select]
Re: Merging data by snape (Pilgrim) on May 29, 2012 at 19:52 UTC
Hi I could tell you the steps how to proceed. Step 1. Use while loop to read file 1. and store your sample names and its corresponding features through hash. If your sample names are unique then use it as a key and corresponding features as its values. Something like `my %hash1; open my $IN, "file1.txt" or die $!; my $header = <$IN>; ## use if the file has a header otherwise leave it while(<$IN>){ chomp $_; my @line = split(',',$_); ## split the line to get the sample name a +nd its values $hash1{$line[0]} = 1; ## if your first column is a sample name or ch +ange it accordingly } close($IN);` [download] Step 2. Now open the file 2 and check for the sample names those are common to both the files using exists `my %CommonSampleNames open my $IN1, "file2.txt" or die $!; my $head = <$IN1> ;## again use it if it has a header while(<$IN1>){ chomp $_; my @line = split(',', $_); if(exists $hash1{$line[0]} ){ ## $line[0] is the sample name column $CommonSampleNames{$line[0]} = 1; } } close($IN1);` [download] 3. Open the third file or reference file and again using exists extract the corresponding names. `my %supHash open my $ref, "ref.txt" or die $!; $head = <$ref>; ## if header is present while(<$ref>){ chomp $_; my @line = split(',', $_); ## if it is a csv file if(exists $CommonSampleNames{$line[0]} ){ $supHash{$ine[0]} = $line[1]; ## if second column has reference names } } close($ref);` [download]	[reply] [d/l] [select]