lkenefic has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks... I've been to several websites and tried a few scripts on my own but I'm a total newb at this, so please forgive... I'm trying (re: struggling) to write a perl script to merge two excel files. In file 1, I have data with our "in house sample names" and in file 2 I have data with different sample names but they're the same samples. I have a third reference file that tells me which are the corresponding names. Is there an easy way to merge the two spreadsheets... keeping only the sample file names common to both and keeping all of the data in tact from both sheets? ANY direction (or website) you could provide me would be greatly appreciated... THANKS

Replies are listed 'Best First'.
Re: Merging data
by Limbic~Region (Chancellor) on May 29, 2012 at 19:50 UTC
    lkenefic,
    Let's break the problem down in pieces. For ease of explanation, I am assuming CSV files rather than .xls/.xlsx.
    #!/usr/bin/perl use strict; use warnings; use Text::CSV_XS; my $file = $ARGV[0] or die "Usage: $0 <input_file>"; my $csv = Text::CSV_XS->new({binary => 1}) or die "Unable to instantia +te CSV object: ", Text::CSV_XS->error_diag(); open(my $fh, '<', $file) or die "Unable to open '$file' for reading: $ +!\n"; while(my $row = $csv->getline($fh)) { # Do something with $row }

    That code doesn't do anything useful by itself, but it lays the foundation for processing a CSV row by row. Now consider the following incomplete snippet.

    my %lookup; # The hash that maps "in-house" to "different" and vice v +ersa while (my $in_house_row = $csv->getline($fh)) { my $sample_name = $in_house_row->[0]; next if ! defined $lookup{$sample_name}; }

    The above snippet shows how to skip rows that do not have a common value between the two files. I do not expect this to solve your problem. I have completely left out the "merge" portion because I do not know what you mean. This should help you articulate exactly what problem you are having as well as demonstrate it with code and real data. You might even want to hand construct the simplest of examples with the expected output.

    Cheers - L~R

Re: Merging data
by snape (Pilgrim) on May 29, 2012 at 19:52 UTC

    Hi

    I could tell you the steps how to proceed.

    Step 1. Use while loop to read file 1. and store your sample names and its corresponding features through hash. If your sample names are unique then use it as a key and corresponding features as its values. Something like

    my %hash1; open my $IN, "file1.txt" or die $!; my $header = <$IN>; ## use if the file has a header otherwise leave it while(<$IN>){ chomp $_; my @line = split(',',$_); ## split the line to get the sample name a +nd its values $hash1{$line[0]} = 1; ## if your first column is a sample name or ch +ange it accordingly } close($IN);

    Step 2. Now open the file 2 and check for the sample names those are common to both the files using exists

    my %CommonSampleNames open my $IN1, "file2.txt" or die $!; my $head = <$IN1> ;## again use it if it has a header while(<$IN1>){ chomp $_; my @line = split(',', $_); if(exists $hash1{$line[0]} ){ ## $line[0] is the sample name column $CommonSampleNames{$line[0]} = 1; } } close($IN1);

    3. Open the third file or reference file and again using exists extract the corresponding names.

    my %supHash open my $ref, "ref.txt" or die $!; $head = <$ref>; ## if header is present while(<$ref>){ chomp $_; my @line = split(',', $_); ## if it is a csv file if(exists $CommonSampleNames{$line[0]} ){ $supHash{$ine[0]} = $line[1]; ## if second column has reference names } } close($ref);