
I am trying to match two .csv files and write the result to a third file. The output file should contain every entry from both the first and the second file. The two files are matched on one column (ID, numerical). The output file also gets one extra column (column[0]) that shows whether the entry is present only in the first file, only in the second file, or in both. The first file has these columns:

Group, Functional_Category, LT, IA, ID, Symbol, Description, TID, GID, LO, S1, S2, HL, Status, V1, V2, IN

The second file has fewer columns: ID, Symbol, Description, TID, GID, LO, S1, S2

In the output file, I am trying to get a column 'Match' as the first column. But when I run the script, I get formatting issues. For example, the third row repeats the column headers of the second file. Another problem is that I want "both" in the first column when an entry matches between the two files; instead, I get the numerical ID there. Entries without a match do not have this problem, and their columns are correct.


#!/usr/bin/perl
use strict;
use warnings;
my %hash;
    
my $infile="File1.csv";
open (my $in_fh, "<", $infile) or die ($!);
my $line=<$in_fh>;
chomp $line;
my @columnheadings= split (/\t/,$line);
 
while (my $line = <$in_fh>){
    chomp $line;
    $line=~ s/\t/,/g;
    $line=~tr/[a-z]/[A-Z]/;
    my @columns=split (/,/,$line);
    my $Uniqueid=$columns[4];
    my $newline;
    unshift(@columns, 'File1');
    $newline=join(",",@columns);
    if (exists($hash{$Uniqueid})){
        print "$hash{$Uniqueid}\n";
    }
    $hash{$Uniqueid}=$newline;
}

my $discard=<$in_fh>;

$infile="File2.csv";
open ($in_fh, "<", $infile) or die ($!);
while (my $line= <$in_fh>){
    chomp $line;
    $line=~s/\t/,/g;
    $line=~tr/[a-z]/[A-Z]/;
    my @columns= split (/,/,$line);
    my $Uniqueid=$columns[0];
    my $newline;
    if (exists($hash{$Uniqueid})){
        $line=~ s/^File1,/both,/;
        $newline=$line;
    }
    else {
        $newline="File2,,,,".$line;
    }

    $hash{$Uniqueid}=$newline;
}


$discard=<$in_fh>;


my $outfile="OutputFile.csv";
open (my $out_fh, ">", $outfile) or die ($!);
unshift (@columnheadings, ('Match'));
my $headings=join (",", @columnheadings);
print $out_fh "$headings\n";
use Data::Dumper;
print Dumper \@columnheadings; 

   
foreach my $id (sort keys (%hash)) {
    my @columns=split(/,/,$hash{$id});
    my $printline=$hash{$id};
    while ($printline=~s/,,/,NA,/g) {
    };

    print $out_fh "$printline\n";

    delete $hash{$id};
}
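
For comparison, here is a minimal sketch of how the merge could look, offered as a possible reading of the symptoms rather than a confirmed fix. It rests on three guesses about the script above: the File2 header is never skipped before the loop (the `$discard` read only happens after the loop has reached end of file), the `s/^File1,/both,/` substitution is applied to the File2 line instead of the stored File1 row, and `"File2,,,,"` yields three empty fields where four are needed to line ID up under its heading. The helper names `parse_line` and `merge_lines` are hypothetical, and the sketch assumes comma-separated input, optional double quotes, no commas inside a field, and numeric IDs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one CSV line on commas and strip surrounding
# double quotes. Assumes no field contains an embedded comma.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    s/^"(.*)"$/$1/ for @fields;
    return @fields;
}

# Hypothetical helper: takes two array refs of CSV lines (header line first)
# and returns the merged output lines, keyed on the ID column.
sub merge_lines {
    my ($file1, $file2) = @_;
    my @lines1 = @$file1;
    my @lines2 = @$file2;

    my @headings = parse_line(shift @lines1);    # File1 header
    shift @lines2;                               # skip File2 header BEFORE looping

    my %row;
    for my $line (@lines1) {
        my @cols = parse_line(uc $line);         # uppercase data, as in the original
        $row{ $cols[4] } = [ 'File1', @cols ];   # ID is the 5th column in File1
    }
    for my $line (@lines2) {
        my @cols = parse_line(uc $line);
        my $id   = $cols[0];                     # ID is the 1st column in File2
        if (exists $row{$id}) {
            $row{$id}[0] = 'both';               # relabel the stored File1 row
        }
        else {
            # Four empty fields (Group .. IA) so ID lines up with File1
            $row{$id} = [ 'File2', ('') x 4, @cols ];
        }
    }

    my @out = (join ',', 'Match', @headings);
    for my $id (sort { $a <=> $b } keys %row) {
        # Empty fields become NA, replacing the s/,,/,NA,/ loop
        push @out, join ',', map { length($_) ? $_ : 'NA' } @{ $row{$id} };
    }
    return @out;
}
```

To use it, each file would be read into an array of lines (for example `my @lines1 = <$in_fh>;`) and references to both arrays passed to `merge_lines`; the returned lines can then be printed to the output handle with a trailing newline. Rows that exist only in File2 are not padded out to the full File1 width, which matches the original script.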

Hi, thanks for the reply. I am still getting the same output file.

$VAR1 = [
          'Match',
          '"Group","Functional_Category","LT","IA","ID","Symbol","Description","TID","GID","LO","S1","S2","HL","Status","V1","V2","IN"'
        ];

The first column heading ('Match') is in single quotes, and all of the original headings follow as one single-quoted string.
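
The dump suggests the header line is comma-separated with quoted names, so splitting it on `\t` would leave it as one element. A small sketch of splitting it on commas instead, assuming no heading contains a comma itself (the `$line` value is copied from the dump; a CSV module such as Text::CSV would be the sturdier choice for real data):

```perl
use strict;
use warnings;

# Header line as it appears in the dump: one comma-separated, quoted string
my $line = '"Group","Functional_Category","LT","IA","ID","Symbol","Description","TID","GID","LO","S1","S2","HL","Status","V1","V2","IN"';

# Split on commas rather than tabs, then strip the surrounding double quotes
my @columnheadings = split /,/, $line;
s/^"|"$//g for @columnheadings;

unshift @columnheadings, 'Match';
print join('|', @columnheadings), "\n";
```

With the headings split this way, `@columnheadings` holds one element per column, so the Dumper output would list each name separately.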

In reply to Issues with Column headings by bluray
