I am trying to match two .csv files and print the output to a third file. The output file should contain all the entries of the first and second files. The two files are matched on one column (ID, numerical). The output file also gets one extra column (column[0]) that shows whether the entry is present only in the first file, only in the second file, or in both. The first file has these columns:

Group, Functional_Category, LT, IA, ID, Symbol, Description, TID, GID, LO, S1, S2, HL, Status, V1, V2, IN

The second file has fewer columns: ID, Symbol, Description, TID, GID, LO, S1, S2

In the output file, I am trying to get a 'Match' column as my first column. But when I run the script, I get formatting issues. For example, the third row repeats the column headers of the second file. Another problem is that I want "both" in the first column when an entry matches between the two files; instead, I get the numerical ID there. Entries without a match do not have this problem, and their columns are correct.

#!/usr/bin/perl
use strict;
use warnings;

my %hash;
my $infile = "File1.csv";
open (my $in_fh, "<", $infile) or die ($!);
my $line = <$in_fh>;
chomp $line;
my @columnheadings = split (/\t/, $line);
while (my $line = <$in_fh>) {
    chomp $line;
    $line =~ s/\t/,/g;
    $line =~ tr/[a-z]/[A-Z]/;
    my @columns = split (/,/, $line);
    my $Uniqueid = $columns[4];
    my $newline;
    unshift(@columns, 'File1');
    $newline = join(",", @columns);
    if (exists($hash{$Uniqueid})) {
        print "$hash{$Uniqueid}\n";
    }
    $hash{$Uniqueid} = $newline;
}
my $discard = <$in_fh>;
$infile = "File2.csv";
open ($in_fh, "<", $infile) or die ($!);
while (my $line = <$in_fh>) {
    chomp $line;
    $line =~ s/\t/,/g;
    $line =~ tr/[a-z]/[A-Z]/;
    my @columns = split (/,/, $line);
    my $Uniqueid = $columns[0];
    my $newline;
    if (exists($hash{$Uniqueid})) {
        $line =~ s/^File1,/both,/;
        $newline = $line;
    }
    else {
        $newline = "File2,,,," . $line;
    }
    $hash{$Uniqueid} = $newline;
}
$discard = <$in_fh>;
my $outfile = "OutputFile.csv";
open (my $out_fh, ">", $outfile) or die ($!);
unshift (@columnheadings, ('Match'));
my $headings = join (",", @columnheadings);
print $out_fh "$headings\n";
use Data::Dumper;
print Dumper \@columnheadings;
foreach my $id (sort keys (%hash)) {
    my @columns = split(/,/, $hash{$id});
    my $printline = $hash{$id};
    while ($printline =~ s/,,/,NA,/g) { };
    print $out_fh "$printline\n";
    delete $hash{$id};
}
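Reading the script, two things seem to explain the symptoms. The header line of File2 is never skipped (the $discard read happens after the loop, not before it), so it is stored as an ordinary row. And on a match, s/^File1,/both,/ is applied to the File2 line, which starts with the ID, rather than to the row already stored in %hash, so the stored row is replaced by a line that begins with the numerical ID. The following is a minimal sketch of one way to do the merge, not a drop-in fix. It assumes plain comma-separated lines with no embedded commas (a CSV module such as Text::CSV would be more robust), it works on in-memory lines instead of files, and the names parse_csv_line and merge_files as well as the sample rows are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a simple CSV line (assumes no embedded commas) and strip outer quotes.
sub parse_csv_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /,/, $line, -1;   # -1 keeps trailing empty fields
    s/^"|"$//g for @fields;
    return @fields;
}

# Merge two sets of lines (header line first in each), keyed on ID.
# Returns the output lines, header included.
sub merge_files {
    my ($file1_lines, $file2_lines) = @_;
    my ($head1, @rows1) = @$file1_lines;
    my ($head2, @rows2) = @$file2_lines;   # File2 header is set aside, never stored as data

    my @headings = parse_csv_line($head1);
    my $width    = @headings;
    my $id_col   = 4;                      # ID is the fifth column of File1 (from the post)

    my %record;                            # ID => { match => ..., fields => [...] }
    for my $line (@rows1) {
        my @f = parse_csv_line($line);
        $record{ $f[$id_col] } = { match => 'File1', fields => \@f };
    }
    for my $line (@rows2) {
        my @f  = parse_csv_line($line);
        my $id = $f[0];                    # ID is the first column of File2
        if (exists $record{$id}) {
            $record{$id}{match} = 'both';  # keep File1's fuller row, change only the tag
        }
        else {
            # Put File2's columns at File1 positions 4..11 and leave the rest empty.
            my @padded = ('') x $width;
            @padded[ $id_col .. $id_col + $#f ] = @f;
            $record{$id} = { match => 'File2', fields => \@padded };
        }
    }

    my @out = join ',', 'Match', @headings;
    for my $id (sort { $a <=> $b } keys %record) {   # numeric sort, IDs are numbers
        my @f = map { length($_) ? $_ : 'NA' } @{ $record{$id}{fields} };
        push @out, join ',', $record{$id}{match}, @f;
    }
    return @out;
}

# Sample rows invented for illustration; only the column layout comes from the post.
my @file1 = (
    'Group,Functional_Category,LT,IA,ID,Symbol,Description,TID,GID,LO,S1,S2,HL,Status,V1,V2,IN',
    'G1,Kinase,a,b,101,ABC1,first,T1,X1,L1,1,2,h,ok,v1,v2,n',
    'G2,Ligase,a,b,102,DEF2,second,T2,X2,L2,3,4,h,ok,v1,v2,n',
);
my @file2 = (
    'ID,Symbol,Description,TID,GID,LO,S1,S2',
    '102,DEF2,second,T2,X2,L2,3,4',
    '103,GHI3,third,T3,X3,L3,5,6',
);

my @merged = merge_files(\@file1, \@file2);
print "$_\n" for @merged;
```

Keeping each row as a tag plus a list of fields means the 'both' case only changes the tag, and the NA filling happens once, when the row is written out.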

Hi, Thanks for the reply. I am getting the same output files.

$VAR1 = [
          'Match',
          '"Group","Functional_Category","LT","IA","ID","Symbol","Description","TID","GID","LO","S1","S2","HL","Status","V1","V2","IN"'
        ];

The first column heading was in single quotes.
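The Dumper output suggests the whole header line ended up as one array element: the line is comma-separated, but the script splits it on tabs, so nothing gets split. A small sketch of the difference, using a shortened header as a stand-in for the real one:

```perl
use strict;
use warnings;

# Shortened stand-in for the real header line.
my $header = '"Group","Functional_Category","LT","IA","ID"';

my @by_tab   = split /\t/, $header;   # no tabs in the line, so one element comes back
my @by_comma = split /,/,  $header;   # one element per column
s/^"|"$//g for @by_comma;             # drop the surrounding double quotes

print scalar(@by_tab), "\n";          # prints 1
print scalar(@by_comma), "\n";        # prints 5
print "$by_comma[0]\n";               # prints Group
```

Splitting on the comma and stripping the quotes gives one heading per column, so unshifting 'Match' and joining with commas produces a clean header row.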


In reply to Issues with Column headings by bluray
