bluray has asked for the wisdom of the Perl Monks concerning the following question:
I tried to match two .csv files and print the output to a third file. The output file should contain all the entries of the first and second files. The two files are matched on one column (ID, numerical). In the output file, one more column is created (column[0]) that shows whether the entry is present only in the first file, only in the second file, or in both. The first file has these columns:
Group, Functional_Category, LT, IA, ID, Symbol, Description, TID, GID, LO, S1, S2, HL, Status, V1, V2, IN
The second file has fewer columns: ID, Symbol, Description, TID, GID, LO, S1, S2
In the output file, I am trying to get a 'Match' column as my first column. But when I run the script, I get formatting issues. For example, the third row repeats the column headers of the second file. Another problem is that I wanted "both" in the first column whenever an entry matches between the two files. Instead, I get the numerical ID in the first column. This does not happen when there is no match; the column entries are correct for those rows.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my %hash;
    my $infile="File1.csv";
    open (my $in_fh, "<", $infile) or die ($!);
    my $line=<$in_fh>;
    chomp $line;
    my @columnheadings= split (/\t/,$line);
    while (my $line = <$in_fh>){
        chomp $line;
        $line=~ s/\t/,/g;
        $line=~tr/[a-z]/[A-Z]/;
        my @columns=split (/,/,$line);
        my $Uniqueid=$columns[4];
        my $newline;
        unshift(@columns, 'File1');
        $newline=join(",",@columns);
        if (exists($hash{$Uniqueid})){
            print "$hash{$Uniqueid}\n";
        }
        $hash{$Uniqueid}=$newline;
    }
    my $discard=<$in_fh>;
    $infile="File2.csv";
    open ($in_fh, "<", $infile) or die ($!);
    while (my $line= <$in_fh>){
        chomp $line;
        $line=~s/\t/,/g;
        $line=~tr/[a-z]/[A-Z]/;
        my @columns= split (/,/,$line);
        my $Uniqueid=$columns[0];
        my $newline;
        if (exists($hash{$Uniqueid})){
            $line=~ s/^File1,/both,/;
            $newline=$line;
        }
        else {
            $newline="File2,,,,".$line;
        }
        $hash{$Uniqueid}=$newline;
    }
    $discard=<$in_fh>;
    my $outfile="OutputFile.csv";
    open (my $out_fh, ">", $outfile) or die ($!);
    unshift (@columnheadings, ('Match'));
    my $headings=join (",", @columnheadings);
    print $out_fh "$headings\n";
    use Data::Dumper;
    print Dumper \@columnheadings;
    foreach my $id (sort keys (%hash)) {
        my @columns=split(/,/,$hash{$id});
        my $printline=$hash{$id};
        while ($printline=~s/,,/,NA,/g) { };
        print $out_fh "$printline\n";
        delete $hash{$id};
    }
Hi, thanks for the reply. I am still getting the same output files.
The first column heading was in single quotes.
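For comparison with the script above, here is a minimal sketch of one way the merge could be structured. Two things stand out in the posted code: the File2 lines never begin with "File1,", so the substitution to "both" has nothing to match and the raw File2 line (starting with the ID) overwrites the stored entry; and the File2 header is only discarded after the loop, so it is read as a data row. The sketch keeps each row as a hash keyed by column name instead of a joined string. The helper name `merge_lines` and the sample rows are invented for illustration; it assumes header lines are already removed, IDs are numeric, and no field contains an embedded comma (a real CSV parser such as Text::CSV would be needed otherwise). It also skips the uppercasing step of the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column lists copied from the question; File2's columns are a subset of File1's.
my @file1_cols = qw(Group Functional_Category LT IA ID Symbol Description
                    TID GID LO S1 S2 HL Status V1 V2 IN);
my @file2_cols = qw(ID Symbol Description TID GID LO S1 S2);

# merge_lines is a hypothetical helper (not from the original script).
# It takes two array refs of comma-separated data lines, headers removed,
# and returns the output lines, header first.
sub merge_lines {
    my ($lines1, $lines2) = @_;
    my %row;    # ID => { Match => ..., column name => value }

    for my $line (@$lines1) {
        my %rec;
        @rec{@file1_cols} = split /,/, $line, -1;   # -1 keeps trailing empty fields
        $rec{Match} = 'File1';
        $row{ $rec{ID} } = \%rec;
    }

    for my $line (@$lines2) {
        my %rec;
        @rec{@file2_cols} = split /,/, $line, -1;
        if (my $seen = $row{ $rec{ID} }) {
            $seen->{Match} = 'both';    # keep the File1 fields, only relabel
        }
        else {
            $rec{Match} = 'File2';      # File1-only columns stay undefined
            $row{ $rec{ID} } = \%rec;
        }
    }

    my @out = (join ',', 'Match', @file1_cols);
    for my $id (sort { $a <=> $b } keys %row) {
        # Missing or empty fields are printed as NA.
        push @out, join ',',
            map { defined($_) && length($_) ? $_ : 'NA' }
            @{ $row{$id} }{ 'Match', @file1_cols };
    }
    return @out;
}

# Invented sample rows, only to exercise the helper.
my @merged = merge_lines(
    [ 'G1,Cat,1,2,10,ABC,desc a,T1,G10,L,0.1,0.2,H,ok,5,6,7' ],
    [ '10,ABC,desc a,T1,G10,L,0.1,0.2', '20,XYZ,desc x,T2,G20,L,0.3,0.4' ],
);
print "$_\n" for @merged;
```

Because every row is stored by column name, the File2-only rows line up under the File1 headings without counting commas by hand, and the 'Match' label is changed in one place rather than by editing a string.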
Re: Issues with Column headings
by moritz (Cardinal) on Sep 18, 2011 at 17:46 UTC
Re: Issues with Column headings
by choroba (Cardinal) on Sep 18, 2011 at 21:51 UTC
Re: Issues with Column headings
by Marshall (Canon) on Sep 19, 2011 at 07:14 UTC
by bluray (Sexton) on Sep 20, 2011 at 22:15 UTC
by Marshall (Canon) on Sep 20, 2011 at 23:21 UTC
by bluray (Sexton) on Sep 22, 2011 at 15:37 UTC