Re: Issues with Column headings

I found the code a bit confusing and this tab delimited format is a bummer to work with. One of the problems will be that when you add "null" fields, if you don't quote these things, then it is really hard to see these fields in file3.

Anyway I would suggest using some version of Text::CSV to parse the data and you can adjust the quoting characters to your liking (this module helps on both input and output). Tabs will probably get mangled in example data below, but I think you'll get the idea.

#!/usr/bin/perl -w
use strict;

use Text::CSV_XS;

open (my $FILE1, '<', "file1.csv") or die "cannot open file1 $!\n";
open (my $FILE2, '<', "file2.csv") or die "cannot open file3 $!\n";
open (my $FILE3, '>', "file3.csv") or die "cannot open file3 $!\n";

my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ , 
                               sep_char => "\t", always_quote =>1});

print $FILE3 "Match\t".<$FILE1>; # header for file3
<$FILE2>;                        # skip header line of file2

my %file1; 
my %file2;

while (my $row = $csv->getline($FILE1)) 
{
    my @fields = @$row;
    my $id = $fields[4];
    $file1{$id}=["",@fields];
}

while (my $row = $csv->getline($FILE2)) 
{
    my @fields = @$row;
    my $id = $fields[0];
    $file2{$id}=["","","","","",@fields,"","","","",""];
}

foreach my $id1 (keys %file1)
{
    if (exists $file2{$id1})
    {
        $file1{$id1}[0] ="both";             #both files
        $csv->print ($FILE3, $file1{$id1});
    }
    else
    {
        $file1{$id1}[0] ="1";                #file1 only
        $csv->print ($FILE3, $file1{$id1});
    }
}

foreach my $id2 (keys %file2)
{
    if (!exists $file1{$id2})
    {
        $file2{$id2}[0] ="2";               #file2 only
        $csv->print ($FILE3, $file2{$id2});
    }
}
[download]

file1:

Group    Functional_Category    LT    IA    ID    Symbol    Descriptio
+n    TID    GID    LO    S1    S2    HL    Status    V1    V2    IN 
q    w    a    a    1    AA    some description    10    11    1    s1
+a    s2a    1    1    1    1    1
q    w    b    b    2    BB    another descrp    11    12    1    s1b 
+   s2b    1    1    1    1    1
q    w    c    c    5    CC    despript A    12    13    1    s1c    s
+2c    1    1    1    1    1
[download]

file2

ID    Symbol    Description    TID    GID    LO    S1    S2 
2    BB    another descrp    11    12    1    s1b    s2b
4    DD    some more stuff    14    14    4    s1d    s2d
[download]

file3

Match    Group    Functional_Category    LT    IA    ID    Symbol    D
+escription    TID    GID    LO    S1    S2    HL    Status    V1    V
+2    IN 
"1"    "q"    "w"    "a"    "a"    "1"    "AA"    "some description"  
+  "10"    "11"    "1"    "s1a"    "s2a"    "1"    "1"    "1"    "1"  
+  "1"
"both"    "q"    "w"    "b"    "b"    "2"    "BB"    "another descrp" 
+   "11"    "12"    "1"    "s1b"    "s2b"    "1"    "1"    "1"    "1" 
+   "1"
"1"    "q"    "w"    "c"    "c"    "5"    "CC"    "despript A"    "12"
+    "13"    "1"    "s1c"    "s2c"    "1"    "1"    "1"    "1"    "1"
"2"    ""    ""    ""    ""    "4"    "DD"    "some more stuff"    "14
+"    "14"    "4"    "s1d"    "s2d"    ""    ""    ""    ""    ""
[download]

Comment on Re: Issues with Column headings Select or Download Code

Replies are listed 'Best First'.
Re^2: Issues with Column headings by bluray (Sexton) on Sep 20, 2011 at 22:15 UTC
Hi Marshall, Thanks for the reply and the code. I tried the code. I am getting the output file with only headers.	[reply]
Re^3: Issues with Column headings by Marshall (Canon) on Sep 20, 2011 at 23:21 UTC
You had a tab delimited file eg( split (/\t/,$line); ). Even though the separator is tab instead of comma, this is also called a CSV file. I also used a tab delimited file. The problem is that when I cut and past the file from my text editor into the `<code></code>` section here, the tabs get converted to spaces! Ugh! So, download my data file, and use Notepad or whatever text editor you like to convert the sequences of spaces between the columns back into single tab characters(no spaces before or after the tabs!). The file will look similar on the screen, but there will be a BIG difference in how it is parsed! I hate tab delimited files! Tab is a non-printing character and so it is hard to manually modify the files. Many of the data files that I work with use "\|" for the delimiter and that usually works out well. Some databases will disallow "\|" to be in the data fields and so parsing is easy. In my code `sep_char => "\t"` sets the separation character to be tab. The default is "," (comma).	[reply] [d/l] [select]
Re^4: Issues with Column headings by bluray (Sexton) on Sep 22, 2011 at 15:37 UTC
Hi Marshall, Thanks for the reply. I was finally able to get the correct output format. The error was in the way I saved the file. I used the field delimiter as ','. Now, I changed it to 'tab' and the code worked fine.	[reply]