in reply to Issues with Column headings

I found the code a bit confusing and this tab delimited format is a bummer to work with. One of the problems will be that when you add "null" fields, if you don't quote these things, then it is really hard to see these fields in file3.

Anyway I would suggest using some version of Text::CSV to parse the data and you can adjust the quoting characters to your liking (this module helps on both input and output). Tabs will probably get mangled in example data below, but I think you'll get the idea.

#!/usr/bin/perl -w use strict; use Text::CSV_XS; open (my $FILE1, '<', "file1.csv") or die "cannot open file1 $!\n"; open (my $FILE2, '<', "file2.csv") or die "cannot open file3 $!\n"; open (my $FILE3, '>', "file3.csv") or die "cannot open file3 $!\n"; my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ , sep_char => "\t", always_quote =>1}); print $FILE3 "Match\t".<$FILE1>; # header for file3 <$FILE2>; # skip header line of file2 my %file1; my %file2; while (my $row = $csv->getline($FILE1)) { my @fields = @$row; my $id = $fields[4]; $file1{$id}=["",@fields]; } while (my $row = $csv->getline($FILE2)) { my @fields = @$row; my $id = $fields[0]; $file2{$id}=["","","","","",@fields,"","","","",""]; } foreach my $id1 (keys %file1) { if (exists $file2{$id1}) { $file1{$id1}[0] ="both"; #both files $csv->print ($FILE3, $file1{$id1}); } else { $file1{$id1}[0] ="1"; #file1 only $csv->print ($FILE3, $file1{$id1}); } } foreach my $id2 (keys %file2) { if (!exists $file1{$id2}) { $file2{$id2}[0] ="2"; #file2 only $csv->print ($FILE3, $file2{$id2}); } }
file1:
Group Functional_Category LT IA ID Symbol Descriptio +n TID GID LO S1 S2 HL Status V1 V2 IN q w a a 1 AA some description 10 11 1 s1 +a s2a 1 1 1 1 1 q w b b 2 BB another descrp 11 12 1 s1b + s2b 1 1 1 1 1 q w c c 5 CC despript A 12 13 1 s1c s +2c 1 1 1 1 1
file2
ID Symbol Description TID GID LO S1 S2 2 BB another descrp 11 12 1 s1b s2b 4 DD some more stuff 14 14 4 s1d s2d
file3
Match Group Functional_Category LT IA ID Symbol D +escription TID GID LO S1 S2 HL Status V1 V +2 IN "1" "q" "w" "a" "a" "1" "AA" "some description" + "10" "11" "1" "s1a" "s2a" "1" "1" "1" "1" + "1" "both" "q" "w" "b" "b" "2" "BB" "another descrp" + "11" "12" "1" "s1b" "s2b" "1" "1" "1" "1" + "1" "1" "q" "w" "c" "c" "5" "CC" "despript A" "12" + "13" "1" "s1c" "s2c" "1" "1" "1" "1" "1" "2" "" "" "" "" "4" "DD" "some more stuff" "14 +" "14" "4" "s1d" "s2d" "" "" "" "" ""

Replies are listed 'Best First'.
Re^2: Issues with Column headings
by bluray (Sexton) on Sep 20, 2011 at 22:15 UTC
    Hi Marshall, Thanks for the reply and the code. I tried the code. I am getting the output file with only headers.
      You had a tab delimited file eg( split (/\t/,$line); ). Even though the separator is tab instead of comma, this is also called a CSV file. I also used a tab delimited file. The problem is that when I cut and past the file from my text editor into the <code></code> section here, the tabs get converted to spaces! Ugh!

      So, download my data file, and use Notepad or whatever text editor you like to convert the sequences of spaces between the columns back into single tab characters(no spaces before or after the tabs!). The file will look similar on the screen, but there will be a BIG difference in how it is parsed!

      I hate tab delimited files! Tab is a non-printing character and so it is hard to manually modify the files.

      Many of the data files that I work with use "|" for the delimiter and that usually works out well. Some databases will disallow "|" to be in the data fields and so parsing is easy.

      In my code sep_char => "\t" sets the separation character to be tab. The default is "," (comma).

        Hi Marshall, Thanks for the reply. I was finally able to get the correct output format. The error was in the way I saved the file. I used the field delimiter as ','. Now, I changed it to 'tab' and the code worked fine.