coolda has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, I'm a newbie in the programming world. I've been trying to get this code working for the last two weeks, to no avail. I have thousands of tab-delimited files that all share the same format, for example:
file 1
col0 col1 col2 col3 col4 col5 ...
... samp1 samp2 samp3 samp4 ....

The files all follow a similar format. What I need to do is extract the 1st and 4th columns from the first file and write them to a new file, and then take only the 4th column from each of the remaining files. Before working on my actual project, I wanted to try with a simpler table. The tables I'm working with right now are:

S1.txt
col1 col2 col3
1 4 7
2 5 8
3 6 9

S2.txt
col1 col2 col3
1 44 77
2 55 88
3 66 99

The result I'm getting:
col1 col3 col3
1 4
2 5
3 6

The result I want:
col1 col3 col3
1 4 77
2 5 88
3 6 99
#!/usr/bin/perl -w
use warnings;
use strict;

my @desired_cols = qw(colname1 colname3);
my @desired_cols1= qw(colname3);
my $temp = 'tmp.txt';

# reads first line to get actual column names
open(S1, 'S1.txt') || die "Can't open S1: $!";
open(S2, 'S2.txt') || die "Can't open S2 : $!";
open(OUT, ">$temp") || die "Can't create output : $!";
my $header_line = (<S1>);
my $header_line1 = (<S2>);
my @actual_cols = split(/\s+/,$header_line);

# get column number of the actual column names
my $pos =0;
my %col2_num = map {$_ => $pos++}@actual_cols;

# translate the desired col names into position numbers
my @slice = map{$col2_num{$_}}@desired_cols;
my @slice1 = map{$col2_num{$_}}@desired_cols1;

print OUT join("\t","@desired_cols"),"\r\n"; #header line
# print colname1 colname3 colname3 in outfile
while (<S1>, <S2>)
{
    my @row = (split)[@slice];
    my @row1 = (split)[@slice1];
    print OUT join("\t","@row @row1"),"\r\n"; #each data row
}
I think the problem with my code is in the while loop. I thought it would read S1 and S2 line by line, but I think it only reads S1. Please help me out, I'm under a lot of stress :((

Replies are listed 'Best First'.
Re: Adding a column to a file where i took out two columns
by GrandFather (Saint) on Sep 24, 2014 at 02:23 UTC

    Your problem description, example and code don't seem to be consistent so the following is a guess at what you may be after:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $str1 = <<STR;
    col1 col2 col3
    1 4 7
    2 5 8
    3 6 9
    STR

    my $str2 = <<STR;
    col1 col2 col3
    1 44 77
    2 55 88
    3 66 99
    STR

    for my $spec ([$str1, qw(col1 col3)], [$str2, qw(col3)]) {
        my ($file, @wantedCols) = @$spec;

        print "@wantedCols\n";
        open my $fIn, '<', \$file;
        my $index = 0;
        my %fileCols = map{$_ => $index++} split /\s+/, <$fIn>;
        my @slice = map{exists $fileCols{$_} ? $fileCols{$_} : ()} @wantedCols;

        while (<$fIn>) {
            chomp;
            print join (' ', (split /\s+/)[@slice]), "\n";
        }
    }

    Prints:

    col1 col3
    1 7
    2 8
    3 9
    col3
    77
    88
    99

    Note that the "string as file" trick (used to make the sample self-contained) and the printing to stdout (done for the same reason) will need to be changed to suit your real application, of course. But using a self-contained script like this as a test bed can speed up development and testing a lot, because you don't need to juggle multiple files during testing.
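
    For readers who haven't seen the "string as file" trick: passing a reference to a scalar as the last argument of open gives a filehandle that reads from that string instead of from disk. A minimal sketch follows; the helper name rows_from_string is made up for illustration and is not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whitespace-delimited table from a string
# through an in-memory filehandle and return its rows as array references.
sub rows_from_string {
    my ($text) = @_;

    # Opening a reference to a scalar yields an in-memory filehandle.
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";

    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        push @rows, [split ' ', $line];
    }
    close $fh;
    return @rows;
}

my $sample = "col1 col2 col3\n1 4 7\n2 5 8\n";
print join(',', @$_), "\n" for rows_from_string($sample);
```

    Swapping the scalar reference for a file name is the only change needed to read a real file.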

    Note too the three-parameter open and the use of lexical file handles, both of which you should get into the habit of using.
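
    As a rough illustration of both habits, here is a sketch that opens two real files with three-argument open and lexical handles and reads them in lockstep. It also bears on the while (<S1>, <S2>) loop in the question: Perl only assigns the line to $_ automatically when a single <FH> is the entire while condition, so each handle is read explicitly here. The function name merge_columns and the temporary demo files are assumptions for the example, and columns are given by position rather than by name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read two tab-delimited files line by line, in
# lockstep, and return merged rows made of the columns at @$cols1 from
# the first file followed by the columns at @$cols2 from the second.
sub merge_columns {
    my ($path1, $cols1, $path2, $cols2) = @_;

    # Three-argument open with lexical filehandles.
    open my $in1, '<', $path1 or die "Can't open $path1: $!";
    open my $in2, '<', $path2 or die "Can't open $path2: $!";

    my @merged;
    while (defined(my $line1 = <$in1>)) {
        my $line2 = <$in2>;
        last unless defined $line2;    # stop at the end of the shorter file
        chomp($line1, $line2);
        my @row1 = split /\t/, $line1, -1;
        my @row2 = split /\t/, $line2, -1;
        push @merged, join "\t", @row1[@$cols1], @row2[@$cols2];
    }
    close $in1;
    close $in2;
    return @merged;
}

# Demo: write the sample tables from the question to temporary files.
my $dir = tempdir(CLEANUP => 1);
my %sample = (
    'S1.txt' => "col1\tcol2\tcol3\n1\t4\t7\n2\t5\t8\n3\t6\t9\n",
    'S2.txt' => "col1\tcol2\tcol3\n1\t44\t77\n2\t55\t88\n3\t66\t99\n",
);
for my $name (keys %sample) {
    open my $out, '>', "$dir/$name" or die "Can't create $name: $!";
    print {$out} $sample{$name};
    close $out;
}

# Columns 0 and 2 of S1, then column 2 of S2 (header line included).
print "$_\n" for merge_columns("$dir/S1.txt", [0, 2], "$dir/S2.txt", [2]);
```

    This assumes both files have the same rows in the same order; nothing here matches rows by key.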

    Perl is the programming world's equivalent of English
      Thanks for the reply; however, the output I want is:

      col1 col3 col3
      1 7 77
      2 8 88
      3 9 99

      I want the col3 columns side by side.

        Ok, that changes things somewhat. How about this:

        #!/usr/bin/perl
        use warnings;
        use strict;

        my $str1 = <<STR;
        col1 col2 col3
        1 4 7
        2 5 8
        3 6 9
        STR

        my $str2 = <<STR;
        col1 col2 col3
        1 44 77
        2 55 88
        3 66 99
        STR

        open my $fIn, '<', \$str1;
        my $index = 0;
        my %fileCols = map{+"file1 $_" => $index++} split /\s+/, <$fIn>;
        my @file1Data;

        push @file1Data, [split /\s+/] while <$fIn>;
        close $fIn;

        my @wantedCols = ('col1', 'col3', 'file1 col3');

        open $fIn, '<', \$str2;
        $index = @{$file1Data[0]};
        $fileCols{$_} = $index++ for split /\s+/, <$fIn>;
        my @slice = map{exists $fileCols{$_} ? $fileCols{$_} : ()} @wantedCols;

        print join(' ', map{(split /\s+/)[-1]} @wantedCols), "\n";

        while (<$fIn>) {
            chomp;
            print join (' ', (@{$file1Data[$. - 2]}, split /\s+/)[@slice]), "\n";
        }

        Prints:

        col1 col3 col3
        1 77 7
        2 88 8
        3 99 9

        Note that effectively all of the first file is read into memory to avoid having to re-read and parse it for each following file. This is fine so long as the first file is less than about half the memory you have available.
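
        One possible variation, sketched under the assumption that every file has the same rows in the same order: keep only the wanted fields of each line in memory, and append one column per additional file. That stores less than the whole first file and puts the first file's col3 before the later ones, as in the requested "1 7 77" layout. The function name combine_tables is invented for this example, and the tables are passed as strings to keep it self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: take the columns named in @$first_cols from the
# first table and the column named $extra_col from every later table,
# and return the combined lines with the columns side by side.
sub combine_tables {
    my ($first_cols, $extra_col, @tables) = @_;

    my @rows;    # $rows[$n] collects the output fields for line $n
    for my $t (0 .. $#tables) {
        open my $fh, '<', \$tables[$t] or die "Can't open table $t: $!";

        # Map column names to positions using the header line.
        my $header = <$fh>;
        my @names  = split ' ', $header;
        my %pos;
        @pos{@names} = 0 .. $#names;

        my @wanted = $t == 0 ? @$first_cols : ($extra_col);
        my @slice;
        for my $name (@wanted) {
            die "No column '$name' in table $t\n" unless exists $pos{$name};
            push @slice, $pos{$name};
        }

        # Line 0 is the header; data lines follow in file order.
        push @{ $rows[0] }, @wanted;
        my $n = 1;
        while (my $line = <$fh>) {
            push @{ $rows[$n++] }, (split ' ', $line)[@slice];
        }
        close $fh;
    }
    return map { join ' ', @$_ } @rows;
}

my @tables = (
    "col1 col2 col3\n1 4 7\n2 5 8\n3 6 9\n",
    "col1 col2 col3\n1 44 77\n2 55 88\n3 66 99\n",
);
print "$_\n" for combine_tables([qw(col1 col3)], 'col3', @tables);
```

        With the two sample tables this prints:

        col1 col3 col3
        1 7 77
        2 8 88
        3 9 99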

Re: Adding a column to a file where i took out two columns
by Tux (Canon) on Sep 25, 2014 at 06:19 UTC

    If you are open to "other" viewpoints, I'd say that using DBD::CSV (with "\t" as csv_sep_char) will make your task very easy using SQL commands.


    Enjoy, Have FUN! H.Merijn