Re^3: Split a large text file by columns

My apologies for the previous submission; I hit the create button by mistake. It is formatted better now.

Thank you, Anonymous Monk. The number of columns varies by file. Below is a Perl script for my first approach: I split the file using awk in bash, then try to merge each resulting file with the row names. The second script takes a different approach. It splits each row on tabs, stores the first element of the resulting array in a scalar variable, and then tries to add that first element and the remaining elements to a hash.

The first script:

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use Getopt::Std;

# Read the -i option (the original file; its first column holds the row names)
our ($opt_i);
getopts('i:');
die "\nInput file name (-i) required\n\n" unless $opt_i;

# Collect the row names (first column) from the original file, in order
open my $in_fh, '<', $opt_i or die "Cannot open input file $opt_i: $!";
my @geno;
while (my $line = <$in_fh>) {
    chomp $line;
    my ($geno) = split /\t/, $line;
    push @geno, $geno;
}
close $in_fh;

# Process each split file (*.txt in the current directory, from the awk step).
# Assumes line N of every split file belongs to row N of the original file.
foreach my $file (glob '*.txt') {
    next if $file eq $opt_i;    # do not merge the input file with itself
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $row = 0;
    while (my $comb = <$fh>) {
        # Prefix line N of the split file with row name N
        my $name = $geno[$row++] // '';
        print "$name\t$comb";
    }
    close $fh;
}

The second script, which splits the file:

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use Getopt::Std;

# Read the -i option
our ($opt_i);
getopts('i:');
die "\nInput file name (-i) required\n\n" unless $opt_i;

# Open the file or die
open my $in_fh, '<', $opt_i or die "Cannot open input file $opt_i: $!";

my (%file, @order);
while (my $line = <$in_fh>) {
    # Remove the newline character
    chomp $line;

    # Split the row on tabs; the first column (row name) goes in $geno
    my @fields = split /\t/, $line;
    my $geno   = shift @fields;

    # Remember the original row order, then store the remaining columns
    # in the hash under the row name
    push @order, $geno unless exists $file{$geno};
    push @{ $file{$geno} }, @fields;
}
close $in_fh;

# Print each row name followed by its columns, in the original order
for my $geno (@order) {
    print join("\t", $geno, @{ $file{$geno} }), "\n";
}


Re^4: Split a large text file by columns by Anonymous Monk on Apr 22, 2017 at 05:54 UTC
I see what you're trying to do, almost. How do you decide which filename gets which columns? Based on your sample data, here are two rows (the header, then the first data row) going into three files. This is how you want the header split up:

`<GSOR> vnir_1 vnir_2 vnir_3`
`<GSOR> vnir_4 vnir_5 vnir_6`
`<GSOR> vnir_7 vnir_8 vnir_9`

This is the first data row, split up and ready to end up in three different files:

`310015 0.37042 0.36909 0.36886`
`310015 0.36698 0.36615 0.36449`
`310015 0.36404`

What is the filename for the first 3 columns? The second 3 columns? The last "3" columns? And is this repeated for every row in the original data?
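To make the split concrete, here is a small Perl sketch (not code from the thread) of the layout described above. Each row's first column is kept as the identifier, and the remaining columns are cut into groups of three, with each group prefixed by that identifier. The sub name `split_row` and the group size of 3 are assumptions taken from the sample. When the column count is not a multiple of 3, the last group is simply shorter.

```perl
use strict;
use warnings;

# Split one chomped, tab-separated row into groups of $n data columns.
# Each group is prefixed with the row's first column (its identifier),
# so every output file keeps its own copy of that column.
# split_row is a hypothetical helper name used only for illustration.
sub split_row {
    my ($line, $n) = @_;
    my ($name, @cols) = split /\t/, $line;
    my @groups;
    while (@cols) {
        # splice removes up to $n columns; the last group may be shorter
        my @part = splice @cols, 0, $n;
        push @groups, join "\t", $name, @part;
    }
    return @groups;
}

# Header and first data row from the sample data quoted above
my $header = join "\t", '<GSOR>', map("vnir_$_", 1 .. 9);
my $row    = join "\t", qw(310015 0.37042 0.36909 0.36886
                           0.36698 0.36615 0.36449 0.36404);

# Print each group on its own line, header groups first
print "$_\n" for split_row($header, 3), split_row($row, 3);
```

Running it prints six groups. The last one is `310015` and `0.36404` on a line of their own, which matches the split shown in the question.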
Re^5: Split a large text file by columns by tc (Novice) on May 01, 2017 at 21:44 UTC
Hi, I have been out of town for a while and did not check the forum. That is the way I would like it to be split. Ideally, the file names would be in sequential order to keep it simple; as long as each column has a header, the file names do not matter. Yes, it will be repeated for every row in the original data. The number of rows changes from file to file, as does the number of columns. The solutions offered by Marshall and Kevbot did the trick. Thank you for your assistance.
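Marshall's and Kevbot's solutions are not reproduced here, so the following is only a minimal sketch of one way to meet the requirements stated above. It assumes a tab-delimited input whose first column holds the row names. The output files are numbered sequentially, and the header line is split along with the data, so each column keeps its header. The row and column counts are read from the file rather than fixed in advance. The `split_file` name, the `part` prefix and the default group size of 3 are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a tab-delimited file into sequentially numbered files of $n data
# columns each. Column 1 (the row names) is repeated in every output file,
# and the header row travels with its columns.
# split_file is a hypothetical name used only for illustration.
sub split_file {
    my ($in, $n, $prefix) = @_;
    open my $in_fh, '<', $in or die "Cannot open $in: $!";
    my @out;    # output filehandles, opened as columns are encountered
    while (my $line = <$in_fh>) {
        chomp $line;
        # LIMIT -1 keeps trailing empty fields, so columns stay aligned
        my ($name, @cols) = split /\t/, $line, -1;
        my $nfiles = int((@cols + $n - 1) / $n);    # ceil(@cols / $n)
        for my $i (0 .. $nfiles - 1) {
            unless ($out[$i]) {
                my $file = sprintf '%s_%d.txt', $prefix, $i + 1;
                open my $fh, '>', $file or die "Cannot write $file: $!";
                $out[$i] = $fh;
            }
            # Index range of this group; the last group may be shorter
            my $first = $i * $n;
            my $last  = $first + $n - 1;
            $last = $#cols if $last > $#cols;
            print { $out[$i] } join("\t", $name, @cols[$first .. $last]), "\n";
        }
    }
    close $_ for @out;
    close $in_fh;
    return scalar @out;    # number of files written
}

# Usage: split_columns.pl INPUT [GROUP_SIZE] [PREFIX]
if (@ARGV) {
    my ($in, $n, $prefix) = @ARGV;
    $n      ||= 3;         # assumed default, matching the sample layout
    $prefix //= 'part';    # hypothetical output prefix
    my $count = split_file($in, $n, $prefix);
    print "Wrote $count files\n";
}
```

For example, `perl split_columns.pl data.txt 3 part` (the script name is hypothetical) writes `part_1.txt`, `part_2.txt`, and so on. Opening the output handles on demand lets the data decide how many files are created, so nothing has to change when the column count differs between input files.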