in reply to Split a large text file by columns

Is it possible to do this in perl?

Yes, absolutely

I can split the file using awk in bash, but I'm unable to merge the first column into each new file. I know Perl has a split function, but that splits rows, not columns. I have tried a few Perl scripts, but they are dependent on specific column numbers, so as ideas have dried up, I turn to you, Monks.

Please link the scripts you tried

Any such program is going to be "dependent on specific column numbers" in one way or another

Re^3: Split a large text file by columns
by tc (Novice) on Apr 21, 2017 at 20:23 UTC

    My apologies for the previous submission; I hit the create button by mistake. It is formatted better now.

    Thank you Anonymous Monk. The column numbers vary by file. Below is a Perl script where I first split the file using awk in bash, then try to merge each new file with a file containing the row names. There is also a second script that splits the file on tabs across each row and places the first element of the array in a scalar variable. Then I tried adding the first element and the remaining elements to a hash.

    The first code

    #!/usr/bin/perl -w
    use strict;
    use warnings;
    use diagnostics;
    use Getopt::Std;

    #reading options
    our ($opt_i);
    getopts('i:');
    if (!$opt_i) {
        print STDERR "\nInput file name (-i) required\n\n\n";
    }

    #open the file or die
    open INFILE, "<", $opt_i or die "No such input file $opt_i";

    while (<INFILE>) {
        chomp;
        my @fh = <INFILE>;
        my @fh = split ('\t', $_);
        #print "@fh";
        my $geno = shift @fh;

        #open directory. the "." represents current directory
        opendir(DIR, ".");
        #place all files in an array
        #@files = readdir(DIR);
        my @files = glob("*.txt");
        #close directory
        closedir(DIR);

        my @merge;
        #process each file
        foreach my $file (@files) {
            open FILE, "<", $file;
            while (my @line = <FILE>) {
                foreach my $comb (@line) {
                    print "$geno\t" . "$comb";
                    close FILE;
                }
            }
        }
    }
    close INFILE;

    The second code, which is splitting the file

    #!/usr/bin/perl -w
    use strict;
    use warnings;
    use diagnostics;
    use Getopt::Std;

    #reading options
    our ($opt_i);
    getopts('i:');
    if (!$opt_i) {
        print STDERR "\nInput file name (-i) required\n\n\n";
    }

    #open the file or die
    open INFILE, "<", $opt_i or die "No such input file $opt_i";

    my %file;
    while (<INFILE>) {
        #remove the newline character
        chomp;
        #create an array
        my @fh = <INFILE>;
        #split the file by tabs in each row
        @fh = split ('\t', $_);
        #place the first column (row names) in $geno
        my $geno = shift @fh;
        my $remain = join "_", @fh[1-];
        push @{$geno{$remain}}, $_;
    }
    # print the first field (the username)
    print "%file\n";
    }
    close INFILE;

      I see what you're trying to do, almost

      How do you decide what filename gets what columns?

      Based on your sample data, these are two rows (a header row, then a data row) that go into three files; this is how you're wanting to split that up:

      <GSOR> vnir_1 vnir_2 vnir_3 <GSOR> vnir_4 vnir_5 vnir_6 <GSOR> vnir_7 vnir_8 vnir_9

      This is the first data row split up and ready to end up in three different files

      310015 0.37042 0.36909 0.36886 310015 0.36698 0.36615 0.36449 310015 0.36404

      What is the filename for the first 3 columns? Second 3 columns? Last "3" columns? And this is repeated for every row in the original data?

        Hi, I have been out of town for a while and did not check the forum. That is the way that I would like the file to be split. Ideally, the file names would be in sequential order to keep it simple. As long as each column has a header, the file name does not matter. Yes, it will be repeated for every row in the original data. The row numbers change from file to file, as do the column numbers.

        The solutions offered by Marshall and Kevbot did the trick. Thank you for your assistance.
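For readers who find this thread later, here is a minimal sketch of the kind of split discussed above. The group size of 3 columns and the output names part1.txt, part2.txt, ... are assumptions for illustration only; they are not the actual solutions Marshall and Kevbot posted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $infile = shift @ARGV or die "Usage: $0 input.txt\n";
my $chunk  = 3;    # assumed number of data columns per output file

open my $in, '<', $infile or die "Cannot open $infile: $!";
my @out;           # one output filehandle per part, opened lazily

while (my $line = <$in>) {
    chomp $line;
    # first field is the row name; the rest are the data columns
    my ($rowname, @cols) = split /\t/, $line;

    my $part = 0;
    while (@cols) {
        # take the next $chunk columns off the front of the row
        my @group = splice @cols, 0, $chunk;
        unless ($out[$part]) {
            # open part1.txt, part2.txt, ... on first use
            open $out[$part], '>', 'part' . ($part + 1) . '.txt'
                or die "Cannot open output file: $!";
        }
        # prepend the row name so every file keeps the first column
        print { $out[$part] } join("\t", $rowname, @group), "\n";
        $part++;
    }
}
close $_ for grep { defined } @out;
close $in;
```

Run as perl split_cols.pl input.txt; every row (header included) is repeated across part1.txt, part2.txt, and so on, each chunk prefixed with the row name, so no specific column count is hard-coded beyond the group size.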