in reply to Re: Split a large text file by columns
in thread Split a large text file by columns

My apologies for the previous submission; I hit the create button by mistake. It is formatted better now.

Thank you, Anonymous Monk. The number of columns varies by file. Below is a Perl script that I run after first splitting the file with awk in bash; it tries to merge each split file with the row names. There is also a second script that splits each row on tabs and places the first element of the array in a scalar variable. I then tried adding the first element and the remaining elements to a hash.

The first script:

#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
use Getopt::Std;

#reading options
our ($opt_i);
getopts('i:');
if (!$opt_i) {
    print STDERR "\nInput file name (-i) required\n\n\n";
}

#open the file or die
open INFILE, "<", $opt_i or die "No such input file $opt_i";
while (<INFILE>) {
    chomp;
    my @fh = <INFILE>;
    my @fh = split ('\t', $_);
    #print "@fh";
    my $geno = shift @fh;

    #open directory. the "." represents current directory
    opendir(DIR, ".");

    #place all files in an array
    #@files = readdir(DIR);
    my @files = glob("*.txt");

    #close directory
    closedir(DIR);
    my @merge;

    #process each file
    foreach my $file (@files) {
        open FILE, "<", $file;
        while(my @line = <FILE>) {
            foreach my $comb (@line) {
                print "$geno\t"."$comb";
                close FILE;
            }
        }
    }
}
close INFILE;

The second script, which splits the file:

#!/usr/bin/perl -w
use strict;
use warnings;
use diagnostics;
use Getopt::Std;

#reading options
our ($opt_i);
getopts('i:');
if (!$opt_i) {
    print STDERR "\nInput file name (-i) required\n\n\n";
}

#open the file or die
open INFILE, "<", $opt_i or die "No such input file $opt_i";
my %file;
while (<INFILE>) {
    #remove the newline character
    chomp;
    #create an array
    my @fh = <INFILE>;
    #split the file by tabs in each row
    @fh = split ('\t', $_);
    #place the first column (row names) in $geno
    my $geno = shift @fh;
    my $remain = join "_", @fh[1-];
    push @{$geno{$remain}}, $_;
}
# print the first field (the username)
print "%file\n";
}
close INFILE;

Re^4: Split a large text file by columns
by Anonymous Monk on Apr 22, 2017 at 05:54 UTC

    I see what you're trying to do, almost.

    How do you decide what filename gets what columns?

    Based on your sample data, there are two kinds of rows (the header row, then the data rows), and each one goes into three files. This is how you want the header row split up:

    <GSOR> vnir_1 vnir_2 vnir_3 <GSOR> vnir_4 vnir_5 vnir_6 <GSOR> vnir_7 vnir_8 vnir_9

    This is the first data row, split up and ready to go into three different files:

    310015 0.37042 0.36909 0.36886 310015 0.36698 0.36615 0.36449 310015 0.36404

    What is the filename for the first 3 columns? Second 3 columns? Last "3" columns? And this is repeated for every row in the original data?
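
    As a minimal sketch of the grouping shown above: assuming tab-separated input, the row name in the first field, and a fixed group width of three data columns, each row can be cut into chunks with the row name repeated at the front of every chunk. The sub name chunk_row and the width argument are illustrative only and do not come from the scripts posted earlier.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: split one tab-separated row into groups of
    # $width data columns, repeating the row name at the start of each group.
    sub chunk_row {
        my ($line, $width) = @_;
        chomp $line;
        my ($name, @cols) = split /\t/, $line;
        my @groups;
        while (@cols) {
            # splice removes up to $width elements from the front of @cols
            push @groups, join "\t", $name, splice(@cols, 0, $width);
        }
        return @groups;
    }

    # The first data row from the sample, rebuilt with tab separators
    my $row = join "\t", qw(310015 0.37042 0.36909 0.36886 0.36698 0.36615 0.36449 0.36404);
    print "$_\n" for chunk_row($row, 3);
    ```

    With seven data columns and a width of three, the last group holds only the leftover column, which matches the short final group in the sample row.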

      Hi, I have been out of town for a while and did not check the forum. That is the way that I would like the file to be split. Ideally, the file names would be in sequential order to keep it simple. As long as each column has a header, the file name does not matter. Yes, it will be repeated for every row in the original data. The number of rows and the number of columns both vary from file to file.
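
      For reference, the requirement as stated here could be sketched roughly as follows. This is not the solution posted by Marshall or Kevbot. It assumes a tab-separated file whose first line is the header and whose first column is the row name. The sub name split_by_columns, the -n option, and the part_001.txt naming scheme are all made up for illustration.

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Getopt::Std;

      # Hypothetical helper: stream a tab-separated file and write every group of
      # $width data columns, prefixed by the row-name column, to its own file.
      sub split_by_columns {
          my ($path, $width) = @_;
          die "width must be a positive integer\n" unless $width =~ /^[1-9]\d*$/;
          open my $in, '<', $path or die "Cannot open $path: $!";

          my (@out, @files);    # one output handle (and name) per column group
          while (my $line = <$in>) {
              chomp $line;
              next unless length $line;    # skip blank lines
              my ($name, @cols) = split /\t/, $line, -1;

              # The first (header) line decides how many output files are needed.
              if (!@out) {
                  my $groups = int((@cols + $width - 1) / $width);
                  for my $i (1 .. $groups) {
                      my $file = sprintf 'part_%03d.txt', $i;    # assumed naming scheme
                      open my $fh, '>', $file or die "Cannot write $file: $!";
                      push @out,   $fh;
                      push @files, $file;
                  }
              }

              # Each file gets the row name plus its own slice of the columns.
              for my $i (0 .. $#out) {
                  my $first = $i * $width;
                  my $last  = $first + $width - 1;
                  $last = $#cols if $last > $#cols;
                  print { $out[$i] } join("\t", $name, @cols[$first .. $last]), "\n";
              }
          }

          close $in;
          for my $fh (@out) {
              close $fh or die "Close failed: $!";
          }
          return @files;
      }

      # -i input file, -n data columns per output file (default 3, an assumption)
      my %opt = (n => 3);
      getopts('i:n:', \%opt);
      if (defined $opt{i}) {
          print "$_\n" for split_by_columns($opt{i}, $opt{n});
      }
      else {
          warn "Usage: $0 -i input.txt [-n columns_per_file]\n";
      }
      ```

      The input is read one line at a time, so the whole file never has to fit in memory. All output handles stay open for the duration of the run, which assumes the number of column groups stays below the system's open-file limit.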

      The solutions offered by Marshall and Kevbot did the trick. Thank you for your assistance.