I will try to provide as much detail as I can about the file and why I am trying to do this. The file is a set of wavelength measurements collected from about 200 individual plants. The instrument measures a response at each wavelength between 300 nm and 1000 nm, at different intervals. Usually, the file contains between 100 and 200 columns, depending on the settings, and 200 rows (one per plant ID). There will be about 20 files in total when the measurements are completed. I need to separate each column into its own file and keep the first column (the row names) with each one. I will then analyze each of these files for genetic information for research.

Memory is not an issue, since I can run it on a server. I agree that my application is an unusual one for Perl, but I started teaching myself Perl because I work with large files and need an efficient way to process them. I have been able to write a couple of scripts that have made a few tedious tasks efficient and less error-prone. That's the gist of it.

Thank you again, Monks.

Re^3: Split a large text file by columns
by kevbot (Vicar) on Apr 22, 2017 at 04:45 UTC

    Here is a way to perform what you described (with the help of the Data::Table and Path::Tiny CPAN modules).

    I'm assuming that there was a typo in the data you provided, and I changed the name of the last column to vnir_7.

    I put the following tab-delimited data into a file called data.tsv,

```
<GSOR>	vnir_1	vnir_2	vnir_3	vnir_4	vnir_5	vnir_6	vnir_7
310015	0.37042	0.36909	0.36886	0.36698	0.36615	0.36449	0.36404
310100	0.25889	0.25773	0.2569	0.25563	0.25565	0.25511	0.25508
310134	0.26163	0.26149	0.26059	0.26034	0.2604	0.2598	0.26085
310167	0.23168	0.23031	0.23045	0.22822	0.2267	0.22575	0.22453
310196	0.26995	0.26902	0.2685	0.26689	0.26624	0.2647	0.26461
```
    This script processes the data and creates the files,
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Table;
use Path::Tiny;

# Load the tsv file with a header
my $dt = Data::Table::fromTSV( 'data.tsv', 1 );

# Get a Data::Table that contains only the first column
my $names_dt = $dt->subTable( undef, [ '<GSOR>' ] );

my $n_col        = $dt->nofCol;
my @column_names = $dt->header;

# Column 0 holds the plant IDs, so start at column 1 and write one file
# per measurement column, each paired with a copy of the ID column
for ( my $i = 1; $i <= $n_col - 1; ++$i ) {
    my $col_name = $column_names[ $i ];
    my $col_dt   = $dt->subTable( undef, [ $col_name ] );
    my $new_dt   = $names_dt->clone();
    $new_dt->colMerge( $col_dt );
    my $file_name = "file_$i.tsv";
    my $fh = path( $file_name )->openw_utf8;
    print {$fh} $new_dt->tsv;
    $fh->close;
}
exit;
```
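    If installing CPAN modules on the server is not an option, the same split can be sketched with core Perl alone. This is not the Data::Table approach above, just a minimal alternative under stated assumptions: each input is tab-delimited with a single header row, column names are safe to use in file names, and the helper `split_columns` and the output naming `<input basename>_<column name>.tsv` are hypothetical choices. Since memory is not a concern, it reads each file fully before writing, and it accepts several input files at once so all ~20 files can be processed in one run.

```perl
#!/usr/bin/env perl
# Core-Perl sketch (no CPAN modules): split each tab-delimited input file
# into one file per data column, each keeping the first (ID) column.
# Assumptions: one header row, tab-separated fields, and column names that
# are safe to use in file names. The helper name split_columns and the
# output naming "<basename>_<column>.tsv" are hypothetical.
use strict;
use warnings;
use File::Basename qw(fileparse);

sub split_columns {
    my ( $in_path, $out_dir ) = @_;
    $out_dir //= '.';

    open my $in, '<', $in_path or die "Cannot open $in_path: $!";
    my $header_line = <$in>;
    die "$in_path is empty\n" unless defined $header_line;
    $header_line =~ s/\r?\n\z//;    # strip LF or CRLF line ending
    my @header = split /\t/, $header_line, -1;

    # Read all data rows into memory (fine for ~200 rows x ~200 columns).
    my @rows;
    while ( my $line = <$in> ) {
        $line =~ s/\r?\n\z//;
        next if $line eq '';        # skip blank lines
        push @rows, [ split /\t/, $line, -1 ];
    }
    close $in;

    # "data.tsv" -> "data"
    my ($base) = fileparse( $in_path, qr/\.[^.]*/ );

    my @written;
    for my $i ( 1 .. $#header ) {
        my $out_path = "$out_dir/${base}_$header[$i].tsv";
        open my $out, '>', $out_path or die "Cannot write $out_path: $!";
        print {$out} join( "\t", $header[0], $header[$i] ), "\n";
        for my $row (@rows) {
            # A row with a missing cell gets an empty field, not a warning.
            print {$out} join( "\t", $row->[0], $row->[$i] // '' ), "\n";
        }
        close $out or die "Cannot close $out_path: $!";
        push @written, $out_path;
    }
    return @written;
}

# Usage: perl split_columns.pl run01.tsv run02.tsv ...
split_columns($_) for @ARGV;
```

    With the sample data above saved as data.tsv, this would write data_vnir_1.tsv through data_vnir_7.tsv, each holding the <GSOR> column plus one measurement column. It does no handling of quoted or escaped fields, so it assumes plain tab-separated values like the sample.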

      Hi kevbot, thank you very much. The code that you wrote works perfectly. I see that I still have a long way to go in learning Perl. I had not thought of the different modules that make such tasks possible. My next task is to learn which modules are most useful.

      My humblest thanks.