nanoplasmonic has asked for the wisdom of the Perl Monks concerning the following question:
I am using Perl to parse a number of tab-delimited text files stored in a directory. The code extracts a particular column of data from each file and stores it in a matrix. After all the files in the directory have been looped through, the contents of the matrix, along with the filenames, are printed to a user-specified output file. I am fairly new to Perl, and the code, which I modified from an online source, is given below. The directory with the new set of data has become very large (2.09 GB with 68,668 files), and now I am getting "out of memory" errors. So I was wondering if it is possible to read the files one at a time and print to an existing text file by appending new columns. I can already do this by appending the new data as a new line, but that output is hard to read in the data analysis programs.
#arguments with command to run must be Directory name, Column number, Output file
$delimiter = "\t";

#input name of the data directory from command line
$dir = @ARGV[0];
#input number of the column to be read from command line. 0 for the first column
$columnNo = @ARGV[1];
#input name of the output file from command line
$outfile = @ARGV[2];

#reminder for format of command statement
if(!$dir or (!$columnNo && $columnNo ne '0') ) {
  print "No input data directory or column number.\nUsage: perl my_perl.pl dir columnNo outfile\n";
}

#read the directory
opendir(DIR, $dir) or die "can not open directory $dir\n";
while($name = readdir(DIR)) {
  #save data file names in an array, don't include . and ..
  push(@files, $name) if( !($name eq '.' || $name eq '..') );
}
closedir(DIR) or die "can not close directory $dir\n";

#process data
if($dir && ($columnNo || $columnNo eq '0') ) {
  #read files
  for($i = 0; $i < @files; $i++) {
    $infile = $dir.'/'.@files[$i];
    #each individual file
    open(IN, $infile) or die "can not open $infile\n";
    #row number, 0 for first row. Reset for each file
    $rowNo = 0;
    #read the file
    while( $line = <IN> ) {
      # get rid of the new line character, otherwise data in the last column incorrect
      chomp($line);
      # split to put data in each row into an array
      @data = split(/$delimiter/, $line);
      # remember data in a "matrix".
      $datamatrix{$i, $rowNo} = @data[$columnNo];
      # add 1 to row number
      $rowNo++;
    }
    close(IN) or die "can not close $infile\n";
  }
}

# print results.
if($outfile) {
  open (OUT, ">$outfile") or die "can not open $outfile\n";
  # number of columns
  print OUT 'The number of columns is: ' . scalar @files . "\n";
  # first row file names
  for($i = 0; $i < @files; $i++) {
    print OUT @files[$i];
    print OUT "\t" if($i < @files -1);
  }
  print OUT "\n";
  # data
  for($j = 0; $j < $rowNo; $j++) {
    for($i = 0; $i < @files; $i++) {
      print OUT $datamatrix{$i, $j};
      print OUT "\t" if($i < @files -1);
    }
    print OUT "\n";
  }
  close(OUT) or die "can not close $outfile\n";
}
else {
  print 'The number of columns is: ' . scalar @files . "\n";
  # first row file names
  for($i = 0; $i < @files; $i++) {
    print @files[$i];
    print "\t" if($i < @files -1);
  }
  print "\n";
  # data
  for($j = 0; $j < $rowNo; $j++) {
    for($i = 0; $i < @files; $i++) {
      print $datamatrix{$i, $j};
      print "\t" if($i < @files -1);
    }
    print "\n";
  }
}
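The column-appending idea in the question can be sketched roughly as follows. This is only an illustration and is not taken from any of the replies. A text file cannot grow sideways in place, so the sketch streams the existing output and one input file in parallel into a temporary file, then renames it over the output. The sub name `append_column` and the script name `append_cols.pl` are hypothetical, and the sketch assumes tab-delimited input with roughly the same number of rows in every file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: append column $column_no of $infile to $outfile as a
# new right-hand column, with $header in the first row. Only one line of
# each file is held in memory at a time.
sub append_column {
    my ($outfile, $infile, $column_no, $header) = @_;

    open(my $in, '<', $infile) or die "can not open $infile: $!\n";

    # The output may not exist (or may be empty) for the first file.
    my $have_old = -s $outfile;
    my $old;
    if ($have_old) {
        open($old, '<', $outfile) or die "can not open $outfile: $!\n";
    }

    # Temporary file in the same directory, so rename() stays on one filesystem.
    my ($tmp, $tmp_name) = tempfile(DIR => dirname($outfile));

    my $old_tabs = 0;   # number of tabs in the old header row, used for padding
    my $row      = 0;
    while (1) {
        my $old_line = $have_old ? readline($old) : undef;

        my $value;
        if ($row == 0) {
            $value = $header;            # first row holds the file name
        }
        else {
            my $line = readline($in);
            if (defined $line) {
                chomp $line;
                my @fields = split /\t/, $line, -1;
                $value = defined $fields[$column_no] ? $fields[$column_no] : '';
            }
        }

        # Stop once both the old output and the new input are exhausted.
        last if !defined $old_line && !defined $value;

        if ($have_old) {
            if (defined $old_line) {
                chomp $old_line;
                $old_tabs = ($old_line =~ tr/\t//) if $row == 0;
            }
            else {
                $old_line = "\t" x $old_tabs;   # pad rows the old output lacks
            }
            print {$tmp} $old_line, "\t", (defined $value ? $value : ''), "\n";
        }
        else {
            print {$tmp} $value, "\n";
        }
        $row++;
    }

    close($in);
    close($old) if $have_old;
    close($tmp) or die "can not close $tmp_name: $!\n";
    rename($tmp_name, $outfile) or die "can not rename $tmp_name: $!\n";
    return;
}

# Driver, used only when called as: perl append_cols.pl dir columnNo outfile
if (@ARGV == 3) {
    my ($dir, $column_no, $outfile) = @ARGV;
    opendir(my $dh, $dir) or die "can not open directory $dir: $!\n";
    my @files = sort grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);
    unlink $outfile if -e $outfile;     # start from a fresh output file
    append_column($outfile, "$dir/$_", $column_no, $_) for @files;
}
```

Memory use stays flat, but the whole output is rewritten once per input file, so the total I/O grows quadratically with the number of files. With tens of thousands of files it would probably be worth appending a batch of columns per pass instead of one.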
Replies are listed 'Best First'.

Re: Adding columns in a loop to an existing file using Perl
by aaron_baugher (Curate) on Oct 24, 2013 at 05:53 UTC

Re: Adding columns in a loop to an existing file using Perl
by Anonymous Monk on Oct 24, 2013 at 02:36 UTC

Re: Adding columns in a loop to an existing file using Perl
by Lennotoecom (Pilgrim) on Oct 24, 2013 at 07:48 UTC