I am using Perl to parse a number of tab-delimited text files stored in a directory. The code extracts a particular column of data from each file and stores it in a matrix. After all the files in the directory have been looped through, the contents of the matrix, along with the filenames, are printed to a user-specified output file. I am fairly new to Perl, and the code, which I modified from an online source, is given below. The directory with the new set of data has become very large (2.09 GB with 68,668 files), and now I am getting "out of memory" errors. So I was wondering whether it is possible to read the files one at a time and append each file's data to an existing text file as a new column. I can already do this by appending the new data as a new line, but that layout is hard to read in the data analysis programs.

    #arguments with command to run must be Directory name, Column number, Output file
    $delimiter = "\t";

    #input name of the data directory from command line
    $dir = @ARGV[0];
    #input number of the column to be read from command line. 0 for the first column
    $columnNo = @ARGV[1];
    #input name of the output file from command line
    $outfile = @ARGV[2];

    #reminder for format of command statement
    if(!$dir or (!$columnNo && $columnNo ne '0') ) {
      print "No input data directory or column number.\nUsage: perl my_perl.pl dir columnNo outfile\n";
    }

    #read the directory
    opendir(DIR, $dir) or die "can not open directory $dir\n";
    while($name = readdir(DIR)) {
      #save data file names in an array, don't include . and ..
      push(@files, $name) if( !($name eq '.' || $name eq '..') );
    }
    closedir(DIR) or die "can not close directory $dir\n";

    #process data
    if($dir && ($columnNo || $columnNo eq '0') ) {
      #read files
      for($i = 0; $i < @files; $i++) {
        $infile = $dir.'/'.@files[$i];
        #each individual file
        open(IN, $infile) or die "can not open $infile\n";
        #row number, 0 for first row. Reset for each file
        $rowNo = 0;
        #read the file
        while( $line = <IN> ) {
          # get rid of the new line character, otherwise data in the last column incorrect
          chomp($line);
          # split to put data in each row into an array
          @data = split(/$delimiter/, $line);
          # remember data in a "matrix".
          $datamatrix{$i, $rowNo} = @data[$columnNo];
          # add 1 to row number
          $rowNo++;
        }
        close(IN) or die "can not close $infile\n";
      }
    }

    # print results.
    if($outfile) {
      open (OUT, ">$outfile") or die "can not open $outfile\n";
      # number of columns
      print OUT 'The number of columns is: ' . scalar @files . "\n";
      # first row file names
      for($i = 0; $i < @files; $i++) {
        print OUT @files[$i];
        print OUT "\t" if($i < @files -1);
      }
      print OUT "\n";
      # data
      for($j = 0; $j < $rowNo; $j++) {
        for($i = 0; $i < @files; $i++) {
          print OUT $datamatrix{$i, $j};
          print OUT "\t" if($i < @files -1);
        }
        print OUT "\n";
      }
      close(OUT) or die "can not close $outfile\n";
    }
    else {
      print 'The number of columns is: ' . scalar @files . "\n";
      # first row file names
      for($i = 0; $i < @files; $i++) {
        print @files[$i];
        print "\t" if($i < @files -1);
      }
      print "\n";
      # data
      for($j = 0; $j < $rowNo; $j++) {
        for($i = 0; $i < @files; $i++) {
          print $datamatrix{$i, $j};
          print "\t" if($i < @files -1);
        }
        print "\n";
      }
    }

In reply to Adding columns in a loop to an existing file using Perl by nanoplasmonic
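
One possible way to cut the memory use without rewriting the output file for every input file is to keep a single string per output row and append each file's value to it, instead of storing every cell under its own hash key as %datamatrix does. The sketch below is only an illustration under a few assumptions that are not in the original post: the helper name collect_column is made up, only plain files are read (in sorted order), and files with fewer rows than the others get empty cells. Memory use should then be roughly the size of the output file, which may or may not be small enough for 68,668 files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: reads one column
# from every plain file in $dir and returns (\@files, \@rows), where each
# element of @rows is an already tab-joined output line.
sub collect_column {
    my ($dir, $column_no) = @_;

    opendir(my $dh, $dir) or die "can not open directory $dir: $!\n";
    # Assumption: keep plain files only; sort so the column order is stable.
    my @files = sort grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);

    my @rows;
    for my $i (0 .. $#files) {
        my $infile = "$dir/$files[$i]";
        open(my $in, '<', $infile) or die "can not open $infile: $!\n";
        my $row_no = 0;
        while (my $line = <$in>) {
            chomp $line;
            my @data  = split /\t/, $line, -1;
            my $value = defined $data[$column_no] ? $data[$column_no] : '';
            if ($row_no > $#rows) {
                # First file to reach this row: pad the earlier columns.
                push @rows, ("\t" x $i) . $value;
            }
            else {
                $rows[$row_no] .= "\t" . $value;
            }
            $row_no++;
        }
        close($in);
        # Shorter file: add an empty cell to every row it did not reach.
        for my $r ($row_no .. $#rows) {
            $rows[$r] .= "\t";
        }
    }
    return (\@files, \@rows);
}

if (@ARGV >= 2) {
    my ($dir, $column_no, $outfile) = @ARGV;
    my ($files, $rows) = collect_column($dir, $column_no);

    # Write to the named file if one was given, otherwise to STDOUT.
    my $out;
    if (defined $outfile) {
        open($out, '>', $outfile) or die "can not open $outfile: $!\n";
    }
    else {
        $out = \*STDOUT;
    }
    print {$out} 'The number of columns is: ' . scalar(@$files) . "\n";
    print {$out} join("\t", @$files), "\n";
    print {$out} "$_\n" for @$rows;
    close($out) if defined $outfile;
}
else {
    warn "Usage: perl my_perl.pl dir columnNo outfile\n";
}
```

It would be run the same way as the original script, for example perl my_perl.pl dir columnNo outfile. Literally appending a column to an existing text file means rewriting that whole file for each new input file, which is why this sketch builds the rows in memory and writes them once at the end.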
