in reply to Processing files column-wise
I've also looked at Tie::Handle::CSV and Text::CSV, but these modules seem to only process a file line-wise, not column-wise, which considering the size of my files would be quite inefficient and complex (once the column header is read, this is all the information necessary to determine where to copy the entire column to).
There is no mechanism for reading a column from a file without reading the file line by line. That's just the way files work.
But line-by-line processing of files is perfectly efficient, provided that you do not re-process each line once for every column. That means writing all the fields from the first line to their respective files before reading and processing the second line.
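A minimal sketch of that single-pass idea, assuming a simple comma-separated file with no quoting. The sample data and the in-memory "output files" (scalars opened as filehandles) are hypothetical stand-ins for real files on disk. Each line is read once, and every field goes to its column's output before the next line is read:

```perl
use strict;
use warnings;

## Hypothetical sample data: three comma-separated columns.
my $data = "a1,b1,c1\na2,b2,c2\na3,b3,c3\n";
open my $in, '<', \$data or die $!;

## One in-memory output "file" per column (real code would open files on disk).
my @buffers = ( '' ) x 3;
my @outs;
open $outs[ $_ ], '>', \$buffers[ $_ ] or die $! for 0 .. 2;

## Single pass: each line is read once and every field is written
## to its column's output before the next line is read.
while ( my $line = <$in> ) {
    chomp $line;
    my @fields = split /,/, $line;
    print { $outs[ $_ ] } "$fields[ $_ ]\n" for 0 .. $#fields;
}
close $_ for $in, @outs;

## Show what each column "file" received.
print "column $_: ", join( ' ', split /\n/, $buffers[ $_ ] ), "\n" for 0 .. 2;
```

Each output handle stays open for the whole pass, so the cost is one read of the input and one write per field, however many columns there are.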
This makes a lot of assumptions about the formatting of your ids and data files, but it may serve to illustrate the technique even if you need to use one of the bastardised CSV format processors.
Update: reversing the %ids hash, i.e. using the file numbers as the keys and pushing the field numbers onto an array as the value, would save having to grep through every field index 4 times for every record.
This is untested beyond basic syntax checking:
#! perl -slw
use strict;

## Assumes that the IDs file consists of space-separated lines:
##   zero-based-column-no-in-data-file  zero-based-fileno-destination
open my $idsFH, '<', 'ids.map' or die "ids.map: $!";
my %ids = map {
    ## split with no arguments splits $_ on whitespace and drops the newline
    my( $columnNo, $fileNo ) = split;
    $columnNo -= 10;    ## adjust column numbers: the first 10 go to every file
    ( $columnNo, $fileNo );
} <$idsFH>;
close $idsFH;

## for each data filename supplied on the command line
for my $filename ( @ARGV ) {
    ## open that file for input
    open my $in, '<', $filename or die "$filename: $!";

    ## open 4 output files named $filename.out.n
    my @outs;
    open $outs[ $_ ], '>', "$filename.out.$_" or die $! for 0 .. 3;

    ## read the data file line by line
    while( <$in> ) {
        chomp;    ## -l sets $\ but only chomps input under -n or -p
        ## split the line into fields -- assumes sane csv definition
        my @fields = split '\s*,\s*', $_;

        ## print the first 10 fields to each of the 4 files
        ## (printf does not append $\, so the output line continues)
        printf { $outs[ $_ ] } "%s, ", join ', ', @fields[ 0 .. 9 ] for 0 .. 3;
        ## and remove them from the @fields array
        splice @fields, 0, 10;

        ## for each of the output files
        for my $fileNo ( 0 .. 3 ) {
            ## print those fields that are mapped to this file
            ## (-l makes print append the newline)
            print { $outs[ $fileNo ] } join ', ', @fields[
                grep { defined $ids{ $_ } and $ids{ $_ } == $fileNo } 0 .. $#fields
            ];
        }
    }

    ## cleanup
    close $outs[ $_ ] for 0 .. 3;
    close $in;
}
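The inverted map suggested in the update could be built once, up front, along these lines. This is a sketch: %colsFor is a hypothetical name, and the mapping and record below are made-up sample values in the same shape as %ids (adjusted column number => file number). Sorting the keys keeps each file's fields in their original column order:

```perl
use strict;
use warnings;

## Hypothetical mapping in the same shape as %ids:
## adjusted column number => destination file number
my %ids = ( 0 => 1, 1 => 0, 2 => 1, 3 => 3, 4 => 0 );

## Invert it once: file number => [ column numbers, in column order ]
my %colsFor;
push @{ $colsFor{ $ids{ $_ } } }, $_ for sort { $a <=> $b } keys %ids;

## Per record, a hash lookup plus an array slice replaces the grep.
my @fields = qw( f0 f1 f2 f3 f4 );
for my $fileNo ( 0 .. 3 ) {
    ## || [] covers files with no mapped columns without autovivifying
    my @cols = @{ $colsFor{ $fileNo } || [] };
    print "file $fileNo: ", join( ', ', @fields[ @cols ] ), "\n";
}
```

In the script, the inner grep would then become the slice `@fields[ @{ $colsFor{ $fileNo } || [] } ]`, so each record costs one slice per output file instead of a scan over every field index.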