Re: Processing files column-wise

I've also looked at Tie::Handle::CSV and Text::CSV, but these modules seem to only process a file line-wise, not column-wise, which considering the size of my files would be quite inefficient and complex (once the column header is read, this is all the information necessary to determine where to copy the entire column to).

There is no mechanism for reading a column from a file without reading the file line by line. That's just the way files work.

But line-by-line processing of files is perfectly efficient, provided that you do not have to re-process each line for each column. That means placing all the fields from the first line into their respective files before reading and processing the second line.

The code below makes a lot of assumptions about the formatting of your ids and data files, but may serve to illustrate the technique even if you need to use one of the bastardised csv format processors.

Update: reversing the %ids hash -- i.e. using the filenos as the keys and pushing the field nos to an array as the value -- would save having to grep the hash 4 times for every record.

This is untested beyond basic syntax checking:

#! perl -slw
use strict;

## Assumes that the IDs file consists of space separated lines
## zero-based-column-no-in-data-file zero-based-fileno-destination
open IDS, '<', 'ids.map' or die $!;
my %ids = map{
    my( $columnNo, $fileNo ) = split;
    $columnNo -= 10; ## adjust column numbers
    ( $columnNo, $fileNo );
}<IDS>;
close IDS;
chomp %ids;

## for each data filename supplied on the command line
for my $filename ( @ARGV ) {

    ## open that file for input
    open IN, '<', $filename or die "$filename: $!";

    ## open 4 output files named as $filename.out.n
    my @outs;
    open $outs[ $_ ], '>', "$filename.out.$_" or die $! for 0 .. 3;

    ## read the data file line by line
    while( <IN> ) {

        ## split the line into fields -- assumes sane csv definition
        ## (chomp first: -l only strips the input newline with -n or -p)
        chomp;
        my @fields = split /\s*,\s*/, $_;

        ## print the first 10 fields to each of the 4 files
        printf { $outs[ $_ ] } "%s, ",
            join ', ', @fields[ 0 .. 9 ]
            for 0 .. 3;

        ## and remove them from the @fields array
        splice @fields, 0, 10;

        ## for each of the output files
        for my $fileNo ( 0 .. 3 ) {

            ## print those fields ...
            print { $outs[ $fileNo ] } join ', ', @fields[

                ## that are mapped to this file
                grep{
                    ## skip columns that have no entry in the ids file
                    defined $ids{ $_ } and $ids{ $_ } == $fileNo
                } 0 .. $#fields
            ];
        }
    }
    ## cleanup
    close $outs[ $_ ] for 0 .. 3;
    close IN;
}
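
The reversed mapping mentioned in the update might look something like the following. This is only a sketch: invert_map and %cols_for are hypothetical names that do not appear in the script above, and it assumes the column numbers have already been adjusted in the same way as %ids.

```perl
use strict;
use warnings;

## Hypothetical helper: turn a column => fileno map into
## fileno => [ column numbers, in ascending order ].
## Assumes column numbers are already adjusted, as in %ids.
sub invert_map {
    my( $ids ) = @_;
    my %cols_for;
    push @{ $cols_for{ $ids->{ $_ } } }, $_
        for sort { $a <=> $b } keys %$ids;
    return \%cols_for;
}

## Made-up example map: columns 0 and 2 go to file 0, column 1 to file 1
my $cols_for = invert_map( { 0 => 0, 1 => 1, 2 => 0 } );

## Stand-in for the fields left after the first 10 have been removed
my @fields = qw[ a b c ];

for my $fileNo ( sort { $a <=> $b } keys %$cols_for ) {
    ## one array slice per file replaces the per-record grep
    print "file $fileNo: ",
        join( ', ', @fields[ @{ $cols_for->{ $fileNo } } ] ), "\n";
}
```

Built once after reading the ids file, the inverted map lets the inner loop take a plain array slice for each output file instead of scanning every field index for every record.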

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?
