I've also looked at Tie::Handle::CSV and Text::CSV, but these modules seem to only process a file line-wise, not column-wise, which considering the size of my files would be quite inefficient and complex (once the column header is read, this is all the information necessary to determine where to copy the entire column to).

There is no mechanism for reading a column from a file without reading the file line by line. That's just the way files work.

But line-by-line processing of files is perfectly efficient, provided that you do not have to re-process each line for each column. That means placing all the fields from the first line into their respective files before reading and processing the second line.

The code below makes a lot of assumptions about the formatting of your ids and data files, but may serve to illustrate the technique even if you need to use one of the bastardised CSV format processors.

Update: inverting the %ids hash, i.e. using the file numbers as the keys and pushing the field numbers onto an array as the value, would save having to grep the hash 4 times for every record.

This is untested beyond basic syntax checking:

```perl
#! perl -slw
use strict;

## Assumes that the IDs file consists of space separated lines
## zero-based-column-no-in-data-file zero-based-fileno-destination
open IDS, '<', 'ids.map' or die $!;
my %ids = map {
    my( $columnNo, $fileNo ) = split;
    $columnNo -= 10;    ## adjust column numbers: the first 10 fields are removed below
    ( $columnNo, $fileNo );
} <IDS>;
close IDS;

## for each data filename supplied on the command line
for my $filename ( @ARGV ) {
    ## open that file for input
    open IN, '<', $filename or die $!;

    ## open 4 output files named as $filename.out.n
    my @outs;
    for my $n ( 0 .. 3 ) {
        open $outs[ $n ], '>', "$filename.out.$n" or die $!;
    }

    ## read the data file line by line
    while( <IN> ) {
        chomp;    ## -l only chomps input automatically under -n or -p

        ## split the line into fields -- assumes sane csv definition
        my @fields = split /\s*,\s*/, $_;

        ## remove the first 10 fields from the @fields array
        ## and print them to each of the 4 files
        my @common = splice @fields, 0, 10;
        printf { $outs[ $_ ] } "%s, ", join ', ', @common for 0 .. 3;

        ## for each of the output files
        for my $fileNo ( 0 .. 3 ) {
            ## print those fields ...
            print { $outs[ $fileNo ] } join ', ', @fields[
                ## that are mapped to this file
                grep{ defined $ids{ $_ } and $ids{ $_ } == $fileNo } 0 .. $#fields
            ];
        }
    }

    ## cleanup
    close $outs[ $_ ] for 0 .. 3;
    close IN;
}
```
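The inversion suggested in the update could look something like this. It is only a sketch: the helper name invert_map and the sample mapping are made up for illustration, and it assumes the column numbers in %ids have already had the -10 adjustment applied. The lookup table is built once, so each record needs only one array slice per output file instead of a grep over every field.

```perl
use strict;
use warnings;

## Hypothetical helper: turn { columnNo => fileNo } into
## [ [ columnNos for file 0 ], [ columnNos for file 1 ], ... ]
sub invert_map {
    my( $ids ) = @_;
    my @colsFor;
    ## sort numerically so each file keeps its fields in column order
    push @{ $colsFor[ $ids->{ $_ } ] }, $_ for sort { $a <=> $b } keys %$ids;
    return \@colsFor;
}

## Made-up mapping: adjusted column number => destination file number
my %ids     = ( 0 => 2, 1 => 0, 2 => 2, 3 => 1, 4 => 3, 5 => 0 );
my $colsFor = invert_map( \%ids );

## Per record: a plain slice per file, no grep over %ids
my @fields = qw[ a b c d e f ];
for my $fileNo ( 0 .. 3 ) {
    my @cols = @{ $colsFor->[ $fileNo ] || [] };    ## a file may have no columns
    print "file $fileNo: ", join( ', ', @fields[ @cols ] ), "\n";
}
```

In the main loop, the grep inside the slice would then become @fields[ @{ $colsFor->[ $fileNo ] || [] } ], with invert_map called once after reading ids.map.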



In reply to Re: Processing files column-wise by BrowserUk
in thread Processing files column-wise by iangibson
