in reply to Split a file based on column
I would do something like this (untested):
    use strict;
    use warnings;
    use autodie;

    use constant IN_FN => 'sample_1.txt';

    my %handles;    # output filehandles, keyed on the value of column two

    open my $infh, '<', IN_FN;

    while( <$infh> ) {
        # Capture the second pipe-delimited column.
        my( $key ) = m/^[^|]+\|([^|]+)/;
        if( ! defined $key ) {
            warn "Line $. appears malformed. Skipping: $_";
            next;
        }
        # Open each output file only the first time its key is seen.
        open $handles{$key}, '>', IN_FN . "$key.txt"
            unless exists $handles{$key};
        print {$handles{$key}} $_;
    }

    close $_ for $infh, values %handles;
You didn't mention the need, but it would be pretty easy to adapt this to work with a list of input files. Just replace the constant with code to deal with different input filenames, and put it in a loop. :)
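As a rough sketch of that adaptation (equally untested against your data), the body can be wrapped in a loop over a list of filenames. The `split_files` name is made up for illustration, and I'm assuming each input file should get its own set of output files, named the same way as above:

```perl
use strict;
use warnings;
use autodie;

# Hypothetical helper: apply the same split to every file named in @in_fns.
sub split_files {
    my @in_fns = @_;
    for my $in_fn (@in_fns) {
        my %handles;    # fresh set of output handles for each input file
        open my $infh, '<', $in_fn;
        while ( my $line = <$infh> ) {
            # Capture the second pipe-delimited column.
            my ($key) = $line =~ m/^[^|]+\|([^|]+)/;
            if ( !defined $key ) {
                warn "$in_fn line $. appears malformed. Skipping: $line";
                next;
            }
            # Assumed naming scheme: input filename, then key, then ".txt".
            open $handles{$key}, '>', "$in_fn$key.txt"
                unless exists $handles{$key};
            print { $handles{$key} } $line;
        }
        # Close before moving on so handles do not pile up across files.
        close $_ for $infh, values %handles;
    }
    return;
}
```

Calling it as `split_files(@ARGV)` would let you pass the input filenames on the command line.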
What I like about this solution is that you only open each output file once, and then just keep track of the file handles as values in a hash, indexed on the key parsed from the 2nd column.
Update: This solution has the efficiency advantage of not having to re-open an output file if it's already been opened before. But johngg correctly observed that at some point it's possible to get a "Too many open files" error. On one of my systems that kicked in after trying to open 1020 files simultaneously. My solution assumes that column two holds two digits, which would yield just under 100 possible output files. That should be ok.
However, if it turns out that you're exceeding the number of allowable open files on your system, the simplest solution is to open the output file in append mode and close it again on each iteration.
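Here is a minimal sketch of that open/close-per-line variant, again untested against your real data. The `split_by_column` name and the `%seen` bookkeeping are my own additions for illustration: the first time a key turns up the file is opened with '>' so that a rerun starts from an empty file, and after that with '>>' so earlier lines aren't clobbered.

```perl
use strict;
use warnings;
use autodie;

# Hypothetical helper: split $in_fn on its second pipe-delimited column,
# holding at most one output file open at any moment.
sub split_by_column {
    my ($in_fn) = @_;
    my %seen;    # keys already written during this run
    open my $infh, '<', $in_fn;
    while ( my $line = <$infh> ) {
        my ($key) = $line =~ m/^[^|]+\|([^|]+)/;
        if ( !defined $key ) {
            warn "Line $. appears malformed. Skipping: $line";
            next;
        }
        # Truncate on the first sighting of a key, append afterwards.
        my $mode = $seen{$key}++ ? '>>' : '>';
        open my $outfh, $mode, "$in_fn$key.txt";
        print {$outfh} $line;
        close $outfh;
    }
    close $infh;
    my @keys = sort keys %seen;
    return @keys;
}
```

The trade-off is one open and one close per input line, so it will be slower on big files, but the number of distinct keys no longer matters.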
Dave