Re^2: part - split up files according to column value

Hi, could you give an example of how to run this? I am new to Perl. Also, how would you incorporate FileCache into this example? I want to split a file based on the first column and save each group of rows in a file named after the value in the first column, without the quotation marks. Example data:

"1", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
"2", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
"4", "This" , "is" , "test", "data"
"2", "This" , "is" , "test", "data"
"3", "This" , "is" , "test", "data"

This would create four files named 1, 2, 3 and 4, each containing the matching rows:

file 1: 
"1", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"

file 2:
"2", "This" , "is" , "test", "data"
"2", "This" , "is" , "test", "data"

file 3:
"3", "This" , "is" , "test", "data"

file 4:
"4", "This" , "is" , "test", "data"

It is a large file, so I need to use FileCache. Thanks for any help.


Re^3: part - split up files according to column value by Corion (Patriarch) on Aug 26, 2008 at 13:31 UTC
You can start by telling us where you encounter problems and what difficulties you have incorporating FileCache into jdporter's code.
Re^4: part - split up files according to column value by mick2020 (Novice) on Aug 26, 2008 at 14:37 UTC
I have the first part of the task completed, i.e. sorting the files. Here is my code:

    ///My code
    use FileCache maxOpen => 1000;
    ////////////
    # config:
    my $field = 0;
    my $sep = ",";
    ////MY code
    cacheout $mode, $path;
    $fh = cacheout $mode, $path;
    /////////
    $, = $sep;
    $\ = $/;
    my %file; # { num, name, $fh }
    my $fnum = 1;
    while (<>) {
        chomp;
        my @c = split /$sep/o;
        my( $key, $num ) = defined $c[$field]
            ? ( $c[$field], $fnum++ )
            : ( '(column not present)', 0 );
        unless ( $file{$key}) {
            $nameF = $c[$field];
            $nameF =~ s/"//g;
            $file{$key}{num} = $num;
            $file{$key}{name} = "out/".$nameF.$ARGV[0];
            if(($file{$key}{num}) >1){
                -f $file{$key}{name} and die "Sorry, '$file{$key}{name}' exists; won't clobber.";
                open $file{$key}{fh}, ">", $file{$key}{name}
                    or die "Error opening '$file{$key}{name}' for write - $!";
            }
        }
        print {$file{$key}{fh}} @c;
    }

The problem is FileCache. I am not familiar with Perl, so I am having problems with this part of the code. I am getting this error:

    $ perl split.pl Input.csv .cvs
    Error opening '4444.cvs' for write - Too many open files at split.pl 39, <> line 817961.

I have marked my additions to jdporter's code. I don't know what $path and $mode are.
Re^5: part - split up files according to column value by Corion (Patriarch) on Aug 26, 2008 at 14:46 UTC
I read the ~~FileHandle~~ FileCache documentation differently than you do. I think that you're basically supposed to replace your calls to `open` with calls to `cacheout`; that is, instead of `open $file{$key}{fh}, ...`, use:

    $file{$key}{fh} = cacheout $file{$key}{name}

But I haven't tested that. `$path` is the (path and) name of the output file, and `$mode` is the file mode (which is irrelevant to your needs).

Update: kyle spotted a link to the wrong documentation.
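For illustration, here is a minimal sketch of that idea as a standalone script, under a few assumptions. The helper name `split_by_first_column`, the key pattern, and the value 64 are made up for this example and are not part of jdporter's code. As far as I can tell from the FileCache documentation, the import option is spelled `maxopen` in lowercase, `cacheout` returns a symbolic filehandle (hence `no strict 'refs'`), and it should be called again before each print because the module may have closed the handle in the meantime. `cacheout_close` is exported by the FileCache versions I have seen; if yours lacks it, the remaining handles should still be flushed when the program exits. The sketch assumes it runs in package `main`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The import option is lowercase "maxopen". 64 is an arbitrary value
# chosen for this sketch; keep it below your system's `ulimit -n`.
use FileCache maxopen => 64;

# Hypothetical helper (not from jdporter's code): copy each line of
# $in into "$outdir/<first column value>", quotes stripped from the name.
sub split_by_first_column {
    my ($in, $outdir) = @_;
    my %seen;    # every output path written so far

    open my $src, '<', $in or die "Cannot read '$in': $!";
    while (my $line = <$src>) {
        # First field, with or without surrounding double quotes.
        my ($key) = $line =~ /^\s*"?([\w-]+)"?\s*,/;
        unless (defined $key) {
            warn "Skipping line $. of '$in': no usable first column\n";
            next;
        }
        my $path = "$outdir/$key";
        $seen{$path} = 1;

        # cacheout hands back the path itself, to be used as a symbolic
        # filehandle, hence "no strict 'refs'". Call it before every
        # print: FileCache may have closed the handle in the meantime
        # and reopens it in append mode when needed.
        no strict 'refs';
        my $fh = cacheout($path);
        print {$fh} $line;
    }
    close $src;

    # Flush and close whatever is still open.
    cacheout_close($_) for sort keys %seen;
    return scalar keys %seen;
}

# Usage: perl split.pl Input.csv out   (the directory "out" must exist)
if (@ARGV == 2) {
    my $count = split_by_first_column(@ARGV);
    print "Wrote $count files\n";
}
```

The point of the design is that the script never keeps its own table of open handles: it asks `cacheout` for the handle on every line and lets FileCache decide which files stay open. Storing the handle once in `$file{$key}{fh}` and reusing it later risks printing to a handle that FileCache has already closed.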
Re^5: part - split up files according to column value by mick2020 (Novice) on Aug 26, 2008 at 15:45 UTC
I have tried `use FileCache maxOpen => 10000;` .. `open $file{$key}{fh}, ">", cacheout $file{$key}{name} or die` but I get the error: Too many open files at /usr/lib/perl5/5.10/ .... at line 408948. I have tried changing the value of maxOpen, but this does nothing.
Re^6: part - split up files according to column value by Corion (Patriarch) on Aug 26, 2008 at 15:49 UTC
Re^7: part - split up files according to column value by Anonymous Monk on Aug 27, 2008 at 09:12 UTC
Re^4: part - split up files according to column value by mick2020 (Novice) on Aug 26, 2008 at 15:55 UTC
I have tried `use FileCache maxOpen => 10000;` .. `open $file{$key}{fh}, ">", cacheout $file{$key}{name} or die` but I get the error: Too many open files at /usr/lib/perl5/5.10/ .... at line 408948. I have tried changing the value of maxOpen, but this does nothing.