in reply to Re: part - split up files according to column value
in thread part - split up files according to column value

Hi, could you give an example of how to run this? I am new to Perl. Also, how would you incorporate FileCache into this example? I want to split a file based on the first column and save each group of rows in a file named after the first-column value, without the quotation marks. Example data:
"1", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
"2", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
"4", "This" , "is" , "test", "data"
"2", "This" , "is" , "test", "data"
"3", "This" , "is" , "test", "data"
This would create four files named 1, 2, 3 and 4, each holding the matching rows:
file 1:
"1", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
"1", "This" , "is" , "test", "data"
file 2:
"2", "This" , "is" , "test", "data"
"2", "This" , "is" , "test", "data"
file 3:
"3", "This" , "is" , "test", "data"
file 4:
"4", "This" , "is" , "test", "data"
It is a large file, so I need to use FileCache. Thanks for any help.

Re^3: part - split up files according to column value
by Corion (Patriarch) on Aug 26, 2008 at 13:31 UTC

    You can start by telling us where you encounter problems and what difficulties you have incorporating FileCache into jdporter's code.

      I have completed the first part of the task, i.e. sorting the files. Here is my code:
      ///My code
      use FileCache maxOpen => 1000;
      ////////////
      # config:
      my $field = 0;
      my $sep = ",";
      ////MY code
      cacheout $mode, $path;
      $fh = cacheout $mode, $path;
      /////////
      $, = $sep;
      $\ = $/;
      my %file; # { num, name, $fh }
      my $fnum = 1;
      while (<>) {
          chomp;
          my @c = split /$sep/o;
          my( $key, $num ) = defined $c[$field]
              ? ( $c[$field], $fnum++ )
              : ( '(column not present)', 0 );
          unless ( $file{$key}) {
              $nameF = $c[$field];
              $nameF =~ s/"//g;
              $file{$key}{num} = $num;
              $file{$key}{name} = "out/".$nameF.$ARGV[0];
              if(($file{$key}{num}) >1){
                  -f $file{$key}{name} and die "Sorry, '$file{$key}{name}' exists; won't clobber.";
                  open $file{$key}{fh}, ">", $file{$key}{name} or die "Error opening '$file{$key}{name}' for write - $!";
              }
          }
          print {$file{$key}{fh}} @c;
      }
      The problem is FileCache. I am not familiar with Perl, so I am having trouble with this part of the code.
      I am getting this error:
      $ perl split.pl Input.csv .cvs
      Error opening '4444.cvs' for write - Too many open files at split.pl 39, <> line 817961.
      I have marked my additions to jdporter's code.
      I don't know what $path and $mode are.

        I read the FileCache documentation differently than you do. I think that you're basically supposed to replace your calls to open with calls to cacheout; that is, instead of open $file{$key}{fh}, ..., use:

        $file{$key}{fh} = cacheout $file{$key}{name}

        But I haven't tested that. $path is the (path and) name of the output file, and $mode is the file mode (which is irrelevant to your needs).

        Update: kyle spotted a link to the wrong documentation.
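
For reference, a minimal sketch of the cacheout substitution described above. The helper name split_by_first_column, the output-directory argument and the maxopen value of 64 are placeholders chosen for illustration, not anything from jdporter's code. It relies on two points from the FileCache documentation: the import option is spelled maxopen (all lowercase), and cacheout is called again before every print instead of storing the handle in %file, because FileCache may already have closed a handle it returned earlier. It also assumes that the first-column values are safe to use as file names and that the output directory already exists.

```perl
use strict;
use warnings;
# The documented option is lowercase "maxopen"; 64 is an arbitrary example value.
use FileCache maxopen => 64;

# Hypothetical helper: read lines from $in_fh and append each one to
# "$out_dir/<first column without quotes>". Returns a hashref of line counts.
sub split_by_first_column {
    my ($in_fh, $out_dir) = @_;
    my %count;
    while (my $line = <$in_fh>) {
        chomp $line;
        next unless $line =~ /\S/;              # skip blank lines
        my ($key) = split /,/, $line, 2;        # first column only
        $key =~ s/"//g;                         # drop the quotation marks
        $key =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace
        next unless length $key;                # no usable file name
        my $path = "$out_dir/$key";
        # Ask FileCache for the handle every time; it reopens the file
        # (in append mode) if the handle was closed to stay under maxopen.
        my $fh = cacheout $path;
        {
            no strict 'refs';                   # cacheout returns a symbolic handle
            print {$fh} "$line\n";
        }
        $count{$key}++;
    }
    # Flush and close whatever is still open.
    cacheout_close("$out_dir/$_") for keys %count;
    return \%count;
}
```

Called as split_by_first_column(\*STDIN, 'out'), this should write out/1, out/2 and so on for the sample data. The first cacheout of a given path opens it for writing and later reopenings append, so rows are not lost when a handle is closed and reopened; it also means an existing file of the same name is overwritten on the first write.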

        I have tried
        use FileCache maxOpen => 10000;
        ..
        open $file{$key}{fh}, ">", cacheout $file{$key}{name} or die
        But I get the error
        Too many open files at /usr/lib/perl5/5.10/ .... at line 408948
        I have tried changing the value of maxOpen, but this has no effect.