opening many files

wanttoprogram has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: opening many files by Marshall (Canon) on Feb 26, 2012 at 02:33 UTC
I would make an ArrayOfArray to keep the results. I would make a loop to open each file, read it line by line, use column 1 as the index into the ArrayOfArray, push column 3 onto the array at that index, and then loop to the next file. Once you are done, the ArrayOfArray can be printed to a new file. What code have you written so far? What are the problems? A loop is the appropriate answer for repetitive operations like this. The data will fit into memory at once, and only one file at a time needs to be open.

Update: I think something like this would work:

```perl
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $datafile1=<<END;
1 has 0.334
2 has 0.986
3 has 0.787
END

my $datafile2=<<END;
1 has 0.894
2 has 0.586
3 has 0.187
END

my @data;
foreach my $fileRef (\$datafile1, \$datafile2)
{
    open FILE, '<', $fileRef or die "$!";
    while (<FILE>)
    {
        my ($row, $col3) = (split)[0,2];
        push @{$data[--$row]}, $col3;
    }
}
my $row_num=1;
foreach my $row (@data)
{
    print $row_num++, " has ", "@$row\n";
}
__END__
1 has 0.334 0.894
2 has 0.986 0.586
3 has 0.787 0.187
```

Of course, instead of opening references to in-memory strings, you would need to use some form of glob() or readdir() to get the file names. And of course the data that was presented is not a CSV file, so something would have to be done about that.
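A minimal sketch of the same loop reading real files instead of in-memory strings. It assumes files named `datafile1`, `datafile2`, ... in the current directory (the naming used later in the thread) and whitespace-separated lines; the helper name `collect_columns` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read each named file in turn and collect column 3,
# indexed by the row number found in column 1. Only one file is open at a time.
sub collect_columns {
    my @files = @_;
    my @data;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open '$file': $!";
        while (my $line = <$fh>) {
            my ($row, $col3) = (split ' ', $line)[0, 2];
            next unless defined $col3;          # skip blank or short lines
            push @{ $data[$row - 1] }, $col3;
        }
        close $fh;
    }
    return \@data;
}

# Assumed naming scheme. glob returns names in lexical order, so datafile10
# comes before datafile2 unless the list is sorted numerically first.
my $data = collect_columns(glob 'datafile*');
for my $i (0 .. $#$data) {
    my $row = $data->[$i] or next;              # skip row numbers never seen
    print $i + 1, " has @$row\n";
}
```

Passing the file list in as an argument keeps the reading loop independent of how the names are found, so `readdir` or a sorted list can be swapped in without touching it.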
Re: opening many files by jwkrahn (Abbot) on Feb 26, 2012 at 03:51 UTC
UNTESTED, but this may work:

```perl
@ARGV = glob 'datafile*';
open my $OUTPUT, '>', 'output_file' or die "Cannot open 'output_file' because: $!";
while ( <> ) {
    if ( $. == 1 ) {
        print $OUTPUT $ARGV =~ /(\d+)$/, " has";
    }
    print $OUTPUT " ", ( split )[ 2 ];
    if ( eof ) {
        print $OUTPUT " $. columns\n";
        close ARGV;
    }
}
```
Re^2: opening many files by wanttoprogram (Novice) on Feb 27, 2012 at 21:24 UTC
Thank you for the reply. It is working fine, except that I want 948 rows and 200 columns from the 200 files. The program is giving me 200 rows and 948 columns. Thank you.
Re^3: opening many files by jwkrahn (Abbot) on Feb 27, 2012 at 23:48 UTC
```perl
@ARGV = glob 'datafile*';
open my $OUTPUT, '>', 'output_file' or die "Cannot open 'output_file' because: $!";
my %data;
while ( <> ) {
    next unless /^(.+) (\S+)$/;
    push @{ $data{ $1 } }, $2;
}
for my $key ( sort keys %data ) {
    print join ' ', $key, @{ $data{ $key } }, scalar @{ $data{ $key } }, "columns\n";
}
```
Re^4: opening many files by wanttoprogram (Novice) on Feb 28, 2012 at 04:44 UTC
Re^5: opening many files by Anonymous Monk on Feb 28, 2012 at 05:08 UTC
Re^4: opening many files by wanttoprogram (Novice) on Feb 28, 2012 at 16:32 UTC
Re^2: opening many files by wanttoprogram (Novice) on May 22, 2012 at 22:05 UTC
```perl
@ARGV = glob 'datafile*';
open my $OUTPUT, '>', 'output_file' or die "Cannot open 'output_file' because: $!";
my %data;
while ( <> ) {
    next unless /^(.+) (\S+)$/;
    push @{ $data{ $1 } }, $2;
}
for my $key ( sort keys %data ) {
    print join ' ', $key, @{ $data{ $key } }, scalar @{ $data{ $key } }, "columns\n";
}
```

The program you have given works fantastically. But I have to deal with one more thing. I have files named datafile1, datafile2, datafile3, datafile4, ... datafile200 in a directory, but they are listed in a different order: datafile1, datafile10, datafile11, datafile12, and so on. The above program reads and writes in that same order. I would like to pick up the data in order, from datafile1, 2, 3 up to datafile200. I guess I need to sort. Any suggestions and additions are appreciated. Thank you.
Re^3: opening many files by aaron_baugher (Curate) on May 23, 2012 at 00:59 UTC
That's because `glob` returns the results of `datafile*` expanded in the order a shell would return them, so `datafile11` comes before `datafile2`, as you've seen. A quick and dirty solution, if you know you're only dealing with up to 3 digits:

```perl
@ARGV = glob 'datafile? datafile?? datafile???';
```

A more general solution that'll sort on any number of digits:

```perl
@ARGV = sort {
    my $an = substr $a, 8;
    my $bn = substr $b, 8;
    $an <=> $bn
} glob 'datafile*';
```

Aaron B.
Available for small or large Perl jobs; see my home node.
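A small sketch of the numeric sort on made-up file names, so the ordering can be checked without touching a directory. The regex-capture key is a swapped-in variant of the `substr` approach above, not the method from the post; it assumes every name ends in digits, and the sub name `sort_by_trailing_number` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: sort names by the number at the end of each name.
# Names without trailing digits are treated as 0 (an assumption).
sub sort_by_trailing_number {
    my @names = @_;
    return map  { $_->[1] }                       # unwrap the original name
           sort { $a->[0] <=> $b->[0] }           # compare extracted numbers
           map  { [ /(\d+)$/ ? $1 : 0, $_ ] }     # pair each name with its number
           @names;
}

# Made-up names in the lexical order glob would return them.
my @lexical = qw(datafile1 datafile10 datafile11 datafile2 datafile200 datafile3);
my @sorted  = sort_by_trailing_number(@lexical);
print "@sorted\n";
# prints: datafile1 datafile2 datafile3 datafile10 datafile11 datafile200
```

Extracting the number once per name, rather than inside the comparison block, avoids repeating the extraction on every comparison. For 200 files either form is fast enough.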
Re^3: opening many files by Anonymous Monk on May 22, 2012 at 22:49 UTC
http://learn.perl.org/faq/perlfaq4.html#How-do-I-sort-an-array-by-anything-
http://perldoc.perl.org/perlfaq4.html#How-do-I-sort-an-array-by-%28anything%29%3f
Sorting ip addresses quickly
Re: opening many files by aaron_baugher (Curate) on Feb 26, 2012 at 08:06 UTC
Sure, you can open them in a loop. You could put your file descriptors in an array, and then loop through them as you print each line. As long as your system will let a process have that many open files, something like this should work (untested). It's not at all flexible, but since you seem to know that all your files have the same number of lines and the same format, it doesn't need to be. Also, if you use a CSV module, you may want to make an array of object references rather than simple file descriptors, but the looping concept would be the same.

```perl
my @fds;
for (1..200){
    open my $fds[$_], '<', "datafile$_" or die $!;
}
for my $ln (1..948){
    print "$ln has ";
    for my $fdn (1..200){
        my $line = <$fds[$fdn]>;
        my $field3 = get_third_field_using_whatever_csv_method($line);
        print $field3;
        print ' ' unless $fdn == 200;
    }
    print "\n";
}
```

Aaron B.
My Woefully Neglected Blog, where I occasionally mention Perl.
Re^2: opening many files by jwkrahn (Abbot) on Feb 26, 2012 at 23:23 UTC
```perl
open my $fds[$_], '<', "datafile$_" or die $!;
```

You can't use `my` on an array element.

```perl
my $line = <$fds[$fdn]>;
```

From I/O Operators:

If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means "<$x>" is always a readline() from an indirect handle, but "<$hash{key}>" is always a glob(). That's because $x is a simple scalar variable, but $hash{key} is not--it's a hash element. Even "<$x >" (note the extra space) is treated as "glob("$x ")", not "readline($x)".
Re^3: opening many files by aaron_baugher (Curate) on Feb 27, 2012 at 01:18 UTC
Well, I did say it was untested, but that's a poor excuse. ++ for the correction; my 'my' should be removed on that line.

Aaron B.
My Woefully Neglected Blog, where I occasionally mention Perl.
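With both corrections applied, the parallel-filehandle loop might look like the sketch below. It is scaled down to two made-up temporary files so it can run anywhere; the `split` call stands in for the placeholder CSV routine and assumes whitespace-separated fields, and `merge_third_columns` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: open every file, then read one line from each
# handle per output row, collecting the third whitespace-separated field.
sub merge_third_columns {
    my ($rows, @files) = @_;

    my @fds;
    for my $i (0 .. $#files) {
        # No 'my' here: $fds[$i] is an element of the already-declared @fds.
        open $fds[$i], '<', $files[$i] or die "Cannot open '$files[$i]': $!";
    }

    my @out;
    for my $ln (1 .. $rows) {
        my @fields;
        for my $fh (@fds) {
            # $fh is a simple scalar, so readline($fh) reads one line;
            # <$fds[$i]> would have been parsed as glob().
            my $line = readline($fh);
            die "Ran out of lines at row $ln" unless defined $line;
            push @fields, (split ' ', $line)[2];
        }
        push @out, "$ln has @fields";
    }
    close $_ for @fds;
    return @out;
}

# Usage with two made-up three-line files in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my @contents = (
    "1 has 0.334\n2 has 0.986\n3 has 0.787\n",
    "1 has 0.894\n2 has 0.586\n3 has 0.187\n",
);
my @files;
for my $n (0 .. $#contents) {
    my $path = "$dir/datafile" . ($n + 1);
    open my $out, '>', $path or die "Cannot write '$path': $!";
    print {$out} $contents[$n];
    close $out;
    push @files, $path;
}
print "$_\n" for merge_third_columns(3, @files);
```

This keeps every file open at once, so with 200 files it depends on the per-process open-file limit mentioned above.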
Re: opening many files by Anonymous Monk on Feb 26, 2012 at 03:06 UTC
opening files -> Opening multiple files
Beginning Perl (free), Chapter 6: Files and Data
Learn Perl in about 2 hours 30 minutes
Re: opening many files by umasuresh (Hermit) on Feb 26, 2012 at 02:27 UTC
Hi wanttoprogram,

Here is a good place to start learning: Perl tutorials

Hint: use a hash!
Re^2: opening many files by JavaFan (Canon) on Feb 26, 2012 at 10:13 UTC
Why on earth do you suggest using a hash? The files have an order, and the OP wants to keep that order. It all screams array.

