Re: opening many files
by Marshall (Canon) on Feb 26, 2012 at 02:33 UTC
|
I would build an array-of-arrays to hold the results: loop over the files, open each one, read it line by line, use column 1 as the index into the array-of-arrays, and push column 3 onto the sub-array at that index, then move on to the next file. Once you are done, the array-of-arrays can be printed to a new file.
What code have you written so far? What are the problems? A loop is the appropriate answer for repetitive operations like this. The data will fit into memory at once and only one file at a time needs to be open.
Update: I think something like this would work:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my $datafile1=<<END;
1 has 0.334
2 has 0.986
3 has 0.787
END
my $datafile2=<<END;
1 has 0.894
2 has 0.586
3 has 0.187
END
my @data;
foreach my $fileRef ( \$datafile1, \$datafile2 )
{
    open my $fh, '<', $fileRef or die "$!";
    while (<$fh>)
    {
        my ( $row, $col3 ) = (split)[ 0, 2 ];
        push @{ $data[ --$row ] }, $col3;
    }
    close $fh;
}
my $row_num = 1;
foreach my $row (@data)
{
    print $row_num++, " has ", "@$row\n";
}
__END__
1 has 0.334 0.894
2 has 0.986 0.586
3 has 0.787 0.187
Of course, instead of using references to in-memory data, you would need to use some form of glob() or readdir() to get the file names. And since the data that was presented is not a CSV file, something would have to be done about that as well.
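To make that glob() idea concrete, here is a self-contained sketch. The directory, file names, and sample data are made up for illustration; it writes two small files to a temp directory first so it runs anywhere, where the real script would just glob its own data directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Sketch only: create two small sample files so the example runs
# anywhere; in real use you would glob your own data directory.
my $dir    = tempdir( CLEANUP => 1 );
my %sample = (
    datafile1 => "1 has 0.334\n2 has 0.986\n3 has 0.787\n",
    datafile2 => "1 has 0.894\n2 has 0.586\n3 has 0.187\n",
);
for my $name ( sort keys %sample ) {
    open my $out, '>', "$dir/$name" or die "Cannot write '$name': $!";
    print $out $sample{$name};
    close $out;
}

my @data;
for my $file ( sort glob "$dir/datafile*" ) {
    open my $fh, '<', $file or die "Cannot open '$file': $!";
    while ( my $line = <$fh> ) {
        # column 1 is the row number, column 3 the value to collect
        my ( $row, $col3 ) = ( split ' ', $line )[ 0, 2 ];
        push @{ $data[ $row - 1 ] }, $col3;
    }
    close $fh;
}

my $row_num = 1;
print $row_num++, " has @$_\n" for @data;
```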
Re: opening many files
by jwkrahn (Monsignor) on Feb 26, 2012 at 03:51 UTC
|
UNTESTED, but this may work:
@ARGV = glob 'datafile*';
open my $OUTPUT, '>', 'output_file' or die "Cannot open 'output_file' because: $!";
while ( <> ) {
if ( $. == 1 ) {
print $OUTPUT $ARGV =~ /(\d+)$/, " has";
}
print $OUTPUT " ", ( split )[ 2 ];
if ( eof ) {
print $OUTPUT " $. columns\n";
close ARGV;
}
}
|
Thank you for the reply. It is working fine, except that I want 948 rows and 200 columns from 200 files, and the program is giving me 200 rows and 948 columns.
Thank you.
|
@ARGV = glob 'datafile*';
open my $OUTPUT, '>', 'output_file' or die "Cannot open 'output_file' because: $!";
my %data;
while ( <> ) {
next unless /^(.+) (\S+)$/;
push @{ $data{ $1 } }, $2;
}
for my $key ( sort keys %data ) {
    print $OUTPUT join ' ', $key, @{ $data{ $key } }, scalar @{ $data{ $key } }, "columns\n";
}
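One caveat worth noting (my addition, not raised in the thread): with 948 rows, a plain `sort keys %data` sorts asciibetically, so a key like "10 has" lands before "2 has". Comparing the leading row numbers numerically keeps the rows in order:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keys shaped like the ones the regex above captures: "ROW has"
my @keys = ( '1 has', '2 has', '10 has' );

# Plain sort is asciibetical: "1 has", "10 has", "2 has"
my @ascii = sort @keys;

# Comparing the leading integers numerically restores row order
my @numeric =
    sort { ( $a =~ /^(\d+)/ )[0] <=> ( $b =~ /^(\d+)/ )[0] } @keys;

print "@numeric\n";    # 1 has 2 has 10 has
```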
|
The program you have given works fantastic. But I have to deal with one more thing. Here it is.
I have files named datafile1, datafile2, datafile3, datafile4 ... datafile200 in a directory.
But the files are listed in a different order: datafile1, datafile10, datafile11, datafile12, and so on, and the above program reads and writes them in that same order.
I would like to pick up the data in numeric order, from datafile1, 2, 3 up to datafile200.
I guess I need to sort.
Any suggestions and additions are appreciated.
Thank you.
|
That's because glob returns the results of datafile* interpolated in the order a shell would return them, so datafile11 comes before datafile2, as you've seen. A quick and dirty solution, if you know you're only dealing with up to 3 digits:
@ARGV = glob 'datafile? datafile?? datafile???';
A more general solution that'll sort on any number of digits:
@ARGV = sort { my $an = substr $a, 8; my $bn = substr $b, 8; $an <=> $bn } glob 'datafile*';
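Another variant (just a sketch, with a stand-in list where the real script would use glob) that avoids hard-coding the prefix length by comparing whatever trailing digits the names end in:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in list; in the real script this would be: glob 'datafile*'
my @files = qw( datafile1 datafile10 datafile11 datafile2 datafile3 );

# Compare the trailing digit runs numerically, whatever the prefix
my @sorted =
    sort { ( $a =~ /(\d+)$/ )[0] <=> ( $b =~ /(\d+)$/ )[0] } @files;

print "@sorted\n";    # datafile1 datafile2 datafile3 datafile10 datafile11
```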
Aaron B.
Available for small or large Perl jobs; see my home node.
Re: opening many files
by aaron_baugher (Curate) on Feb 26, 2012 at 08:06 UTC
|
Sure, you can open them in a loop. You could put your file descriptors in an array, and then loop through them as you print each line. As long as your system will let a process have that many open files, something like this should work (untested). It's not at all flexible, but since you seem to know that all your files have the same number of lines and the same format, it doesn't need to be. Also, if you use a CSV module, you may want to make an array of object references rather than simple file descriptors, but the looping concept would be the same.
my @fds;
for (1..200){
open my $fds[$_], '<', "datafile$_" or die $!;
}
for my $ln (1..948){
print "$ln has ";
for my $fdn (1..200){
my $line = <$fds[$fdn]>;
my $field3 = get_third_field_using_whatever_csv_method($line);
print $field3;
print ' ' unless $fdn == 200;
}
print "\n";
}
|
open my $fds[$_], '<', "datafile$_" or die $!;
You can't use my on an array element.
my $line = <$fds[$fdn]>;
From I/O Operators:
If what's within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. This distinction is determined on syntactic grounds alone. That means "<$x>" is always a readline() from an indirect handle, but "<$hash{key}>" is always a glob(). That's because $x is a simple scalar variable, but $hash{key} is not--it's a hash element. Even "<$x >" (note the extra space) is treated as "glob("$x ")", not "readline($x)".
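For what it's worth, both problems can be sidestepped by opening into a plain lexical, storing that in the array, and calling readline() explicitly. A sketch scaled down to three generated sample files standing in for the 200 real ones (file contents are invented for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scaled-down stand-ins: 3 files x 2 lines instead of 200 x 948
my $dir = tempdir( CLEANUP => 1 );
my ( $nfiles, $nlines ) = ( 3, 2 );
for my $n ( 1 .. $nfiles ) {
    open my $out, '>', "$dir/datafile$n" or die "Cannot write: $!";
    print $out "$_ has 0.$n$_\n" for 1 .. $nlines;
    close $out;
}

my @fds;
for my $n ( 1 .. $nfiles ) {
    # open into a plain lexical first, then store it;
    # 'open my $fds[$n]' is the syntax error pointed out above
    open my $fh, '<', "$dir/datafile$n" or die "Cannot open: $!";
    $fds[$n] = $fh;
}

my @rows;
for my $ln ( 1 .. $nlines ) {
    my @cols;
    for my $fdn ( 1 .. $nfiles ) {
        # explicit readline() on the array element;
        # <$fds[$fdn]> would be interpreted as glob()
        my $line = readline $fds[$fdn];
        push @cols, ( split ' ', $line )[2];
    }
    push @rows, "$ln has @cols";
}
print "$_\n" for @rows;
```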
Re: opening many files
by Anonymous Monk on Feb 26, 2012 at 03:06 UTC
|
Re: opening many files
by umasuresh (Hermit) on Feb 26, 2012 at 02:27 UTC
|
Hi wanttoprogram,
Here is a good place to start learning:
Perl tutorials
Hint: use a Hash!
|
Why on earth do you suggest using a hash? The files have an order, the OP wants to keep the order. It all screams array.