Re: Is this possible
by pc88mxer (Vicar) on Mar 24, 2008 at 02:12 UTC
Untested, but this should work:
my @files = @ARGV;
my %table;
while (<>) {
    my ($name, $value) = split(' ', $_);
    $table{$name}->{$ARGV} = $value;
}
print join("\t", "NAME", @files), "\n";
for my $n (keys %table) {
    print $n;
    for my $f (@files) {
        print "\t", $table{$n}->{$f};
    }
    print "\n";
}
Then invoke it with:
perl script-name RESULTS_FILE*
It seems to work, but the filenames get a bit messy (each one prints over the previous one instead of being separated by tabs)...
The numbers seem to print OK, though...
Are you sure you have this line entered correctly?
print join("\t", "NAME", @files), "\n";
That's the code responsible for printing the header line, so that's where I'd look. It prints a tab character between each pair of file names. What you are seeing might be an artifact of your text editor or of the way you are viewing the file.
Re: Is this possible
by davido (Cardinal) on Mar 24, 2008 at 03:20 UTC
There's not really any "correct" way. Or maybe it's more accurate to say there's no wrong way, as long as it works and doesn't gobble all your memory in the process. ;)
Here is yet another way to do it:
use strict;
use warnings;

my @files = @ARGV;
my %table;
foreach my $filename (@files) {
    open my $infile, '<', $filename or die $!;
    while ( <$infile> ) {
        chomp;
        next unless $_;
        my ( $name, $val ) = split;
        push @{ $table{$name} }, $val;
    }
}
my @names = sort keys %table;
my $output = join "\t\t", "NAME", @files;
foreach my $entry ( @names ) {
    $output .= join "\t\t", "\n$entry", @{ $table{$entry} };
}
print $output, "\n";
Usage: perl scriptname filename1 filename2
It doesn't check that every file contains the proper number of entries. That could cause your columns to become misaligned if, for example, the second file doesn't contain an entry for Georgia.
Re: Is this possible
by NetWallah (Canon) on Mar 24, 2008 at 03:22 UTC
use strict;

my %info;
for (1..50) {
    process_file($_);
}

print "NAME\t";
print "$_\t" for (1..50);
print "\n";
for my $k (sort keys %info) {
    print "$k\t";
    print "$info{$k}->[$_]\t" for (1..50);
    print "\n";
}

sub process_file {
    my $filenbr = shift;
    open my $f, '<', "RESULT_FILE$filenbr"
        or die "Cannot read file RESULT_FILE$filenbr\n$!";
    while (<$f>) {
        my ($name, $val) = split;
        $info{$name}->[$filenbr] = $val;
    }
    close $f;
}
"As you get older three things happen. The first is your memory goes, and I can't remember the other two... "
- Sir Norman Wisdom
Re: Is this possible
by jwkrahn (Abbot) on Mar 24, 2008 at 06:28 UTC
The main problem with the first two solutions presented is that they rely on the glob sort order so that 'RESULTS_FILE2' will be between 'RESULTS_FILE19' and 'RESULTS_FILE20', etc.
This will glob the files in the correct order:
#!/usr/bin/perl
use warnings;
use strict;

@ARGV = map glob( 'RESULTS_FILE' . ( '[0-9]' x $_ ) ), 1 .. 4;

my @headers = ( 'NAME', @ARGV );
my %data;
while ( <> ) {
    my ( $name, $value ) = split;
    push @{ $data{ $name } }, $value;
}

print join( "\t", @headers ), "\n";
for my $name ( keys %data ) {
    print join( "\t", $name, @{ $data{ $name } } ), "\n";
}
Good point - depending on the OP's needs I can see where this would be useful.
One issue I have with the solutions that use push is that if a file doesn't contain a name (or contains a name not found in the other files), the output data will not line up correctly. When implemented with a hash, the script can be used on non-uniform data files, and it makes it easy to spot data files that don't have the same names as the others (i.e., missing data).
Re: Is this possible
by BrowserUk (Patriarch) on Mar 24, 2008 at 08:20 UTC
perl -e"BEGIN{@ARGV=map qq[RESULTS_FILE.$_],1..50}" -nle"$h{$_->[0]}.=qq[\t$_->[1]]for[split]}{print qq[@$_]while@{$_=[each%h]}"
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
You don't need a separate -e for the BEGIN (saves 5 chars)
Re: Is this possible
by sundialsvc4 (Abbot) on Mar 24, 2008 at 15:54 UTC
For the record, if you do have enormous amounts of information that you need to, say, “cross-tabulate” in this way, you can accomplish the task for arbitrary amounts of data:
- Dump the data into a temporary flat file, as tuples like {'Kostas', 'RESULTS_FILE2', 3.1415926}.
- Sort the file using an external disk sort, such as the venerable sort command or an equivalent CPAN module. Sort it by two keys: first by the 'Kostas' field and then (i.e. within 'Kostas') by 'RESULTS_FILE2'.
- Now process the sorted file. Everything for 'Kostas' will be together, and everything for Kostas's columns will already be arranged in ascending column order. Anything that's not in the right place does not exist, period. Virtually no memory is required.
Sorting is an “unexpectedly fast” algorithm that, for large amounts of data, consistently beats the pants off of any random-access algorithm. When you saw all those tapes spinning back and forth (actually, if you watched very closely, the tapes never changed direction...) that's what the computers were doing. Back before computers existed, that's also how it was done ... with punched cards. The same principles still apply. They still work like nothing else does.
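The three steps above can be sketched with the standard sort(1) command; the tuple file name and the sample values here are made up for illustration:

```shell
# Step 1 (hypothetical data): dump tuples as tab-separated
# "name<TAB>source-file<TAB>value" lines into a temporary flat file.
printf 'Kostas\tRESULTS_FILE2\t3.14\nAlice\tRESULTS_FILE1\t2.71\nKostas\tRESULTS_FILE1\t1.41\n' > tuples.txt

# Step 2: external disk sort, first by name, then by file within each name.
# Note: this is a lexical sort, so RESULTS_FILE10 would sort before
# RESULTS_FILE2; zero-pad the file field if that matters for your data.
sort -t "$(printf '\t')" -k1,1 -k2,2 tuples.txt > sorted.txt

# Step 3: a single sequential pass now sees each name's rows grouped
# together, in ascending column order.
cat sorted.txt
```

Because the sort utility spills to disk as needed, the Perl pass in step 3 only ever needs to hold one name's rows in memory at a time.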
Re: Is this possible
by radiantmatrix (Parson) on Mar 24, 2008 at 15:11 UTC
use strict; use warnings;
use Text::CSV_XS;
use IO::File;
use File::Find;

my (%data, @file);
my $in_csv  = Text::CSV_XS->new({ sep_char => " " });    # space-separated input
my $out_csv = Text::CSV_XS->new({ sep_char => "\x09" }); # tab-separated output

my $out = IO::File->new('report.tab','>')
    or die "Can't open report.tab for writing: $!\n";

find(\&parse_file, '.'); # find files, running parse_file() for each

$out_csv->print($out, [ 'Name', @file ]);

## write out report
foreach my $name (sort keys %data) {
    my @row = ($name);
    foreach (@file) {
        push @row, $data{$name}{$_};
    }
    $out_csv->print($out, \@row);
}

sub parse_file {
    # only deal with *files* that match the naming convention
    return unless -f $_ && m/RESULTS_FILE\.(\d+)/;
    my $file_number = $1; # captured number from the match
    my $filename = $_;
    push @file, $filename;

    my $io = IO::File->new($filename,'<') or do {
        warn "Can't open $filename: $!\n";
        return;
    };
    while (my $row = $in_csv->getline($io)) {
        # e.g. the first row in your example results in:
        #   $data{Kostas}{'RESULTS_FILE.1'} = Number1
        $data{ $row->[0] }{ $filename } = $row->[1];
    }
}
I think you'll find that very fast and maintainable.
<–radiant.matrix–>
Ramblings and references
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet