Re: join mutiple files on first column

So, now that the OP has been updated with some code (and PLEASE UPDATE IT AGAIN, to fix the formatting -- <p>...</p> around descriptions and questions, <c>...</c> around code and data)...

I get the impression that you aren't really interested in learning to program in Perl -- so go ahead and just use R, no problem there.

If you actually do want to learn Perl, you might try the algorithm I proposed in my earlier reply, instead of copying code that someone else wrote and that you don't understand.

It's not hard to write a working Perl script based on a decent pseudo-code description -- you just have to settle a few details, like "where does the list of file names come from?"

Here's an example that assumes the list of file names comes from command line args (which end up in @ARGV) -- that is, you would invoke the script like this:

 name_of_script  *.txt  > all-txt-files.joined

That assumes that your 30 text files are all in the current working directory, and you are able to create a new file in that directory. The following example adds a few extra steps that weren't covered in my earlier post:

#!/usr/bin/perl

use strict;
use warnings;

# get the list of file names -- actually, just make sure @ARGV has them:
# declare a hash for output

die "Usage: $0 *.txt > text.joined\n" unless ( @ARGV and -f $ARGV[0] );
my %output;

# extra step: declare an array to preserve original order of keys in first file
my @output_order;

# open the first file
# while reading each line from the file
#    get the first column of the line for use as a hash key
#    assign the line as the value of hash element using that key

open( IN, '<', $ARGV[0] ) or die "Can't read $ARGV[0]: $!\n";
while (<IN>) {
    my ( $key ) = ( /^(\S+)/ );

#  extra steps: add key to order array, turn EOL whitespace into tab character
    push @output_order, $key;
    s/\s+$/\t/;

    $output{$key} = $_;
}

shift @ARGV;  # (extra step: removes the file name that we just handled)

# for each remaining file
#    open the file
#    while reading each line from the file
#        get the first column of the line for use as a hash key
#        append the line to the current value of the hash element using that key

for my $file ( @ARGV ) {
    open( IN, '<', $file ) or die "Can't read $file: $!\n";
    while (<IN>) {
        my ( $key ) = ( /^(\S+)/ );

# extra step: turn EOL whitespace into a tab character
        s/\s+$/\t/;

        $output{$key} .= $_;
    }
}

# for each key in the hash -- using the original ordering
#     print the value of the hash element using that key

for my $key ( @output_order ) {

#  extra step: convert the final tab character into a line-feed
    $output{$key} =~ s/\t$/\n/;

    print $output{$key};
}

Now, if there's anything there you don't understand, you'll need to do some reading, check some tutorials, search through the Perl documentation, etc. That way, you're more likely to be able to write a script on your own the next time a task like this comes up. (And you're more likely to be able to fix this one if/when things don't go the way you expect, e.g. if some files have different keys than other files.)
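As one possible way to cope with the "different keys" case, here is a minimal sketch. It is a variation on the script above, not a replacement for it, and it makes some assumptions of its own: the sub name join_on_first_column is made up for illustration, each file contributes only the text after its first column (so the key is printed once per row rather than once per file), a key first seen in a later file is added at the end of the output order, and a file that lacks a key contributes an empty field. If a key occurs twice in one file, the later line wins.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not part of the script above): join the named files
# on their first column and return the joined lines as a list.
sub join_on_first_column {
    my @files = @_;
    my ( %rows, @order );

    for my $i ( 0 .. $#files ) {
        open( my $in, '<', $files[$i] ) or die "Can't read $files[$i]: $!\n";
        while ( my $line = <$in> ) {

            # first column is the key, the rest of the line is the value;
            # lines that don't start with a non-space character are skipped
            my ( $key, $rest ) = $line =~ /^(\S+)\s*(.*?)\s*$/ or next;

            # remember the order in which keys were first seen
            push @order, $key unless exists $rows{$key};

            # one slot per file, so missing entries stay undefined
            $rows{$key}[$i] = $rest;
        }
        close $in;
    }

    my @joined;
    for my $key (@order) {

        # assumption: an empty field stands in for "key not in this file"
        my @cells = map { defined $rows{$key}[$_] ? $rows{$key}[$_] : '' }
            0 .. $#files;
        push @joined, join( "\t", $key, @cells ) . "\n";
    }
    return @joined;
}

# same invocation as before:  name_of_script  *.txt  > all-txt-files.joined
print join_on_first_column(@ARGV) if @ARGV;
```

Because every row gets exactly one field per input file, the columns stay aligned even when a key is missing somewhere, which the simple append approach above does not guarantee.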

Re^2: join mutiple files on first column by david_lyon (Sexton) on Apr 08, 2011 at 00:36 UTC
Thanks so much for your very detailed help; I'm very grateful. I shall be making use of it from now on. Thanks again, graff!