join mutiple files on first column

david_lyon has asked for the wisdom of the Perl Monks concerning the following question:

Re: join mutiple files on first column by moritz (Cardinal) on Mar 27, 2011 at 14:54 UTC
It shouldn't be too hard to do; perlintro teaches you everything you need to know to accomplish it. We are here to help you learn Perl, not to write the scripts you want. If you show some effort, we will certainly help you.
Perl 6 - second systems done right
Re: join mutiple files on first column by ww (Archbishop) on Mar 27, 2011 at 14:56 UTC
Did you read open, split, join, map, or make even a cursory effort to search answers already available here? I ask, because we prefer to teach, rather than do-for, those who bow "and send Praises." We also take the soft soap more seriously if you trouble yourself to read "On asking for help", "How do I post a question effectively?", "Markup in the Monastery" and "I know what I mean. Why don't you?"

Update: upon rereading this screed, I see I've failed to be specific about the reference to "I know what I mean...": given your OP, we have no idea of the format of the "text" files, in particular WRT the "first column". Are you talking about a simple text file, a CSV or tab-delimited file, or something else?
Re: join mutiple files on first column by graff (Chancellor) on Mar 27, 2011 at 14:59 UTC
Have you tried writing any perl code yourself for this? Can you work out the code that would perform something like this step-wise method:

    get the list of file names
    declare a hash for output
    open the first file
    while reading each line from the file
      get the first column of the line for use as a hash key
      assign the line as the value of hash element using that key
    for each remaining file
      open the file
      while reading each line from the file
        get the first column of the line for use as a hash key
        append the line to the current value of the hash element using that key
    for each key in the hash
      print the value of the hash element using that key

See what you can do. If you have trouble, show us the code you tried, and we'll be able to help out with the details. (I'm hoping I understood your task correctly -- if I didn't, please clarify.)

UPDATE: Actually, the method can be stated even more simply than what I showed above. In effect, you can leave out the part that handles the first file separately, and just do the "for" loop over all files in the list -- i.e. just treat all files in the list the same way.
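For readers following along, the simplified method in the UPDATE (one loop, every file treated the same way) could be sketched roughly as below. This is only one possible reading, not code from the thread: the sub name `join_on_first_column` is made up for illustration, the files are assumed to be whitespace-delimited (the thread never states the format), and the extra `@order` array is an assumption added so that keys come out in first-seen order rather than hash order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): takes a list of file
# names and returns the joined lines, one per key, in first-seen key order.
sub join_on_first_column {
    my @files = @_;
    my ( %output, @order );

    for my $file (@files) {
        open( my $in, '<', $file ) or die "Can't read $file: $!\n";
        while ( my $line = <$in> ) {
            chomp $line;

            # first run of non-whitespace characters is the key
            my ($key) = $line =~ /^(\S+)/;
            next unless defined $key;    # skip blank lines

            # remember each key the first time it shows up
            push @order, $key unless exists $output{$key};

            # same treatment for every file: start the value, or append to it
            $output{$key} =
              exists $output{$key} ? "$output{$key}\t$line" : $line;
        }
        close $in;
    }
    return map { "$output{$_}\n" } @order;
}

# file names are assumed to come from the command line
print join_on_first_column(@ARGV) if @ARGV;
```

As in the pseudocode, the whole line (key included) is appended each time, so the key column repeats once per file in the output.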
Re: join mutiple files on first column by TomDLux (Vicar) on Mar 27, 2011 at 17:30 UTC
You have 75% of a shell solution there. All you need to complete a shell solution is to append the grepped values onto a destination file. On the other hand, if you want a Perl solution, go for a Perl solution. Shelling out constantly is going to be relatively expensive, so instead read each file only once. As you read each file, split each line into fields and store a single row as an array. Store that array in a hash keyed by the primary field, creating the entry if it doesn't exist, appending the field values onto the existing array if it does exist.

As Occam said: Entia non sunt multiplicanda praeter necessitatem.
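A minimal sketch of the hash-of-arrays idea described above, under some assumptions: fields are separated by whitespace, the first field is the "primary field", and the names `merge_rows`, `%rows` and `@order` are invented here for illustration. Unlike appending whole lines, this keeps the key only once per row.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): reads each file once and
# returns a hash ref of key => [fields...] plus an array ref of keys in
# first-seen order.
sub merge_rows {
    my @files = @_;
    my ( %rows, @order );

    for my $file (@files) {
        open( my $in, '<', $file ) or die "Can't read $file: $!\n";
        while ( my $line = <$in> ) {

            # split ' ' splits on runs of whitespace and ignores
            # leading/trailing whitespace, including the newline
            my ( $key, @fields ) = split ' ', $line;
            next unless defined $key;    # skip blank lines

            push @order, $key unless exists $rows{$key};

            # creates the array on first use, appends to it afterwards
            push @{ $rows{$key} }, @fields;
        }
        close $in;
    }
    return ( \%rows, \@order );
}

# file names are assumed to come from the command line
if (@ARGV) {
    my ( $rows, $order ) = merge_rows(@ARGV);
    print join( "\t", $_, @{ $rows->{$_} } ), "\n" for @$order;
}
```

Note that a key missing from one of the files simply contributes no fields for that file, so columns can shift; whether that matters depends on the data.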
Re: join mutiple files on first column by planetscape (Chancellor) on Mar 27, 2011 at 19:59 UTC
See join - join two files according to a common key.

HTH, planetscape
Re: join mutiple files on first column by graff (Chancellor) on Mar 30, 2011 at 22:42 UTC
So, now that the OP has been updated with some code (and PLEASE UPDATE IT AGAIN, to fix the formatting -- <p>...</p> around descriptions and questions, <c>...</c> around code and data)...

I get the impression that you aren't really interested in learning to program in Perl -- so go ahead and just use R, no problem there. If you actually do want to learn perl, you might try the algorithm I proposed in my earlier reply, instead of copying code that someone else wrote and that you don't understand.

It's not hard to write a working perl script based on a decent pseudo-code description -- you just have to settle a few details, like "where does the list of file names come from?" Here's an example that assumes the list of file names comes from command line args (which end up in @ARGV) -- that is, you would invoke the script like this:

    name_of_script *.txt > all-txt-files.joined

That assumes that your 30 text files are all in the current working directory, and you are able to create a new file in that directory.
The following example adds a few extra steps that weren't covered in my earlier post:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # get the list of file names -- actually, just make sure @ARGV has them:
    die "Usage: $0 *.txt > text.joined\n" unless ( @ARGV and -f $ARGV[0] );

    # declare a hash for output
    my %output;

    # extra step: declare an array to preserve original order of keys in first file
    my @output_order;

    # open the first file
    #  while reading each line from the file
    #    get the first column of the line for use as a hash key
    #    assign the line as the value of hash element using that key
    open( IN, '<', $ARGV[0] ) or die "Can't read $ARGV[0]: $!\n";
    while (<IN>) {
        my ( $key ) = ( /^(\S+)/ );
        # extra steps: add key to order array, turn EOL whitespace into tab character
        push @output_order, $key;
        s/\s+$/\t/;
        $output{$key} = $_;
    }
    shift @ARGV;  # (extra step: removes the file name that we just handled)

    # for each remaining file
    #  open the file
    #  while reading each line from the file
    #    get the first column of the line for use as a hash key
    #    append the line to the current value of the hash element using that key
    for my $file ( @ARGV ) {
        open( IN, '<', $file ) or die "Can't read $file: $!\n";
        while (<IN>) {
            my ( $key ) = ( /^(\S+)/ );
            # extra step: turn EOL whitespace into a tab character
            s/\s+$/\t/;
            $output{$key} .= $_;
        }
    }

    # for each key in the hash -- using the original ordering
    #  print the value of the hash element using that key
    for my $key ( @output_order ) {
        # extra step: convert the final tab character into a line-feed
        # (applied to the hash value, since $_ is not set in this loop)
        $output{$key} =~ s/\t$/\n/;
        print $output{$key};
    }

Now, if there's anything there you don't understand, you'll need to do some reading, check some tutorials, search through some perl documentation, etc. That way, you're more likely to be able to write a script on your own the next time a task like this comes up. (And you're more likely to be able to fix this one, if/when things don't go the way you expect -- e.g. if some files have different keys than other files, etc.)
Re^2: join mutiple files on first column by david_lyon (Sexton) on Apr 08, 2011 at 00:36 UTC
Thanks so much for your very detailed help; I am very grateful. I shall be making use of it from now on. Thanks again, graff!