ikhan has asked for the wisdom of the Perl Monks concerning the following question:

PerlMonks: Thanks for all the help.

I have another question. I am trying to merge a given column from individual files into a single text file. This is what the files look like:

input1

    freq   data1
    1      2
    2      5
    3      10

input2

    freq   data2
    1      12
    2      50
    3      10

input3

    freq   data3
    1      10
    2      25
    3      11

Combined output

    freq   data1   data2   data3
    1      2       12      10
    2      5       50      25
    3      10      10      11

I have difficulty writing to a certain column in the output file.

The following code attempts to take the second column from an input file and write it to the second column in the output file, but it is not successful.

    use strict;
    use warnings;

    open(my $file2, '<', "data1.txt")     or die "failed to open 'data1.txt': $!";
    open(my $file3, '>>', "combined.txt") or die "failed to open 'combined.txt': $!";

    while (<$file2>) {
        chomp;    # delete newline character
        my @combined_data = $file3;
        my @list_file = split;
        print "$combined_data[1]" "$list_file[1]\n";
    }

    close $file2;
    close $file3;

Replies are listed 'Best First'.
Re: Merging Columns from multiple files into a single file
by davido (Cardinal) on Oct 18, 2015 at 19:31 UTC

    I would do it like this:

    use strict;
    use warnings;

    my @files = qw(input1.txt input2.txt input3.txt);   # your input files

    my %freq;
    foreach my $file (@files) {
        open my $infh, '<', $file or die $!;
        while (<$infh>) {
            chomp;
            my ($freq, $data) = split /\s+/, $_;
            push @{$freq{$freq}}, $data;
        }
    }

    foreach my $k (sort {$a <=> $b} keys %freq) {
        print "$k @{$freq{$k}}\n";
    }

    But this makes the assumption that there is a manageable quantity of frequencies. If the list of frequencies and values is too large to hold in memory you might just want to consider a database solution so that you could tailor your selects with appropriate detail and constraint.
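    To make the database suggestion concrete, here is a minimal sketch using DBI with DBD::SQLite (both from CPAN). The table and column names are just for illustration, and it uses an in-memory database with a few hard-coded rows standing in for the parsed input files; for real data you would point `dbname` at a file and insert rows as you read them:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;   # needs DBD::SQLite from CPAN; schema below is hypothetical

# In-memory database for the sketch; use dbname=freq.db for persistent data.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });

$dbh->do('CREATE TABLE readings (freq INTEGER, src TEXT, data INTEGER)');

# Sample rows standing in for the parsed input files.
my $ins = $dbh->prepare('INSERT INTO readings VALUES (?, ?, ?)');
$ins->execute(@$_) for (
    [1, 'data1',  2], [2, 'data1',  5], [3, 'data1', 10],
    [1, 'data2', 12], [2, 'data2', 50], [3, 'data2', 10],
    [1, 'data3', 10], [2, 'data3', 25], [3, 'data3', 11],
);

# Pull one frequency's values back out, ordered by source column.
my $vals = $dbh->selectcol_arrayref(
    'SELECT data FROM readings WHERE freq = ? ORDER BY src', undef, 2);
print "freq 2: @$vals\n";
```

    The point is that once the data is in a table, "too large for memory" stops being your problem: you can SELECT one frequency at a time, join, or aggregate as needed.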


    Dave

Re: Merging Columns from multiple files into a single file
by AppleFritter (Vicar) on Oct 18, 2015 at 19:20 UTC

    You cannot easily write to columns in a file; text file access is inherently row-based. I reckon there may well be CPAN modules to access columns in files, but if you seek to combine data from several files it's a much better idea to read all the input files and then write to the output file once, e.g.:

    #!/usr/bin/perl
    use Modern::Perl '2014';

    # generate some filenames
    my @inputfiles = map { "data$_.txt" } 1 .. 3;
    my $inputdata = {};

    # read files into $inputdata
    foreach my $filenumber (0 .. $#inputfiles) {
        open my $HANDLE, "<", $inputfiles[$filenumber]
            or die "Cannot open $inputfiles[$filenumber]: $!\n";
        while (<$HANDLE>) {
            chomp;
            my ($freq, $data) = split /\s+/, $_, 2;
            $inputdata->{$freq}->[$filenumber] = $data;
        }
        close $HANDLE or warn "Cannot close $inputfiles[$filenumber]: $!\n";
    }

    # write combined output
    open my $OUTPUT, ">", "combined.txt" or die "Cannot open combined.txt: $!\n";
    foreach my $freq (sort keys %{ $inputdata }) {
        say $OUTPUT "$freq ", join " ", @{ $inputdata->{$freq} };
    }
    close $OUTPUT or warn "Cannot close combined.txt: $!\n";

      Hi AppleFritter:

      Thanks for taking the time. There is a little hiccup in the code: in the combined output file, we only see the last line from each input file. I tried adding a newline character in the "say" statement, but that was not successful.

      Can you please take a look? I appreciate your time.

        It looks like your data has a space at the start of each line. If so, try

        while (<$HANDLE>) {
            chomp;
            s/^\s+//;    # remove leading spaces
            my ($freq, $data) = split /\s+/, $_, 2;
            $inputdata->{$freq}->[$filenumber] = $data;
        }
        poj

        As my bro-tastic monastic Brother poj already pointed out, it may be that your data has spaces at the beginning of each line.

        In general, the way you read data from your input files will depend on their exact format, its constraints, and the assumptions you are allowed to make. (Thank you, Dame Captain Obvious.) The code I posted was fairly simple and assumed a fairly rigid structure: one row of data per line, exactly one set of data per file, and each line conforming to the following structure:

        freq marker (containing no whitespace); any amount of whitespace; data (possibly including whitespace)

        Note that the split call splits on whitespace (\s+) and limits itself to two fields, so if the file indeed conforms to this structure, you'll get your freq marker and data just as expected. However, if a line starts with whitespace, split will see and split on that instead, and return an empty string (then assigned to $freq) followed by the entire rest of the line after said leading whitespace (then assigned to $data).
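        A tiny illustration of this behaviour, with made-up input lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two hypothetical data lines: one clean, one with leading whitespace.
my $clean  = "1     2";
my $padded = "  1     2";

my ($freq1, $data1) = split /\s+/, $clean, 2;
# $freq1 is "1", $data1 is "2" -- as expected

my ($freq2, $data2) = split /\s+/, $padded, 2;
# $freq2 is "" (empty string), $data2 is "1     2" -- the rest of the line

print "clean:  freq='$freq1' data='$data1'\n";
print "padded: freq='$freq2' data='$data2'\n";
```

        (Using a literal single space, split ' ', ..., would instead discard leading whitespace, which is exactly why that special case exists.)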

        So what should you do, then? It depends. If extra whitespace is the worst that can happen to you, then use poj's solution to remove leading spaces on each line. Otherwise, you'll have to think about what sort of file structure you can expect, and modify your script accordingly to deal with all possible corner cases.

        As an aside: do you have control over where and how these data files are generated? If so, it may be worth modifying the producing script instead (or as well); it's often easier not to output data in an awkward way to begin with than to try to parse it back when you can rely on fewer assumptions. (But also remember the Robustness principle: be conservative in what you output, and liberal in what you accept. This is true even for data you generate and consume entirely by yourself.)

        Also, avoid reinventing the wheel (unless it's necessary and/or fun, of course). Instead of relying on ad-hoc formats, you may be better off utilizing a standard format such as CSV for your data files, using e.g. Text::CSV to do all the heavy lifting for you (input and output).
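        For instance, here is a rough sketch of the whole merge done with Text::CSV (a CPAN module) over tab-separated files. It writes its own small sample inputs first so the example is self-contained; the file names and the assumption of exactly one header line per file are mine:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;   # CPAN module; install with e.g. cpan Text::CSV

# Create small tab-separated sample files matching the posted data
# (in real use these would already exist).
my %sample = (
    'input1.txt' => "freq\tdata1\n1\t2\n2\t5\n3\t10\n",
    'input2.txt' => "freq\tdata2\n1\t12\n2\t50\n3\t10\n",
    'input3.txt' => "freq\tdata3\n1\t10\n2\t25\n3\t11\n",
);
while (my ($name, $content) = each %sample) {
    open my $fh, '>', $name or die "Cannot write $name: $!";
    print {$fh} $content;
    close $fh;
}

my $csv = Text::CSV->new({ sep_char => "\t", eol => "\n", auto_diag => 1 });

my %rows;                  # freq => [ data1, data2, ... ]
my @header = ('freq');

foreach my $file (sort keys %sample) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $head = $csv->getline($fh);          # e.g. [ 'freq', 'data1' ]
    push @header, $head->[1];
    while (my $row = $csv->getline($fh)) {
        push @{ $rows{ $row->[0] } }, $row->[1];
    }
    close $fh;
}

open my $out, '>', 'combined.txt' or die "Cannot open combined.txt: $!";
$csv->print($out, \@header);
$csv->print($out, [ $_, @{ $rows{$_} } ]) for sort { $a <=> $b } keys %rows;
close $out;
```

        Text::CSV then handles quoting, embedded separators and malformed lines for you, which is exactly the kind of corner case an ad-hoc split-based parser tends to miss.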

Re: Merging Columns from multiple files into a single file
by kevbot (Vicar) on Oct 18, 2015 at 21:51 UTC
    Here is a solution using Data::Table. However, I was a bit surprised to discover that the helper functions (such as fromFile, fromCSV, etc.) of Data::Table do not seem to support using regular expressions as a delimiter. For the data you posted, I would have used \s+ as the delimiter. By the time I discovered this, I had already written this solution...so I thought I'd post it. A solution like this will work if your input files are tab delimited (or one could use fromCSV for files with other delimiters).
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Table;

    my @files = qw( input1.txt input2.txt input3.txt );

    my @tables;
    foreach my $file (@files) {
        my $dt = Data::Table::fromTSV($file, 1);
        push @tables, $dt;
    }

    my $merged_table;
    TABLE: foreach my $i (0 .. $#tables) {
        if ($i == 0) {
            $merged_table = $tables[0];
            next TABLE;
        }
        $merged_table = $merged_table->join($tables[$i], Data::Table::INNER_JOIN,
                                            ['freq'], ['freq']);
    }
    $merged_table->sort('freq', 1, 0);

    open my $output_fh, ">", "combined.txt" or die "Cannot open combined.txt: $!\n";
    print {$output_fh} $merged_table->tsv;
    $output_fh->close;
    exit;