ikhan has asked for the wisdom of the Perl Monks concerning the following question:

PerlMonks: Thanks for all the help.

I have another question. I am trying to merge a given column from individual files into a single text file. This is what the files look like:

input1

    freq   data1
    1      2
    2      5
    3      10

input2

    freq   data2
    1      12
    2      50
    3      10

input3

    freq   data3
    1      10
    2      25
    3      11

Combined output

    freq   data1   data2   data3
    1      2       12      10
    2      5       50      25
    3      10      10      11

I have difficulty writing to a certain column in the output file.

The following code attempts to take the second column from an input file and write it to the second column in the output file, but it is not successful.

    use strict;
    use warnings;

    open(my $file2, '<', "data1.txt")     or die "failed to open 'data1.txt': $!";
    open(my $file3, '>>', "combined.txt") or die "failed to open 'combined.txt': $!";

    while (<$file2>) {
        chomp;    # delete newline character
        my @combined_data = $file3;
        my @list_file = split;
        print "$combined_data[1]" "$list_file[1]\n";
    }

    close $file2;
    close $file3;

Replies are listed 'Best First'.
Re: Merging Columns from multiple files into a single file
by davido (Cardinal) on Oct 18, 2015 at 19:31 UTC

    I would do it like this:

    use strict;
    use warnings;

    my @files = qw(input1.txt input2.txt input3.txt);   # your input files

    my %freq;
    foreach my $file (@files) {
        open my $infh, '<', $file or die $!;
        while (<$infh>) {
            chomp;
            my ($freq, $data) = split /\s+/, $_;
            push @{$freq{$freq}}, $data;
        }
    }

    foreach my $k (sort {$a <=> $b} keys %freq) {
        print "$k @{$freq{$k}}\n";
    }

    But this makes the assumption that there is a manageable quantity of frequencies. If the list of frequencies and values is too large to hold in memory you might just want to consider a database solution so that you could tailor your selects with appropriate detail and constraint.
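    To make the database suggestion concrete, here is a minimal sketch using DBI with DBD::SQLite (both from CPAN). The table and column names are just for illustration, and it uses an in-memory database with a few hard-coded rows standing in for the parsed input files; for real data you would point `dbname` at a file and insert rows as you read them:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;   # needs DBD::SQLite from CPAN; schema below is hypothetical

# In-memory database for the sketch; use dbname=freq.db for persistent data.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });

$dbh->do('CREATE TABLE readings (freq INTEGER, src TEXT, data INTEGER)');

# Sample rows standing in for the parsed input files.
my $ins = $dbh->prepare('INSERT INTO readings VALUES (?, ?, ?)');
$ins->execute(@$_) for (
    [1, 'data1',  2], [2, 'data1',  5], [3, 'data1', 10],
    [1, 'data2', 12], [2, 'data2', 50], [3, 'data2', 10],
    [1, 'data3', 10], [2, 'data3', 25], [3, 'data3', 11],
);

# Pull one frequency's values back out, ordered by source column.
my $vals = $dbh->selectcol_arrayref(
    'SELECT data FROM readings WHERE freq = ? ORDER BY src', undef, 2);
print "freq 2: @$vals\n";
```

    The point is that once the data is in a table, "too large for memory" stops being your problem: you can SELECT one frequency at a time, join, or aggregate as needed.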


    Dave

Re: Merging Columns from multiple files into a single file
by AppleFritter (Vicar) on Oct 18, 2015 at 19:20 UTC

    You cannot easily write to columns in a file; text file access is inherently row-based. I reckon there may well be CPAN modules to access columns in files, but if you seek to combine data from several files it's a much better idea to read all the input files and then write to the output file once, e.g.:

    #!/usr/bin/perl
    use Modern::Perl '2014';

    # generate some filenames
    my @inputfiles = map { "data$_.txt" } 1 .. 3;
    my $inputdata = {};

    # read files into $inputdata
    foreach my $filenumber (0 .. $#inputfiles) {
        open my $HANDLE, "<", $inputfiles[$filenumber]
            or die "Cannot open $inputfiles[$filenumber]: $!\n";
        while (<$HANDLE>) {
            chomp;
            my ($freq, $data) = split /\s+/, $_, 2;
            $inputdata->{$freq}->[$filenumber] = $data;
        }
        close $HANDLE or warn "Cannot close $inputfiles[$filenumber]: $!\n";
    }

    # write combined output
    open my $OUTPUT, ">", "combined.txt" or die "Cannot open combined.txt: $!\n";
    foreach my $freq (sort keys %{ $inputdata }) {
        say $OUTPUT "$freq ", join " ", @{ $inputdata->{$freq} };
    }
    close $OUTPUT or warn "Cannot close combined.txt: $!\n";

      Hi AppleFritter:

      Thanks for taking the time. There is a little hiccup in the code: in the combined output file, we only see the last line from each input file. I tried adding a newline character in the "say" statement, but that was not successful.

      Can you please take a look? I appreciate your time.

        It looks like your data has a space at the start of each line. If so, try

        while (<$HANDLE>) {
            chomp;
            s/^\s+//;    # remove leading spaces
            my ($freq, $data) = split /\s+/, $_, 2;
            $inputdata->{$freq}->[$filenumber] = $data;
        }
        poj

        As my bro-tastic monastic Brother poj already pointed out, it may be that your data has spaces at the beginning of each line.

        In general, the way you read data from your input files will depend on their exact format, its constraints, and the assumptions you are allowed to make. (Thank you, Dame Captain Obvious.) The code I posted was fairly simple and assumed a fairly rigid structure: one row of data per line, exactly one set of data per file, and each line conforming to the following structure:

        freq marker (containing no whitespace); any amount of whitespace; data (possibly including whitespace)

        Note that the split call splits on whitespace (\s+) and limits itself to two fields, so if the file indeed conforms to this structure, you'll get your freq marker and data just as expected. However, if a line starts with whitespace, split will see and split on that instead, and return an empty string (then assigned to $freq) followed by the entire rest of the line after said leading whitespace (then assigned to $data).
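        A tiny illustration of this behaviour, with made-up input lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two hypothetical data lines: one clean, one with leading whitespace.
my $clean  = "1     2";
my $padded = "  1     2";

my ($freq1, $data1) = split /\s+/, $clean, 2;
# $freq1 is "1", $data1 is "2" -- as expected

my ($freq2, $data2) = split /\s+/, $padded, 2;
# $freq2 is "" (empty string), $data2 is "1     2" -- the rest of the line

print "clean:  freq='$freq1' data='$data1'\n";
print "padded: freq='$freq2' data='$data2'\n";
```

        (Using a literal single space, split ' ', ..., would instead discard leading whitespace, which is exactly why that special case exists.)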

        So what should you do, then? It depends. If extra whitespace is the worst that can happen to you, then use poj's solution to remove leading spaces on each line. Otherwise, you'll have to think about what sort of file structure you can expect, and modify your script accordingly to deal with all possible corner cases.

        As an aside: do you have control over where and how these data files are generated? If so, it may be worth modifying the producing script instead (or as well); it's often easier not to output data in an awkward way to begin with than to try to parse it back when you can rely on fewer assumptions. (But also remember the Robustness principle: be conservative in what you output, and liberal in what you accept. This is true even for data you generate and consume entirely by yourself.)

        Also, avoid reinventing the wheel (unless it's necessary and/or fun, of course). Instead of relying on ad-hoc formats, you may be better off utilizing a standard format such as CSV for your data files, using e.g. Text::CSV to do all the heavy lifting for you (input and output).
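        For instance, here is a rough sketch of the whole merge done with Text::CSV (a CPAN module) over tab-separated files. It writes its own small sample inputs first so the example is self-contained; the file names and the assumption of exactly one header line per file are mine:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;   # CPAN module; install with e.g. cpan Text::CSV

# Create small tab-separated sample files matching the posted data
# (in real use these would already exist).
my %sample = (
    'input1.txt' => "freq\tdata1\n1\t2\n2\t5\n3\t10\n",
    'input2.txt' => "freq\tdata2\n1\t12\n2\t50\n3\t10\n",
    'input3.txt' => "freq\tdata3\n1\t10\n2\t25\n3\t11\n",
);
while (my ($name, $content) = each %sample) {
    open my $fh, '>', $name or die "Cannot write $name: $!";
    print {$fh} $content;
    close $fh;
}

my $csv = Text::CSV->new({ sep_char => "\t", eol => "\n", auto_diag => 1 });

my %rows;                  # freq => [ data1, data2, ... ]
my @header = ('freq');

foreach my $file (sort keys %sample) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $head = $csv->getline($fh);          # e.g. [ 'freq', 'data1' ]
    push @header, $head->[1];
    while (my $row = $csv->getline($fh)) {
        push @{ $rows{ $row->[0] } }, $row->[1];
    }
    close $fh;
}

open my $out, '>', 'combined.txt' or die "Cannot open combined.txt: $!";
$csv->print($out, \@header);
$csv->print($out, [ $_, @{ $rows{$_} } ]) for sort { $a <=> $b } keys %rows;
close $out;
```

        Text::CSV then handles quoting, embedded separators and malformed lines for you, which is exactly the kind of corner case an ad-hoc split-based parser tends to miss.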

Re: Merging Columns from multiple files into a single file
by kevbot (Vicar) on Oct 18, 2015 at 21:51 UTC
    Here is a solution using Data::Table. However, I was a bit surprised to discover that the helper functions (such as fromFile, fromCSV, etc.) of Data::Table do not seem to support using regular expressions as a delimiter. For the data you posted, I would have used \s+ as the delimiter. By the time I discovered this, I had already written this solution...so I thought I'd post it. A solution like this will work if your input files are tab delimited (or one could use fromCSV for files with other delimiters).
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Table;

    my @files = qw( input1.txt input2.txt input3.txt );

    my @tables;
    foreach my $file (@files) {
        my $dt = Data::Table::fromTSV($file, 1);
        push @tables, $dt;
    }

    my $merged_table;
    TABLE: foreach my $i (0 .. $#tables) {
        if ($i == 0) {
            $merged_table = $tables[0];
            next TABLE;
        }
        $merged_table = $merged_table->join($tables[$i], Data::Table::INNER_JOIN,
                                            ['freq'], ['freq']);
    }
    $merged_table->sort('freq', 1, 0);

    open my $output_fh, ">", "combined.txt" or die "Cannot open combined.txt: $!\n";
    print {$output_fh} $merged_table->tsv;
    $output_fh->close;
    exit;