Quinnz has asked for the wisdom of the Perl Monks concerning the following question:

Hello, everyone,
I am a complete beginner with Perl, but I know Perl can easily
solve my problem. Basically, I'd like to merge two ASCII
text data files into one by time-stamp.

file1.txt (second column is time-stamp):
Feb-21,19:08:05.2,$GP,48.96,90.92,45.69
Feb-21,19:08:06.4,$GP,48.92,90.92,45.70
Feb-21,19:08:07.6,$GP,48.93,90.99,45.66
Feb-21,19:08:08.1,$GP,48.92,90.95,45.66
Feb-21,19:08:09.0,$GP,48.85,90.92,45.62
Feb-21,19:08:10.8,$GP,48.92,90.94,45.63

file2.txt (second column is time-stamp, but it does not
always correspond to one in file1.txt):
Feb-21,19:08:05.2,$BM,3.89
Feb-21,19:21:20.5,$BM,5.05
Feb-21,19:08:07.6,$BM,6.20
Feb-21,19:21:20.8,$BM,7.36
Feb-21,19:08:09.0,$BM,8.52
Feb-21,19:08:10.8,$BM,9.68

Output.txt will look like this:
Feb-21,19:08:05.2,$GP,48.96,90.92,45.69,19:08:05.2,$BM,3.89
Feb-21,19:08:06.4,$GP,48.92,90.92,45.70,
Feb-21,19:08:07.6,$GP,48.93,90.99,45.66,19:08:07.6,$BM,6.20
Feb-21,19:08:08.1,$GP,48.92,90.95,45.66,
Feb-21,19:08:09.0,$GP,48.85,90.92,45.62,19:08:09.0,$BM,8.52
Feb-21,19:08:10.8,$GP,48.92,90.94,45.63,19:08:10.8,$BM,9.68

Both files are huge comma-delimited tables.
Thank you very much for your help.

Quinn

Re: Merge text data by time-stamp
by bart (Canon) on Feb 24, 2010 at 22:46 UTC

    So here's what I think you could best do:

    Read both files line by line, and for each line:

    1. split into date+timestamp on one hand, and data on the other:
      my($datetime, $data) = /^(\w+-\d+,[\d:.]+),(.*)/;
    2. push the $data onto an arrayref that's the value in a global hash for that $datetime:
      push @{ $data{$datetime} }, $data;

      Note that autovivification makes it unnecessary to create the arrayref before you do this.
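The two steps above can be tried on a few in-memory lines before touching the real files. This is only a minimal sketch: `group_by_datetime` is a hypothetical helper name, and skipping lines that do not match the pattern is an assumption, not something the steps above require.

```perl
use strict;
use warnings;

# Hypothetical helper: group the data part of each line under its
# "date,time" prefix. Lines that do not match the pattern are skipped
# (an assumption made for this sketch).
sub group_by_datetime {
    my @lines = @_;
    my %data;
    for my $line (@lines) {
        chomp $line;
        my ($datetime, $data) = $line =~ /^(\w+-\d+,[\d:.]+),(.*)/
            or next;
        # No need to initialise $data{$datetime} first:
        # push autovivifies the array ref on first use.
        push @{ $data{$datetime} }, $data;
    }
    return \%data;
}

my $grouped = group_by_datetime(
    'Feb-21,19:08:05.2,$GP,48.96,90.92,45.69',
    'Feb-21,19:08:06.4,$GP,48.92,90.92,45.70',
    'Feb-21,19:08:05.2,$BM,3.89',
);

# Number of data chunks collected for the shared time-stamp.
print scalar @{ $grouped->{'Feb-21,19:08:05.2'} }, "\n";
```

The time-stamp 19:08:05.2 appears in two of the three sample lines, so its array ref ends up holding two entries.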

    OK, so you now have the data in the hash in a form directly usable for your output. One major problem is that hash keys are unordered, so you'll have to sort them. If all dates are the same, a plain string sort (Perl's default) will order them by timestamp, as long as the times are zero-padded to a fixed width, as in your sample:

    foreach my $key (sort keys %data) { ...
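If the files ever span more than one date, a plain string sort would put "Feb" before "Jan". One possible workaround, sketched here under several assumptions: all keys fall within one year, month names are three-letter English abbreviations, and days are zero-padded. `sortable_key` and `sort_by_datetime` are hypothetical helper names, and the `//` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my %month_num;
@month_num{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (1 .. 12);

# Hypothetical helper: turn "Feb-21,19:08:05.2" into "02-21,19:08:05.2",
# which sorts chronologically as a plain string (within a single year).
sub sortable_key {
    my ($key) = @_;
    my ($mon, $rest) = $key =~ /^(\w{3})(-.*)/ or return $key;
    return sprintf('%02d%s', $month_num{$mon} // 0, $rest);
}

# Sort the original keys by their sortable form (a Schwartzian transform).
sub sort_by_datetime {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ sortable_key($_), $_ ] } @_;
}

print "$_\n" for sort_by_datetime(
    'Feb-21,19:08:05.2', 'Apr-01,08:00:00.0', 'Jan-30,23:59:59.9',
);
```

With these three keys the January entry comes first and the April entry last, whereas a bare `sort` would have started with April.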

    You can simply join the values in the anonymous array, with a comma:

    print $key, ',', join(',', @{ $data{$key} }), "\n";

    And that should be close!

    The whole program:

    #! perl -w
    @ARGV = ('file1.txt', 'file2.txt');
    my %data;
    while(<>) {
        my($datetime, $data) = /^(\w+-\d+,[\d:.]+),(.*)/;
        push @{ $data{$datetime} }, $data;
    }
    foreach my $key (sort keys %data) {
        print $key, ',', join(',', @{ $data{$key} }), "\n";
    }
    With your sample data, this produces:
    Feb-21,19:08:05.2,$GP,48.96,90.92,45.69,$BM,3.89
    Feb-21,19:08:06.4,$GP,48.92,90.92,45.70
    Feb-21,19:08:07.6,$GP,48.93,90.99,45.66,$BM,6.20
    Feb-21,19:08:08.1,$GP,48.92,90.95,45.66
    Feb-21,19:08:09.0,$GP,48.85,90.92,45.62,$BM,8.52
    Feb-21,19:08:10.8,$GP,48.92,90.94,45.63,$BM,9.68
    Feb-21,19:21:20.5,$BM,5.05
    Feb-21,19:21:20.8,$BM,7.36
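This output differs from the Output.txt in the question in two ways: the file2 time-stamp is not repeated, and file2 rows without a partner in file1 are included. If the exact layout from the question is wanted, one possible variant is a lookup keyed on file2's date and time, applied to each file1 line in turn. This is a sketch, not a tested solution for the real files: `merge_by_timestamp` is a hypothetical helper, lines are assumed to be chomped already, and if file2 repeats a time-stamp the last row wins. It uses `//`, so it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: left-join file1 lines with file2 lines on "date,time".
# Takes two array refs of chomped lines and returns the merged lines.
sub merge_by_timestamp {
    my ($file1_lines, $file2_lines) = @_;

    # Index file2 by "date,time"; keep the time and the data for the output.
    my %extra;
    for my $line (@$file2_lines) {
        my ($date, $time, $data) = $line =~ /^(\w+-\d+),([\d:.]+),(.*)/
            or next;
        $extra{"$date,$time"} = "$time,$data";
    }

    my @merged;
    for my $line (@$file1_lines) {
        my ($key) = $line =~ /^(\w+-\d+,[\d:.]+),/ or next;
        # Unmatched rows keep a trailing comma, as in the question's sample.
        push @merged, $line . ',' . ($extra{$key} // '');
    }
    return @merged;
}

print "$_\n" for merge_by_timestamp(
    [ 'Feb-21,19:08:05.2,$GP,48.96,90.92,45.69',
      'Feb-21,19:08:06.4,$GP,48.92,90.92,45.70' ],
    [ 'Feb-21,19:08:05.2,$BM,3.89',
      'Feb-21,19:21:20.5,$BM,5.05' ],
);
```

Because file1 is processed in its original order, no sort is needed, and only file2 has to be held in memory. For very large inputs the second loop could read file1 line by line from a filehandle instead of an array.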
      Thank you very much for your time. It helps me a lot.