steph_bow has asked for the wisdom of the Perl Monks concerning the following question:

Dear wise Monks

I have written some code to concatenate lines from two different files, but it fails to give the expected results.

Could you check it?

I would like to concatenate the lines that have the same aircraft name (second column in the first file, first column in the second file).

In my example, that would be DAL11, which is common to both files.

But it seems that chomp does not work, and the data are shifted into the wrong column.

The core of the code is:

    print OUTFILE "$line_1";

    my $meter = $length_1;

    if (defined $aircraft_id_1){

        ################### OPEN THE SECOND INFILE ############################
        open(INFILE_2,"<${file_2}") or die "Can't open ${file_2} : $!";
        #################################################################

        while (my $line_2 = <INFILE_2>){
            chomp($line_2);
            my @Elements_2 = split(/;/, $line_2);

            # warning : in this file, the aircraft_id looked for is in the first column
            my $aircraft_id_2 = $Elements_2[0];
            # print STDOUT "the aircraft_id in the resemblance_criterion_file is : $aircraft_id_2\n";

            if ($aircraft_id_1 eq $aircraft_id_2){
                print OUTFILE ";$line_2\n";
            }
        }
        close INFILE_2;
    }

Here is my whole code, so you can download it:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    use Cwd;

    my $Current_Dir = getcwd;
    print STDOUT "the current directory is $Current_Dir\n";

    my $file_1 = "$ARGV[0]";
    my $file_2 = "$ARGV[1]";

    ################### OPEN THE FIRST INFILE ############################
    open(INFILE_1,"<$file_1") or die "Can't open $file_1 : $!";
    #################################################################

    ################### OPEN THE OUTFILE ############################
    my $outfile = "outfile_$file_1";
    open(OUTFILE,">$outfile") or die "Can't open $outfile : $!";
    ##################################################################

    # print the title of the columns
    my $titles_line = <INFILE_1>;
    print OUTFILE "$titles_line";

    while (my $line_1 = <INFILE_1>){
        chomp($line_1);
        my @Elements_1 = split(/;/, $line_1);
        my $aircraft_id_1 = $Elements_1[1];
        # print STDOUT "the aircraft_id in the Analysis_slot_list is : $aircraft_id_1\n";

        # calculation of the length of $line_1
        my $length_1 = @Elements_1;
        print STDOUT "the length is $length_1\n";
        print STDOUT "The Table is @Elements_1\n";

        print OUTFILE "$line_1";

        my $meter = $length_1;

        if (defined $aircraft_id_1){

            ################### OPEN THE SECOND INFILE ############################
            open(INFILE_2,"<${file_2}") or die "Can't open ${file_2} : $!";
            #################################################################

            while (my $line_2 = <INFILE_2>){
                chomp($line_2);
                my @Elements_2 = split(/;/, $line_2);

                # warning : in this file, the aircraft_id looked for is in the first column
                my $aircraft_id_2 = $Elements_2[0];
                # print STDOUT "the aircraft_id in the resemblance_criterion_file is : $aircraft_id_2\n";

                if ($aircraft_id_1 eq $aircraft_id_2){
                    print OUTFILE ";$line_2\n";
                }
            }
            close INFILE_2;
        }
        else{
            while ($meter<40){
                print OUTFILE ";";
                ++ $meter;
            }
        }
    }
    close INFILE_1;
    close OUTFILE;

Here are my two input files (actually only parts of them):

First file

    Slot_time;Aircraft_Id;EOBT;ETOT;CTOT;ATOT;ETO;CTO;ATO;Last_DLA;EOBT_DLA;Anticip_DLA_min;Flag_EOBT_DLA;Last_FPL;EOBT_FPL;Anticip_FPL_min;Flag_EOBT_FPL;ETOT_First;Delta_ETOTs_min;ATFM_Delay;;;;;;;;
    08:40:00;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:40:58;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:41:56;BMA2CW;08:00;08:20;08:33;08:33;08:31;;08:41;07:52:00;08:00;-8;;;;;;;;;;;;;;;;
    08:42:55;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:43:53;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:44:51;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:45:50;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:46:48;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:47:46;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:48:45;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:49:43;DAL11;08:10;08:30;08:30;08:39;08:47;08:47;08:50;;;;;05:03:00;08:10;-187;;08:30;0;0;;;;;;;;
    08:50:41;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:51:40;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:52:38;;;;;;;;;;;;;;;;;;;;;;;;;;;
    08:53:36;ACA879;07:30;07:42;07:42;07:52;08:51;08:51;08:53;;;;;05:06:00;07:30;-144;;07:42;0;0;;;;;;;;

Here is the second file

    COA45;COA44;COA;B762;KEWR;LIMC;22:05;1325;05:20;320;;;
    COA57;COA56;COA;B772;KEWR;LFPG;22:30;1350;04:37;277;;;
    COA67;COA66;COA;B752;KCLE;EGKK;23:35;1415;06:08;368
    COA79;COA78;COA;B762;KEWR;LSZH;23:05;1385;05:48;348
    DAL11;DAL12;DAL;B772;KATL;EGKK;22:05;1325;05:39;339
    DAL117;DAL116;DAL;B763;KATL;EDDS;22:15;1335;06:59;419
    DAL119;DAL118;DAL;B763;KJFK;LFPG;23:05;1385;05:31;331
    DAL125;DAL124;DAL;B763;KATL;EBBR;22:05;1325;06:14;374
    DAL133;DAL132;DAL;B763;KJFK;LGAV;21:35;1295;06:28;388
    DAL141;DAL140;DAL;B763;KJFK;EBBR;23:50;1430;06:16;376
    DAL149;DAL148;DAL;B763;KJFK;LIRF;21:35;1295;05:05;305
    DAL149;DAL150;DAL;B763;KJFK;LIPZ;22:35;1355;05:58;358
    DAL151;DAL150;DAL;B763;KJFK;LIPZ;22:35;1355;05:58;358

Replies are listed 'Best First'.
Re: concatenation of lines from two different files
by dogz007 (Scribe) on Aug 17, 2007 at 14:17 UTC
    I believe the following replacement should fix your problem. Note how I shift out the aircraft ID from @Elements_2 and then join the remainder of the elements together to append to the line in question. You were also missing a few line breaks. Let us know if this fixes it.


        if (defined $aircraft_id_1){
            open(INFILE_2,"<${file_2}") or die "Can't open ${file_2} : $!";
            while (my $line_2 = <INFILE_2>){
                chomp($line_2);
                my @Elements_2 = split(/;/, $line_2);
                my $aircraft_id_2 = shift @Elements_2;
                if ($aircraft_id_1 eq $aircraft_id_2){
                    print OUTFILE join(';',@Elements_2), "\n";
                }
            }
            close INFILE_2;
        } else {
            print OUTFILE (";" x (40-$meter)), "\n"
        }

Re: concatenation of lines from two different files
by agianni (Hermit) on Aug 17, 2007 at 14:26 UTC
    You are reading through the second file as many times as there are lines in the first file. That's very inefficient. Instead, consider reading through the first file and putting its contents into a hash keyed off of the pertinent value. Something like (untested):
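
        my %file1_row_for;
        while (my $input = <THAT_FILE>){
            chomp $input;                     # keep the stored row free of its newline
            # only grab column 2
            my ( undef, $id ) = split /;/, $input;
            next unless defined $id;          # rows without a second column have no key
            $file1_row_for{$id} = $input;     # no "my" here: this assigns to the existing hash
        }
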
    Do something like that for the first file, then you can do something like this:

        while ( my $input = <THAT_OTHER_FILE> ){
            chomp $input;                     # the newline is printed explicitly below
            my ( $id2 ) = split /;/, $input;
            print OUTFILE $input;
            print OUTFILE ';' . $file1_row_for{$id2} if defined $file1_row_for{$id2};
            print OUTFILE "\n";
        }

    To generate your output. One other question, though: is there a possibility that there will be more than one matching row in the second file? Even if you think there isn't, you should probably write your code to either process that in a particular manner (output two lines?) or to die if it runs into that scenario.
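
The "die if it runs into that scenario" option could look roughly like the sketch below. The sub name index_unique is made up for illustration and is not from the thread; the sample rows are taken from the second file posted above, where DAL149 does appear twice. The data is read from an in-memory filehandle only so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: index the lines of a filehandle by one column
# and die as soon as the same key shows up a second time.
sub index_unique {
    my ($fh, $column) = @_;
    my %row_for;
    while (my $line = <$fh>) {
        chomp $line;
        my $id = (split /;/, $line)[$column];
        next unless defined $id && length $id;    # skip rows without an id
        die sprintf("duplicate id '%s' at input line %d\n", $id, $.)
            if exists $row_for{$id};
        $row_for{$id} = $line;
    }
    return \%row_for;
}

# Sample rows from the second file; DAL149 is listed twice.
my $sample = join "\n",
    'DAL11;DAL12;DAL;B772;KATL;EGKK;22:05;1325;05:39;339',
    'DAL149;DAL148;DAL;B763;KJFK;LIRF;21:35;1295;05:05;305',
    'DAL149;DAL150;DAL;B763;KJFK;LIPZ;22:35;1355;05:58;358';

open my $fh, '<', \$sample or die "Can't open in-memory file: $!";

# eval catches the die so the script can report it instead of stopping.
my $index = eval { index_unique($fh, 0) };
print defined $index ? "no duplicates\n" : "refused: $@";
```

With this sample the index is refused because of the repeated DAL149 row. Whether dying is the right reaction depends on what the duplicates mean in the real data.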
    perl -e 'split//,q{john hurl, pest caretaker}and(map{print @_[$_]}(join(q{},map{sprintf(qq{%010u},$_)}(2**2*307*4993,5*101*641*5261,7*59*79*36997,13*17*71*45131,3**2*67*89*167*181))=~/\d{2}/g));'

      Dear agianni,

      Thanks a lot for your reply

      Concerning your last remark: in case of several matches, I intend to write the other matches next to the first match, on the same line.

        If that's the case, you'll need to make the hash storing the contents of the first file a hash of arrays (or, rather, a hash of arrayrefs) rather than a simple hash, so you can store each line that matches the id:

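            my %file1_row_for;
            while (my $input = <THAT_FILE>){
                chomp $input;
                # only grab column 2
                my ( undef, $id ) = split /;/, $input;
                next unless defined $id;
                # add this line to the array of data for this key
                # (push takes the array and the value separated by a comma, not "=")
                push @{ $file1_row_for{$id} }, $input;
            }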

        Then when you want to output it at the end of the line you can:

            # "|| []" keeps the test from dying under strict when there is no entry for $id
            print join ';', @{ $file1_row_for{$id} } if @{ $file1_row_for{$id} || [] };

        Note that you'll just want to check for the length of the arrayref rather than definedness as in my original code.
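
Put together, the hash-of-arrayrefs idea might look like the sketch below. It assumes the direction implied by the original poster's expected output: the second file (id in the first column) is grouped into the hash, and each line of the first file (id in the second column) gets every match appended. The sub names and the shortened first-file rows are invented for the example.

```perl
use strict;
use warnings;

# Assumed helper: group the lines of the lookup file by their first column.
sub group_by_first_column {
    my ($fh) = @_;
    my %rows_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($id) = split /;/, $line;
        next unless defined $id && length $id;
        push @{ $rows_for{$id} }, $line;    # autovivifies the arrayref on first use
    }
    return \%rows_for;
}

# Assumed helper: append every matching lookup row to one line of the main file.
sub append_matches {
    my ($line, $rows_for) = @_;
    my $id = (split /;/, $line)[1];         # the id sits in the second column
    return $line unless defined $id && $rows_for->{$id};
    return join ';', $line, @{ $rows_for->{$id} };
}

# Two rows from the second file that share the id DAL149.
my $lookup = join "\n",
    'DAL149;DAL148;DAL;B763;KJFK;LIRF;21:35;1295;05:05;305',
    'DAL149;DAL150;DAL;B763;KJFK;LIPZ;22:35;1355;05:58;358';
open my $fh, '<', \$lookup or die "Can't open in-memory file: $!";
my $rows_for = group_by_first_column($fh);

# Made-up, shortened first-file rows: one with a matching id, one without an id.
print append_matches('09:00:00;DAL149;08:10', $rows_for), "\n";
print append_matches('09:01:00;;', $rows_for), "\n";
```

Testing `$rows_for->{$id}` for truth is enough here because an entry only exists once at least one row has been pushed onto it.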

Re: concatenation of lines from two different files
by dogz007 (Scribe) on Aug 17, 2007 at 14:30 UTC
    I also noticed that you are reading file2 every time you need to lookup an aircraft ID. This could take a lot of time if your files are very large. Instead, read all of the data from file2 into a hash, and lookup the aircraft ID's as keys in that hash. Then you will have read each file only one time through. The code below is an example of how you can do this.
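
A minimal sketch of that idea follows. It is not dogz007's original code: the sub name join_files and the shortened sample rows are made up for illustration, and in-memory filehandles stand in for the two real files. When an id occurs more than once in the second file, this sketch keeps only the first row, which is an assumption rather than something the thread specifies.

```perl
use strict;
use warnings;

# Sketch only: takes two open filehandles and returns the joined output lines.
sub join_files {
    my ($fh_1, $fh_2) = @_;

    # Read the second file once; key = aircraft id (first column).
    my %line_for;
    while (my $line_2 = <$fh_2>) {
        chomp $line_2;
        my ($id) = split /;/, $line_2;
        next unless defined $id && length $id;
        $line_for{$id} = $line_2 unless exists $line_for{$id};   # first row wins
    }

    # Walk the first file once; the aircraft id is in the second column.
    my @out;
    while (my $line_1 = <$fh_1>) {
        chomp $line_1;
        my $id = (split /;/, $line_1)[1];
        if (defined $id && exists $line_for{$id}) {
            push @out, join ';', $line_1, $line_for{$id};
        }
        else {
            push @out, $line_1;             # no match: keep the line unchanged
        }
    }
    return @out;
}

# Shortened stand-ins for the two posted files.
my $file_1 = "Slot_time;Aircraft_Id;EOBT\n08:48:45;;\n08:49:43;DAL11;08:10\n";
my $file_2 = "COA79;COA78;COA\nDAL11;DAL12;DAL\nDAL117;DAL116;DAL\n";

open my $fh_1, '<', \$file_1 or die "Can't open in-memory file 1: $!";
open my $fh_2, '<', \$file_2 or die "Can't open in-memory file 2: $!";

print "$_\n" for join_files($fh_1, $fh_2);
```

Each file is read exactly once, and every lookup is a single hash access instead of a pass over the second file.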

      Thanks a lot dogz007

      I have understood what you explained, and I think your code is very good. Thanks.

      However, I have a problem.

      The result of the code with the files posted at the beginning is this:

      Slot_time;Aircraft_Id;EOBT;ETOT;CTOT;ATOT;ETO;CTO;ATO;Last_DLA;EOBT_DLA;Anticip_DLA_min;Flag_EOBT_DLA;Last_FPL;EOBT_FPL;Anticip_FPL_min;Flag_EOBT_FPL;ETOT_First;Delta_ETOTs_min;ATFM_Delay;;;;;;;;
      08:40:00;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:40:58;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:41:56;BMA2CW;08:00;08:20;08:33;08:33;08:31;;08:41;07:52:00;08:00;-8;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:42:55;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:43:53;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:44:51;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:45:50;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:46:48;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:47:46;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:48:45;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:49:43;DAL11;08:10;08:30;08:30;08:39;08:47;08:47;08:50;;;;;05:03:00;08:10;-187;;08:30;0;0;;;;;;;;
      DAL12;DAL;B772;KATL;EGKK;22:05;1325;05:39;339
      08:50:41;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:51:40;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:52:38;;;;;;;;;;;;;;;;;;;;;;;;;;;
      ;;;;;;;;;;;;
      08:53:36;ACA879;07:30;07:42;07:42;07:52;08:51;08:51;08:53;;;;;05:06:00;07:30;-144;;07:42;0;0;;;;;;;;;;;;;;;;;;;;;;;;;;;;

      I cannot understand why the added data are not on the same line as the data from the first file.

      DAL12 should have been on the same line as DAL11, and I cannot understand why it is not, because there is a chomp in the code.

      Well, I am going to make a separate post on this particular point.

      Thanks a lot for your explanations
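
One thing worth ruling out when chomp seems to have no effect: chomp removes only a trailing copy of $/, which is "\n" by default. If the input files were saved with Windows (CRLF) line endings and are read on a Unix-like system, a "\r" stays at the end of every line and can show up as a line break in the output. This is only a possible cause; nothing in the thread confirms it. The helper name strip_eol below is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing LF or CRLF, whichever is present.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# A line as it might arrive from a file with CRLF line endings.
my $raw = "DAL11;DAL12;DAL\r\n";

my $chomped = $raw;
chomp $chomped;                      # removes "\n" only, the "\r" is still there
printf "after chomp:     %d chars, ends in CR: %s\n",
    length($chomped), ($chomped =~ /\r\z/ ? 'yes' : 'no');

my $stripped = strip_eol($raw);      # removes "\r\n" as a unit
printf "after strip_eol: %d chars, ends in CR: %s\n",
    length($stripped), ($stripped =~ /\r\z/ ? 'yes' : 'no');
```

If the first report says the line still ends in CR for real input, replacing chomp with the substitution above (or converting the files beforehand) would be the thing to try.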

Re: concatenation of lines from two different files
by graff (Chancellor) on Aug 18, 2007 at 04:41 UTC
    (shameless plug for a tool that I wrote :) This sort of "flat file join" is the kind of thing I've had to do quite often over the years -- not just the intersection on a key field, like you're doing, but also unions and diffs, with flexible field delimiters (regex) and flexible output.

    So I wrote my own generalized command-line tool to do this, and have been adding features every now and then. It's posted here: cmpcol (and I've just updated that node to include stuff I've added to the script since the original posting).

    Given your two data files (call them t1 and t2), I can generate the output you specified with this command line:

    cmpcol -i -lb -d \; t1:2 t2
    Where:
    • "-i" means "output the intersection"
    • "-lb" means "list full contents of matching lines from both files" (concatenate with no separator string)
    • "-d \;" means the input field delimiter is semi-colon (have to use "\;" so the shell won't interpret ";" as a command separator)
    • "t1:2" means "match on the second field of file t1"
    • "t2" implies "match on the first field of file t2"
Re: concatenation of lines from two different files
by Corion (Patriarch) on Aug 17, 2007 at 14:37 UTC

    I haven't looked through your code, so I don't know whether it completely solves your problem, especially the requirement of putting airplanes with the same id onto the same line, but "join - join two files according to a common key" is based on the idea of loading one file completely into memory and using a hash to find the corresponding airplane id.

Re: concatenation of lines from two different files
by starX (Chaplain) on Aug 17, 2007 at 13:56 UTC
    Could you please give an example of what you expect to see as output?

      Thanks starX for your quick reply

      My output would be:

      If the aircraft identity in the first file (2nd column) corresponds to an aircraft identity in the second file (first column), I would like to have this:

      08:49:43;DAL11;08:10;08:30;08:30;08:39;08:47;08:47;08:50;;;;;05:03:00;08:10;-187;;08:30;0;0;;;;;;;;DAL11;DAL12;DAL;B772;KATL;EGKK;22:05;1325;05:39;339

      But if there is no correspondence, then write only the lines of the first file.

        I've toyed around with this a little bit and propose using the following format (starting around line 66 in your code)

            if ($aircraft_id_1 eq $aircraft_id_2){
                #print "$aircraft_id_1 eq $aircraft_id_2\n";
                print OUTFILE "$line_1;$line_2\n";
            } else {
                print OUTFILE "$line_1\n";
            }

        and removing the earlier (line 45ish) print statement that only prints the line from the first file. This way you're figuring out all in one place whether or not you are printing just one line or several.

        As best as I can tell your matching logic is fine. Or am I missing something?