How do I Extract contents from given input files and merge into one text file based on Unique keys present in input files

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perlmonks,

I need to extract the data from input files whose sections share unique keys (the "Iteration N:" headers), and then generate merged output as shown below. Sometimes I have to merge more than two files (File1.txt, File2.txt, File3.txt, etc.). Please help me resolve this issue.

Input files:

File1.txt:
==========
Iteration 1:
0: Data: 1023 1023
1: Data: 200 1023
2: Data: 300 1023
3: Data: 250 1023
4: Data: 1023 1023

Iteration 2:
0: Data: 1023 1024
1: Data: 1023 60
2: Data: 1023 90
3: Data: 1023 100
4: Data: 1023 1023

File2.txt:
==========
Iteration 1:
0: Data: 1023 1023
1: Data: 250 1023
2: Data: 60 1023
3: Data: 99 1023
4: Data: 1023 1023

Iteration 2:
0: Data: 1023 1024
1: Data: 1023 60
2: Data: 1023 90
3: Data: 1023 100
4: Data: 1023 1023

Output:

Expected Output file with File1 and File2 is:
Output.txt
=============
Iteration1:
File1:0: Data: 1023 1023 File2:0:Data:1023 1023
File1:1: Data: 200 1023 File2:1:Data:250 1023
File1:2: Data: 300 1023 File2:2:Data:60 1023
File1:3: Data: 250 1023 File2:3:Data:99 1023
File1:4: Data: 1023 1023 File2:4:Data:1023 1023

Iteration2:
0:Data: 1023 1024 File2:0: Data: 1023 1024
1:Data: 1023 60 File2:1: Data: 1023 60
2:Data: 1023 90 File2:2: Data: 1023 90
3:Data: 1023 100 File2:3: Data: 1023 100
4:Data: 1023 1023 File2:4: Data: 1023 1023

Re: How do I Extract contents from given input files and merge into one text file based on Unique keys present in input files by bart (Canon) on Sep 02, 2009 at 12:47 UTC
It appears to me that your data files have a hierarchical structure. So you'll have to parse them, and I think, for a generic parsing routine (now you only have to merge the files, but I'm sure that later on you'll have to write a similar script to do other stuff to the same files), it makes the most sense to parse the data into a data structure (a tree); see perldsc and perllol to see what kind of stuff I'm talking about. Load all data files in a proper manner into the same data structure. As a second step, you'll have to produce the desired output from the produced tree.

Step 1: load the data from all files into a data structure:

```perl
my @files = qw(File1.txt File2.txt);
my %tree;
my(@iterations, @names);    # order
foreach my $file (@files) {
    open my $fh, '<', $file or die "Cannot open file $file: $!";
    (my $name = $file) =~ s/\.\w+$//;    # remove extension
    push @names, $name;    # order
    my($iteration);
    while(<$fh>) {
        chomp;
        if(/^Iteration /) {
            $iteration = $_;
            push @iterations, $iteration unless $tree{$iteration};    # key order
        } elsif(my($i) = /^(\d+):/) {
            $tree{$iteration}[$i]{$name} = $_;
        }
    }
}
```

You can show the contents of the data structure, to see if it works:

```perl
use Data::Dumper;
print Dumper \%tree;
```

Step 1 is done. Now step 2: print out the data to a file.

```perl
foreach my $iteration (@iterations) {
    print "$iteration\n";    # section header
    my $section = $tree{$iteration};    # array ref
    for my $i (0 .. $#$section) {
        my @data;
        foreach my $name (@names) {
            my $data = $section->[$i]{$name};
            next unless defined $data;
            push @data, "$name:$data";
        }
        if(@data) {
            my $line = join " ", @data;
            print "$line\n";
        }
    }
}
```

I think that rounds it up...
Re: How do I Extract contents from given input files and merge into one text file based on Unique keys present in input files by Anonymous Monk on Sep 02, 2009 at 10:47 UTC
Start by reading perlintro, then write some code. Does that help?
Re: How do I Extract contents from given input files and merge into one text file based on Unique keys present in input files by arun_kom (Monk) on Sep 02, 2009 at 21:25 UTC
Similar to the idea given by bart and dwm042 above but a slightly different implementation using a hash of hashes with multiple values per key.

```perl
#!/usr/bin/perl -w
use strict;

my @list_of_files = qw(file1.txt file2.txt);
my %data;
my $file_num = 0;

foreach my $file_name(@list_of_files){
    $file_num++;
    my $file = 'File'.$file_num;
    my $iter;
    my $flag = 0;
    open FH, $file_name or die "Cannot open $file_name. $!";
    while(<FH>){
        chomp;
        if($_ =~ /(Iteration.+)/){
            $iter = $1;
            $flag = 1;
            next;
        }
        if(/^$/){
            $flag = 0;
        }
        if($flag){
            push( @{$data{$iter}{$file}}, $_ );
        }
    }
    close FH;
}

foreach my $iter ( sort keys %data ) {
    print "$iter\n";
    my $size;
    my $index = 0;
    do {
        for my $file ( sort keys %{ $data{$iter} } ) {
            print "$file: ", @{$data{$iter}{$file}}[$index], " ";
            $size = $#{$data{$iter}{$file}};
        }
        $index++;
        print "\n";
    } while($index<=$size);
    print "\n";
}
```
Re: How do I Extract contents from given input files and merge into one text file based on Unique keys present in input files by bichonfrise74 (Vicar) on Sep 02, 2009 at 22:05 UTC
In the first file, I basically store the data in a hash of array structure where I used 'Iteration...' as my key. When I loop through the second file, I would simply check if the key exists and if it does, then it should print the corresponding data.

```perl
#!/usr/bin/perl
use strict;

my $file1 = <<END_FILE2;
Iteration 1:
0: Data: 1023 1023
1: Data: 200 1023
2: Data: 300 1023
3: Data: 250 1023
4: Data: 1023 1023

Iteration 2:
0: Data: 1023 1024
1: Data: 1023 60
2: Data: 1023 90
3: Data: 1023 100
4: Data: 1023 1023
END_FILE2

my (%record, $key);

for my $i (\$file1) {
    open( my $file_1, '<', $i ) or die "cannot open file";
    while( <$file_1> ) {
        $key = $1 if ( s/^(Iteration \d\d?):// );
        push( @{ $record{$key} }, $1 ) if ( /(\d:\s.*)/ );
    }
    close( $file_1 );
}

while (my $line = <DATA>) {
    if ( $line =~ s/^(Iteration \d\d?):// ) {
        $key = $1;
        print "$key:\n";
    }
    my ($index) = $line =~ /^(\d)/;
    print "File1:$record{$key}->[$index] File2:$line"
        if ( defined( $index ) );
    print "\n" if ( $index == $#{ $record{$key} } );
}

__DATA__
Iteration 1:
0: Data: 1023 1023
1: Data: 250 1023
2: Data: 60 1023
3: Data: 99 1023
4: Data: 1023 1023

Iteration 2:
0: Data: 1023 1024
1: Data: 1023 60
2: Data: 1023 90
3: Data: 1023 100
4: Data: 1023 1023
```
Re^2: How do I Extract contents from given input files and merge into one text file based on Unique keys present in input files by lnin (Initiate) on Sep 03, 2009 at 13:28 UTC
Thanks to all the Perl Monks (bart, arun, bichonfrise74) for providing the solutions. They really helped me learn Perl concepts.
Re: How do I Extract contents from given input files and merge into one text file based on Unique keys present in input files by lnin (Initiate) on Sep 02, 2009 at 12:49 UTC
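Hello Monks, I could merge the input files row-wise, but I am not able to merge them column-wise as expected. Please help. Below is the code I wrote, which merges row-wise:

```perl
use strict;
use warnings;

my $Outputfilename = "Output.txt";

opendir(DIR, '.') or die "Can't open . $!";
# Skip the output file, which also ends in .txt.
my @inputfiles = grep { /\.txt$/ && $_ ne $Outputfilename } readdir(DIR);
closedir(DIR);
# File names are strings, so sort them as strings rather than with <=>.
@inputfiles = sort @inputfiles;

print "@inputfiles\n";

open(OUTPUT, '>', $Outputfilename) or die "Can't open $Outputfilename: $!";
foreach my $file (@inputfiles) {
    open(INPUT, '<', $file) or die "Can't open $file: $!";
    while (<INPUT>) {
        print OUTPUT $_;    # copy each input line as-is (row-wise merge)
    }
    close(INPUT);
}
close(OUTPUT);
```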
Re^2: How do I Extract contents from given input files and merge into one text file based on Unique keys present in input files by dwm042 (Priest) on Sep 02, 2009 at 17:18 UTC
lnin, bart above really does have the answer to this question. You can't easily read data and then immediately write data any way you please. To write the data column-wise or anyway-wise, it's best to:

1. read all the data and store it.
2. write the data out in the format you want.

Given that you are wanting to write out the data in terms of iterations and line numbers, the data can be stored in an array (as pairs of numbers, for example). The index to the array can be the number of the iteration.
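
The read-then-write idea described here, with an array indexed by iteration number, could look roughly like the sketch below. It is only an illustration under some assumptions: the `%input` hash holds shortened in-memory copies of the sample data instead of real files, and `merge_iterations` is a made-up helper name, not something from the replies above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: in-memory stand-ins for File1.txt and File2.txt (shortened),
# so the sketch runs without touching the disk.
my %input = (
    File1 => "Iteration 1:\n0: Data: 1023 1023\n1: Data: 200 1023\n\n"
           . "Iteration 2:\n0: Data: 1023 1024\n1: Data: 1023 60\n",
    File2 => "Iteration 1:\n0: Data: 1023 1023\n1: Data: 250 1023\n\n"
           . "Iteration 2:\n0: Data: 1023 1024\n1: Data: 1023 60\n",
);

# Hypothetical helper: takes a hash ref of name => file contents and
# returns the merged report as a single string.
sub merge_iterations {
    my ($input) = @_;
    my @names = sort keys %$input;
    my @store;    # $store[$iteration][$row]{$name} = "Data: ..."

    # Pass 1: read all the data and store it.
    for my $name (@names) {
        open my $fh, '<', \$input->{$name}
            or die "Cannot open in-memory data for $name: $!";
        my $iteration;
        while (my $line = <$fh>) {
            chomp $line;
            if ($line =~ /^Iteration\s+(\d+):/) {
                $iteration = $1;    # array index = iteration number
            }
            elsif (defined $iteration and $line =~ /^(\d+):\s*(.*)$/) {
                $store[$iteration][$1]{$name} = $2;
            }
        }
        close $fh;
    }

    # Pass 2: write the data out in the wanted (column-wise) layout.
    my $out = '';
    for my $iteration (0 .. $#store) {
        my $rows = $store[$iteration] or next;    # skip unused indexes
        $out .= "Iteration $iteration:\n";
        for my $row (0 .. $#$rows) {
            my $cell = $rows->[$row] or next;     # skip missing rows
            my @present = grep { exists $cell->{$_} } @names;
            $out .= join(' ', map { "$_:$row: $cell->{$_}" } @present) . "\n";
        }
        $out .= "\n";
    }
    return $out;
}

print merge_iterations(\%input);
```

To merge more files, add more entries to `%input` (or fill it by reading each file into a string); the names are sorted, so the columns come out in a stable order.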